api cheap May 17, 2026

Gemini 3.1 Flash Lite: cheap, quick, still not closed

Gemini 3.1 Flash Lite ran the full current suite quickly and cheaply. It reached five core passes and generated a valid City Plan SVG, but it still strictly resolved none of the nine practical paperwork cases.

Practical score: 27.8%
Resolved: 0/9
Core pass: 5/9
Visual sample: City Plan SVG passed
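The page does not say how the 27.8% practical score is computed. One weighting that happens to reproduce it, assumed here purely for illustration, counts a strict resolve as 1.0 point and a core pass without a strict resolve as 0.5 points. With zero resolves and five core passes that gives 2.5 / 9, about 27.8%. The weights and the `practical_score` helper are hypothetical, not the benchmark's published formula.

```python
# Hypothetical scoring sketch: these weights are an assumption chosen because
# they reproduce the published 27.8%; the real formula is not stated.
RESOLVED_WEIGHT = 1.0
CORE_ONLY_WEIGHT = 0.5


def practical_score(resolved: int, core_pass: int, total: int) -> float:
    """Return a percentage, assuming every resolved case also counts as a core pass."""
    if not 0 <= resolved <= core_pass <= total or total == 0:
        raise ValueError("expected 0 <= resolved <= core_pass <= total, total > 0")
    core_only = core_pass - resolved  # core passes that did not strictly resolve
    points = resolved * RESOLVED_WEIGHT + core_only * CORE_ONLY_WEIGHT
    return 100.0 * points / total


score = practical_score(resolved=0, core_pass=5, total=9)
print(f"{score:.1f}%")  # 27.8%
```

Under this assumed weighting, near misses still earn partial credit, which is why a model with zero resolved cases can post a nonzero practical score.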

Gemini 3.1 Flash Lite is included as a cheap API comparison, not as a local model. It is useful because it shows how a very accessible current API model handles the same paperwork contracts used for the Mac mini runs.

The short version is familiar: it understood enough to produce several near misses, but the exact job did not close. Proof codes, evidence paths, attachment indexes, normalized text, and payment reconciliation still broke the strict oracle.
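To make the gap between a near miss and a strict pass concrete, here is a minimal sketch of an exact-match grader. Everything in it is an assumption for illustration: the field names, the idea that "core pass" checks only a subset of fields, and the sample values are hypothetical and not taken from the benchmark's hidden oracle.

```python
from dataclasses import dataclass

# Hypothetical field names; the real oracle schema is not published here.
STRICT_FIELDS = ("proof_code", "evidence_path", "attachment_index",
                 "normalized_text", "payment_total")
# Assumed subset of fields that a "core pass" requires.
CORE_FIELDS = ("normalized_text", "payment_total")


@dataclass
class Verdict:
    core_pass: bool
    resolved: bool
    mismatches: list


def grade(answer: dict, oracle: dict) -> Verdict:
    """Compare every strict field by exact equality; no partial credit per field."""
    mismatches = [f for f in STRICT_FIELDS if answer.get(f) != oracle.get(f)]
    core_pass = not any(f in mismatches for f in CORE_FIELDS)
    return Verdict(core_pass=core_pass, resolved=not mismatches, mismatches=mismatches)


oracle = {
    "proof_code": "PX-4471",
    "evidence_path": "evidence/inv_003.pdf",
    "attachment_index": 2,
    "normalized_text": "acme invoice 003",
    "payment_total": "1250.00",
}
# A near miss: text and total are right, but the proof code is transposed
# and the attachment index is off by one.
near_miss = dict(oracle, proof_code="PX-4417", attachment_index=1)
verdict = grade(near_miss, oracle)
print(verdict)
```

In this sketch the near miss clears the assumed core fields but is not resolved, because a single transposed character in a proof code is enough to fail an exact comparison.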

Model Context

Model family: Google Gemini
Run type: OpenRouter API run
Leaderboard group: api cheap
Local hardware: Not a Mac mini LM Studio run
Benchmark role: Very cheap API comparison

Positioned As

OpenRouter lists Gemini 3.1 Flash Lite as a text, image, video, file, and audio input model with text output.
In this benchmark it is not treated as a flagship reference. It is a cheap baseline for practical document and workflow closure.
That distinction matters: a low-cost model can be useful for triage while still being unsafe as an unattended paperwork finisher.

What We Actually Tested

The model ran all five generated-invoice Paperwork Trial cases.
It also ran all four agentic Paperwork Workflow cases, where it had to create intermediate artifacts and preserve protected input folders.
Finally, it ran the separate City Plan SVG sample, which is not included in the overall practical score.
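The workflow cases also require that protected input folders stay untouched. The benchmark's actual check is not described on this page; one simple way to verify that property, sketched here as an assumption, is to hash every file in the protected folder before and after the run and diff the two snapshots. The `snapshot` and `protected_changes` helpers and the folder and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    digests = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            rel = path.relative_to(folder).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def protected_changes(before: dict, after: dict) -> dict:
    """List files added, removed, or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


# Usage: simulate a run that edits a protected file and drops a stray file.
with tempfile.TemporaryDirectory() as tmp:
    protected = Path(tmp)
    (protected / "inbox").mkdir()
    (protected / "inbox" / "inv_001.txt").write_text("original")
    before = snapshot(protected)
    (protected / "inbox" / "inv_001.txt").write_text("edited")
    (protected / "stray.txt").write_text("new")
    changes = protected_changes(before, snapshot(protected))

print(changes)
```

Hashing file contents rather than comparing modification times means a file rewritten with identical bytes still counts as preserved, which is usually the property a paperwork workflow cares about.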

What Worked

Fast responses across the suite.
Reached core-pass level on five of nine practical cases.
Produced visible workflow artifacts in the agentic folder tasks.
Generated a valid standalone City Plan SVG.
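The page does not define what "valid standalone" means for the City Plan SVG. A minimal check, assumed here rather than taken from the benchmark, is that the file parses as XML, its root is an `svg` element in the SVG namespace, and no `href` points at an external resource. The `is_standalone_svg` helper is hypothetical; it uses only the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def is_standalone_svg(data: bytes) -> bool:
    """Assumed check: well-formed XML, <svg> root in the SVG namespace, no external hrefs."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    if root.tag != f"{{{SVG_NS}}}svg":
        return False
    for element in root.iter():
        for attr in ("href", XLINK_HREF):
            ref = element.get(attr)
            # Allow in-document fragments and inline data URIs only.
            if ref and not ref.startswith(("#", "data:")):
                return False
    return True


good = b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"><rect width="5" height="5"/></svg>'
print(is_standalone_svg(good))  # True
```

This only covers structure, not whether the drawing looks like a city plan; visual correctness would still need a separate review.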

Where It Broke

Zero strictly resolved practical cases.
Repeated proof-code and proof.txt failures.
Evidence and attachment details did not match the hidden oracle.
One workflow case missed core pass due to normalized-text, proof, and warning-code issues.

Readout

Gemini 3.1 Flash Lite is useful as a cheap triage/reference row, but the result is a good reminder of the benchmark's main point: near enough is not finished. The model can often find the shape of the answer. It still does not reliably close the paperwork workflow.

Sources

OpenRouter model listing
Google Gemini model docs

Run Outputs

Paperwork run
Workflow W07
SVG sample
All notes