Gemini 3.1 Flash Lite: cheap, quick, still not closed
Gemini 3.1 Flash Lite ran the full current suite quickly and cheaply. It reached five core passes, generated a valid City Plan SVG, and still resolved none of the nine practical paperwork cases strictly.
Gemini 3.1 Flash Lite is included as a cheap API comparison, not as a local model. It is useful because it shows what a very accessible current API model does on the same paperwork contracts as the Mac mini runs.
The short version is familiar: it understood enough to produce several near misses, but the exact job did not close. Proof codes, evidence paths, attachment indexes, normalized text, and payment reconciliation still broke the strict oracle.
Model Context
- Model family
- Google Gemini
- Run type
- OpenRouter API run
- Leaderboard group
- api cheap
- Local hardware
- Not a Mac mini LM Studio run
- Benchmark role
- Very cheap API comparison
Positioned As
- OpenRouter lists Gemini 3.1 Flash Lite as a text, image, video, file, and audio input model with text output.
- In this benchmark it is not treated as a flagship reference. It is a cheap baseline for practical document and workflow closure.
- That distinction matters: a low-cost model can be useful for triage while still being unsafe as an unattended paperwork finisher.
What We Actually Tested
- The model ran all five generated-invoice Paperwork Trial cases.
- It also ran all four agentic Paperwork Workflow cases, where it had to create intermediate artifacts and preserve protected input folders.
- Finally, it ran the separate City Plan SVG sample, which is not included in the overall practical score.
What Worked
- Fast responses across the suite.
- Reached core-pass level on five of nine practical cases.
- Produced visible workflow artifacts in the agentic folder tasks.
- Generated a valid standalone City Plan SVG.
Where It Broke
- Zero strict resolved practical cases.
- Repeated proof-code and proof.txt failures.
- Evidence and attachment details did not match the hidden oracle.
- One workflow case missed core due normalized text, proof, and warning-code issues.
Readout
Gemini 3.1 Flash Lite is useful as a cheap triage/reference row, but the result is a good reminder of the benchmark's main point: near enough is not finished. The model can often find the shape of the answer. It still does not reliably close the paperwork workflow.