reference

Codex reference: strong workflow closure

Codex is not a local LM Studio run. It is kept as a reference line for what stronger agentic tooling does on the same public cases.

Infographic: codex-default benchmark note

  • Practical score: 83.3%
  • Resolved: 7/9
  • Core pass: 8/9
  • Visual sample: City Plan SVG passed
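
For readers who want the counts as percentages, the arithmetic can be sketched as below. The page does not say how the practical score is derived, so the `midpoint_pct` line is only an observation: the midpoint of the resolved rate and the core pass rate happens to come out at 83.3%. It is not a statement of the site's scoring formula. All variable names are illustrative.

```python
from fractions import Fraction

TOTAL_CASES = 9  # size of the public case set, from the 7/9 and 8/9 figures

resolved = Fraction(7, TOTAL_CASES)   # cases fully resolved
core_pass = Fraction(8, TOTAL_CASES)  # cases passing the core checks


def pct(ratio: Fraction) -> float:
    """Convert an exact ratio to a percentage rounded to one decimal place."""
    return round(float(ratio) * 100, 1)


resolved_pct = pct(resolved)    # 77.8
core_pct = pct(core_pass)       # 88.9

# Assumption, not the documented formula: the midpoint of the two rates.
midpoint_pct = pct((resolved + core_pass) / 2)  # 83.3

print(resolved_pct, core_pct, midpoint_pct)
```

Using `Fraction` keeps the ratios exact until the final rounding step, so the percentages do not depend on intermediate floating-point error.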

What Worked

  • Best current practical score across the full public case set.
  • Strong at preserving protected input folders while producing required artifacts.
  • Most failures were narrow near misses rather than broad document misunderstanding.
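
One way a harness could check that a protected input folder was preserved is to hash every file before and after the run and compare the two snapshots. The sketch below is a minimal illustration under that assumption. The page does not describe how the benchmark performs this check, and the helper names `snapshot` and `changed_paths` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to the SHA-256 of its contents."""
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or modified between two snapshots."""
    all_paths = before.keys() | after.keys()
    return sorted(path for path in all_paths if before.get(path) != after.get(path))
```

A run would count as preserving the folder when `changed_paths(before, after)` returns an empty list, where `before` is taken ahead of the agent run and `after` once it finishes.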

Where It Broke

  • Still failed one case strictly and had proof/evidence misses.
  • Not directly comparable to local-only LM Studio runs.
  • Useful as a ceiling/reference, not as the point of the site.

Readout

The reference run shows what clean workflow closure looks like; local models are measured against the same artifacts, not against marketing claims.