codex-default
2026-05-14T16-41-40Z_codex-default_paperwork_workflow_case07
Reference run: Codex CLI, OpenCode, or OpenRouter baseline, included as a comparison target rather than a local-model result.
Practical score: 100%
Resolved tasks: 1/1
Core tasks: 1/1
Verdict: resolved (practical run result)
Core 1/1 · Visible 1/1 · Hidden 1/1
Failure labels: none
Prompt set: v3_paperwork_workflow_case07.json
Run type: reference
Temperature: 0
Top P: 1
Missed tasks: 0/1
Task Results
| Task | Category | Score | Failed Checks | Failure Types | Prompt / Answer |
|---|---|---|---|---|---|
| credit_offset_case_07 | paperwork | 4/4 | none | none | see below |

Prompt

Audit the synthetic credit offset intake folder in `incoming/`. Write these files:

- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/payment_reconciliation.json`
- `work/normalized/INV-4409.txt`
- `work/normalized/INV-4411.txt`
- `audit_result.json`
- `proof.txt`

Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, AP note, bank export, vendor master, and purchase orders.

Important:

- Use credit memo CM-4410 as payment-offset evidence, not as an invoice.
- Ignore the duplicate scan of INV-4409.
- Reject INV-4411 because its vendor is inactive and its PO is cancelled.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.

Workspace files available to the model

README_TASK.md

# Paperwork Workflow Case 07: Credit Offset Packet
Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.
You must not modify anything under incoming/.
Case ID for audit_result.json: P3-WORK-07
Steps:
1. Identify active source files and ignore duplicate scans, statements, and credit memos as non-invoice documents.
2. Use the credit memo to explain why the bank payment is lower than the approved invoice gross total.
3. Reject invoices from inactive vendors or cancelled purchase orders.
4. Create work/attachment_index.json.
5. Create work/final_document_set.json.
6. Create work/payment_reconciliation.json.
7. Create normalized invoice text files under work/normalized/.
8. Create audit_result.json.
9. Create proof.txt containing only the final proof_code.
Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code
Required audit_result.json values and formats:
- case_id must be "P3-WORK-07"
- ignored_document_ids must be ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"]
- warnings_by_invoice must use only these short warning codes: inactive_vendor, cancelled_po
- evidence must be an array of source path strings, not a nested object
Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the credit memo ID
+ 97 * total warning count
Use invoice IDs only for approved/review/reject invoice IDs. Do not include duplicate scans, credit memos, or statement IDs in the invoice-ID sum.
Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}
Required work/final_document_set.json shape:
{
"approved_invoice_ids": [...],
"credit_document_ids": [...],
"ignored_document_ids": [...],
"rejected_invoice_ids": [...],
"source_bank_export": "..."
}
Required work/payment_reconciliation.json shape:
{
"bank_payment_cents": 123,
"credit_offset_cents": 123,
"gross_invoice_cents": 123,
"matched_invoice_id": "...",
"payment_reference": "..."
}
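The README's requirements for audit_result.json (exact key set, fixed case_id, fixed ignored IDs, restricted warning codes, evidence as a flat array) can be checked mechanically. A minimal sketch, assuming the file has already been loaded into a dict with `json.load`; `audit_problems` is a hypothetical helper, not the benchmark's verifier:

```python
# Requirements transcribed from README_TASK.md for case P3-WORK-07.
REQUIRED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids", "reject_invoice_ids",
    "ignored_document_ids", "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}
ALLOWED_WARNINGS = {"inactive_vendor", "cancelled_po"}
EXPECTED_IGNORED = ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"]


def audit_problems(audit: dict) -> list[str]:
    """Return readable problems; an empty list means the README rules hold."""
    problems = []
    if set(audit) != REQUIRED_KEYS:
        # Symmetric difference shows both missing and unexpected keys.
        problems.append(f"key mismatch: {sorted(set(audit) ^ REQUIRED_KEYS)}")
    if audit.get("case_id") != "P3-WORK-07":
        problems.append("case_id must be P3-WORK-07")
    if audit.get("ignored_document_ids") != EXPECTED_IGNORED:
        problems.append("ignored_document_ids differ from the required list")
    for invoice_id, codes in audit.get("warnings_by_invoice", {}).items():
        unknown = set(codes) - ALLOWED_WARNINGS
        if unknown:
            problems.append(f"{invoice_id}: unknown warning codes {sorted(unknown)}")
    evidence = audit.get("evidence")
    if not isinstance(evidence, list) or not all(isinstance(p, str) for p in evidence):
        problems.append("evidence must be an array of path strings")
    return problems
```

Applied to the audit_result.json in the model answer, this returns an empty list; a nested evidence object, as the README warns against, would be reported.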
ap_notes.txt

AP note:
- INV-4409 is the final invoice for Flare Tooling.
- CM-4410 is a credit memo and reduces the payment, but it is not an invoice.
- invoice_4409_duplicate_scan.png is a duplicate scan of INV-4409.
- INV-4411 belongs to a vendor on hold and PO-7788 is cancelled.

bank_export_june.csv

date,description,amount_cents
2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000

purchase_orders.csv

po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411

vendor_master.csv

vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive

Agent actions: codex_exec

Model answer

## audit_result.json
{
"case_id": "P3-WORK-07",
"approved_invoice_ids": [
"INV-4409"
],
"review_invoice_ids": [],
"reject_invoice_ids": [
"INV-4411"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"total_approved_gross_cents": 32100,
"warnings_by_invoice": {
"INV-4411": [
"inactive_vendor",
"cancelled_po"
]
},
"evidence": [
"incoming/README_TASK.md",
"incoming/ap_notes.txt",
"incoming/bank_export_june.csv",
"incoming/vendor_master.csv",
"incoming/purchase_orders.csv",
"incoming/attachments/invoice_4409.png",
"incoming/attachments/credit_memo_4410.png",
"incoming/attachments/invoice_4409_duplicate_scan.png",
"incoming/attachments/invoice_4411_cancelled_po.png",
"incoming/attachments/statement_page.png"
],
"proof_code": 45524
}
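The approve/reject split and the warnings above follow from joining purchase_orders.csv with vendor_master.csv on `vendor_id`. A minimal sketch of that join, with the two CSV files inlined as strings; it assumes the `approved_invoice_id` column is the only link between a PO and its invoice, treats a vendor missing from the master as inactive, and never produces a review bucket (no rule for one appears in this case):

```python
import csv
import io

# Contents of purchase_orders.csv and vendor_master.csv as listed above.
PURCHASE_ORDERS = """po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411
"""
VENDORS = """vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive
"""


def classify(po_csv: str, vendor_csv: str) -> tuple[list[str], list[str], dict[str, list[str]]]:
    """Split invoices into approved and rejected, collecting README warning codes."""
    vendor_status = {row["vendor_id"]: row["status"]
                     for row in csv.DictReader(io.StringIO(vendor_csv))}
    approved, rejected, warnings = [], [], {}
    for po in csv.DictReader(io.StringIO(po_csv)):
        invoice_id = po["approved_invoice_id"]
        codes = []
        # Assumption: an unknown vendor is treated the same as an inactive one.
        if vendor_status.get(po["vendor_id"]) != "active":
            codes.append("inactive_vendor")
        if po["status"] == "cancelled":
            codes.append("cancelled_po")
        if codes:
            rejected.append(invoice_id)
            warnings[invoice_id] = codes
        else:
            approved.append(invoice_id)
    return approved, rejected, warnings


print(classify(PURCHASE_ORDERS, VENDORS))
# (['INV-4409'], ['INV-4411'], {'INV-4411': ['inactive_vendor', 'cancelled_po']})
```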
## proof.txt
45524
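The value in proof.txt can be re-derived from the README formula: 32100 + 4409 + 4411 + 4410 + 97 * 2 = 45524, where 2 is the number of warnings on INV-4411. A small sketch of that arithmetic; the helper names are hypothetical, and it assumes each ID's numeric part is its first run of digits:

```python
import re


def numeric_part(document_id: str) -> int:
    """Extract the first run of digits from an ID such as 'INV-4409' or 'CM-4410'."""
    match = re.search(r"\d+", document_id)
    if match is None:
        raise ValueError(f"no digits in {document_id!r}")
    return int(match.group())


def proof_code(total_cents: int, invoice_ids: list[str],
               credit_memo_id: str, warning_count: int) -> int:
    # README formula: total + invoice ID digits + credit memo digits + 97 * warnings.
    # Only approved/review/reject invoice IDs go in invoice_ids (no DUP or statement IDs).
    return (total_cents
            + sum(numeric_part(i) for i in invoice_ids)
            + numeric_part(credit_memo_id)
            + 97 * warning_count)


print(proof_code(32100, ["INV-4409", "INV-4411"], "CM-4410", 2))  # 45524
```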
## work/attachment_index.json
{
"attachments": [
{
"attachment_path": "incoming/attachments/invoice_4409.png",
"document_id": "INV-4409",
"document_type": "invoice",
"use": "approved_source_invoice"
},
{
"attachment_path": "incoming/attachments/credit_memo_4410.png",
"document_id": "CM-4410",
"document_type": "credit_memo",
"use": "payment_offset_evidence"
},
{
"attachment_path": "incoming/attachments/invoice_4409_duplicate_scan.png",
"document_id": "INV-4409-DUP",
"document_type": "duplicate_scan",
"use": "ignored_duplicate"
},
{
"attachment_path": "incoming/attachments/invoice_4411_cancelled_po.png",
"document_id": "INV-4411",
"document_type": "invoice",
"use": "rejected_invoice"
},
{
"attachment_path": "incoming/attachments/statement_page.png",
"document_id": "STATEMENT-JUN",
"document_type": "vendor_statement",
"use": "ignored_supporting_context"
}
]
}
## work/final_document_set.json
{
"approved_invoice_ids": [
"INV-4409"
],
"credit_document_ids": [
"CM-4410"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"rejected_invoice_ids": [
"INV-4411"
],
"source_bank_export": "incoming/bank_export_june.csv"
}
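In this answer, final_document_set.json lines up with the `use` values in attachment_index.json, with CM-4410 appearing both as a credit document and as an ignored document. The sketch below groups document IDs that way; the `USE_TO_BUCKETS` mapping is a hypothetical reading of this one answer, not a rule stated in README_TASK.md:

```python
# Hypothetical mapping from attachment-index "use" values to final-set buckets.
USE_TO_BUCKETS = {
    "approved_source_invoice": ["approved_invoice_ids"],
    "payment_offset_evidence": ["credit_document_ids", "ignored_document_ids"],
    "ignored_duplicate": ["ignored_document_ids"],
    "ignored_supporting_context": ["ignored_document_ids"],
    "rejected_invoice": ["rejected_invoice_ids"],
}


def final_document_set(attachments: list[dict], bank_export: str) -> dict:
    """Group document IDs by their 'use' value; an unknown use raises KeyError."""
    result = {bucket: [] for buckets in USE_TO_BUCKETS.values() for bucket in buckets}
    for entry in attachments:
        for bucket in USE_TO_BUCKETS[entry["use"]]:
            result[bucket].append(entry["document_id"])
    # Sort each list so the output is stable regardless of index order.
    result = {key: sorted(ids) for key, ids in result.items()}
    result["source_bank_export"] = bank_export
    return result
```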
## work/normalized/INV-4409.txt
Invoice ID: INV-4409
Vendor ID: V-FLARE
Vendor name: Flare Tooling
PO: PO-7701
Vendor status: active
PO status: approved
Net cents: 26975
Tax cents: 5125
Gross total cents: 32100
Payment: bank paid 28000 cents after credit memo CM-4410
Disposition: approved
## work/normalized/INV-4411.txt
Invoice ID: INV-4411
Vendor ID: V-OLD
Vendor name: Old Gate Parts
PO: PO-7788
Vendor status: inactive
PO status: cancelled
Net cents: 8100
Tax cents: 1539
Gross total cents: 9639
Disposition: rejected
Warnings: inactive_vendor, cancelled_po
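Both normalized files use a `Key: value` layout, and their amounts are internally consistent: 26975 + 5125 = 32100 for INV-4409 and 8100 + 1539 = 9639 for INV-4411. A sketch that parses that layout and checks net plus tax against gross; it assumes one field per line with the first colon separating key from value, which holds for the two files shown but is not a documented format:

```python
def parse_normalized(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines from a work/normalized/*.txt file."""
    fields = {}
    for line in text.splitlines():
        # partition splits on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def totals_consistent(fields: dict[str, str]) -> bool:
    """True when net cents plus tax cents equal the stated gross total."""
    net = int(fields["Net cents"])
    tax = int(fields["Tax cents"])
    return net + tax == int(fields["Gross total cents"])
```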
## work/payment_reconciliation.json
{
"bank_payment_cents": 28000,
"credit_offset_cents": 4100,
"gross_invoice_cents": 32100,
"matched_invoice_id": "INV-4409",
"payment_reference": "PAY FLARE INV-4409 LESS CM-4410"
}
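The reconciliation balances because the bank paid the approved gross total less the credit memo: 32100 - 4100 = 28000 cents, and the bank description names INV-4409. A minimal check, assuming the JSON keys map one-to-one onto the fields of the hypothetical `PaymentReconciliation` dataclass so that `PaymentReconciliation(**json.load(fh))` works:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentReconciliation:
    bank_payment_cents: int
    credit_offset_cents: int
    gross_invoice_cents: int
    matched_invoice_id: str
    payment_reference: str

    def balances(self) -> bool:
        """The bank paid the invoice gross total less the credit memo offset."""
        return self.gross_invoice_cents - self.credit_offset_cents == self.bank_payment_cents

    def reference_mentions_invoice(self) -> bool:
        """The bank description contains the matched invoice ID."""
        return self.matched_invoice_id in self.payment_reference


rec = PaymentReconciliation(28000, 4100, 32100, "INV-4409", "PAY FLARE INV-4409 LESS CM-4410")
print(rec.balances(), rec.reference_mentions_invoice())  # True True
```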
## verify_output.json
{
"ok": true,
"core_ok": true,
"visible_ok": true,
"failure_types": [],
"checks": [
{
"check": "audit_result_exists",
"passed": true,
"message": "Expected final audit_result.json with exact keys."
},
{
"check": "visible_checks_pass",
"passed": true,
"message": "Expected required artifacts and unchanged incoming/ sources."
},
{
"check": "core_oracle_pass",
"passed": true,
"message": "Expected core audit fields to match the oracle."
},
{
"check": "hidden_oracle_pass",
"passed": true,
"message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
}
],
"audit_error": "",
"actual": {
"case_id": "P3-WORK-07",
"approved_invoice_ids": [
"INV-4409"
],
"review_invoice_ids": [],
"reject_invoice_ids": [
"INV-4411"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"total_approved_gross_cents": 32100,
"warnings_by_invoice": {
"INV-4411": [
"inactive_vendor",
"cancelled_po"
]
},
"evidence": [
"incoming/README_TASK.md",
"incoming/ap_notes.txt",
"incoming/bank_export_june.csv",
"incoming/vendor_master.csv",
"incoming/purchase_orders.csv",
"incoming/attachments/invoice_4409.png",
"incoming/attachments/credit_memo_4410.png",
"incoming/attachments/invoice_4409_duplicate_scan.png",
"incoming/attachments/invoice_4411_cancelled_po.png",
"incoming/attachments/statement_page.png"
],
"proof_code": 45524
},
"expected": {
"approved_invoice_ids": [
"INV-4409"
],
"case_id": "P3-WORK-07",
"evidence": [
"incoming/attachments/invoice_4409.png",
"incoming/attachments/credit_memo_4410.png",
"incoming/attachments/invoice_4411_cancelled_po.png",
"incoming/bank_export_june.csv",
"incoming/purchase_orders.csv",
"incoming/vendor_master.csv"
],
"ignored_document_ids": [
"CM-4410",
"INV-4409-DUP",
"STATEMENT-JUN"
],
"proof_code": "45524",
"reject_invoice_ids": [
"INV-4411"
],
"review_invoice_ids": [],
"total_approved_gross_cents": 32100,
"warnings_by_invoice": {
"INV-4409": [],
"INV-4411": [
"inactive_vendor",
"cancelled_po"
]
}
}
}
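The `actual` and `expected` records differ in three visible ways: `expected.evidence` lists 6 paths while `actual.evidence` lists 10, `expected.proof_code` is the string "45524" rather than an integer, and `expected.warnings_by_invoice` carries an empty list for INV-4409. All four checks still passed. The verifier's comparison rules are not shown; the sketch below is one hypothetical normalization under which the two records agree, not a description of the real checker:

```python
def normalize(audit: dict) -> dict:
    """Hypothetical normalization: stringify proof_code and drop empty warning lists."""
    out = dict(audit)
    out["proof_code"] = str(audit["proof_code"])
    out["warnings_by_invoice"] = {
        invoice_id: codes
        for invoice_id, codes in audit["warnings_by_invoice"].items()
        if codes
    }
    return out


def lenient_match(actual: dict, expected: dict) -> bool:
    """Compare normalized fields; expected evidence only needs to be a subset of actual."""
    a, e = normalize(actual), normalize(expected)
    if not set(e["evidence"]) <= set(a["evidence"]):
        return False
    return all(a.get(key) == e[key] for key in set(e) - {"evidence"})
```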