P05 The Paperwork Trial Generated invoice images

PO Revision

A generated scan case where split payments and the latest purchase-order revision decide the outcome.

Transparency boundary

This page shows the files the model can inspect. It does not show the hidden oracle solution.

Excluded: ground_truth.json, expected_artifacts.json, manual readings, and calibration notes.

What This Tests

split payment, cancelled PO revision, quote distractor

Files Shown

1 generated image file and 5 text/CSV/task files.

All files are synthetic benchmark material.

Generated Images

Scans and visual attachments visible to the model
scans/delta_po_revision_contact_sheet.png

Task And Source Files

No oracle solution files
README_TASK.md
# Paperwork Trial v3 Generated Image Case 05

This is synthetic benchmark bookkeeping. It is not financial, tax, or legal advice.

Audit the scanned document folder and write `audit_result.json` only.

Case ID: `P3-GEN-05`

Output keys exactly:

`case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids, ignored_document_ids, total_approved_gross_cents, warnings_by_invoice, evidence, proof_code`

Rules:

- Treat the files in `scans/` as the source documents.
- Ignore documents that are explicitly quotes or not invoice payment requests.
- Match vendors by visible vendor name and tax ID when available.
- `payment_match` requires paid bank rows for the invoice whose summed amount equals the invoice gross.
- Split bank payments count as `payment_match` when paid rows with the same invoice ID sum to the invoice gross.
- Use the latest purchase order revision visible in the scanned documents and reflected in `purchase_orders.csv`.
- `po_cancelled` applies when the invoice uses a purchase order that the latest revision marks cancelled.
- Approved invoices have no warnings and paid amount equals gross.
- Review invoices have warnings but are not reject-level. `po_cancelled` is review-level.
- Reject only if the invoice is impossible to process, uses an inactive vendor, or lacks a required PO.
- Sort all invoice-id arrays ascending.
- Allowed warning codes are exactly: `po_cancelled`.
- `warnings_by_invoice` must include every real invoice ID and sorted warning arrays.
- Warning arrays must be flat arrays of lowercase strings, never nested arrays and never prose labels.
- `ignored_document_ids` must include visible document IDs from ignored non-invoice documents, not filenames.
- `total_approved_gross_cents` is the sum of approved invoice gross totals only.
- `evidence` must list the relative source file paths used in stable alphabetical order, including folder prefixes such as `scans/`.
- Include the relevant CSV files and every scanned document inspected in `evidence`, including ignored quote scans.
- `proof_code = total_approved_gross_cents + sum(numeric parts of all real invoice IDs) + 97 * total_warning_count`.
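
As an illustration (not part of the task file), the `proof_code` rule can be sketched in Python. The helper names `numeric_part` and `compute_proof_code` are hypothetical, and the sketch assumes that the "numeric part" of an invoice ID means all of its digits read as one integer. The invoice IDs in the usage note are made up and say nothing about this case's outcome.

```python
import re


def numeric_part(invoice_id):
    """Return the digits of an invoice ID as one integer (assumed reading of 'numeric part')."""
    digits = re.sub(r"\D", "", invoice_id)
    if not digits:
        raise ValueError(f"no numeric part in {invoice_id!r}")
    return int(digits)


def compute_proof_code(total_approved_gross_cents, warnings_by_invoice):
    """Apply the README formula; warnings_by_invoice must hold every real invoice ID."""
    id_sum = sum(numeric_part(invoice_id) for invoice_id in warnings_by_invoice)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return total_approved_gross_cents + id_sum + 97 * warning_count
```

With made-up inputs, `compute_proof_code(5000, {"INV-100": [], "INV-101": ["po_cancelled"]})` gives 5000 + 201 + 97 = 5298.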

bank_export.csv
date,description,invoice_id,amount_cents,status
2026-07-02,Delta Repair Group partial payment,INV-5600,20000,paid
2026-07-03,Delta Repair Group final payment,INV-5600,10000,paid
2026-07-04,Delta Repair Group parts invoice,INV-5601,18000,paid
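
A minimal sketch of the split-payment rule applied to the rows above: sum the `paid` rows per invoice ID, then compare the sum with the invoice gross. The function names are hypothetical, and the gross amount has to be read from the scans, so it is passed in as a parameter rather than assumed here.

```python
import csv
import io
from collections import defaultdict

BANK_EXPORT = """\
date,description,invoice_id,amount_cents,status
2026-07-02,Delta Repair Group partial payment,INV-5600,20000,paid
2026-07-03,Delta Repair Group final payment,INV-5600,10000,paid
2026-07-04,Delta Repair Group parts invoice,INV-5601,18000,paid
"""


def paid_totals(csv_text):
    """Sum amount_cents over paid rows, grouped by invoice_id."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"].strip().lower() == "paid":
            totals[row["invoice_id"]] += int(row["amount_cents"])
    return dict(totals)


def payment_match(invoice_id, gross_cents, totals):
    """True when the paid rows for this invoice sum exactly to its gross."""
    return totals.get(invoice_id, 0) == gross_cents


totals = paid_totals(BANK_EXPORT)
print(totals)  # {'INV-5600': 30000, 'INV-5601': 18000}
```

The two `INV-5600` rows are combined into one total, which is the behavior the split-payment rule asks for.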

model_prompt.md
You are auditing a synthetic scanned paperwork folder.

Read `README_TASK.md`, inspect the files in `scans/`, and use `bank_export.csv`, `vendor_master.csv`, and `purchase_orders.csv`.

Write `audit_result.json` only.

This is benchmark bookkeeping, not financial, tax, or legal advice.

Important:
- The scanned images are the source documents.
- Ignore quotes and documents that are not invoice payment requests.
- Split payments can sum to one invoice.
- Use the latest purchase order revision visible in the scanned documents and in `purchase_orders.csv`.
- Do not invent fields that are not supported by the files.
- Use visible document IDs for `ignored_document_ids`, not filenames.
- Use relative paths with folder prefixes in `evidence`, for example `scans/example.png`.
- Use only allowed lowercase warning codes from `README_TASK.md`.
- Warning arrays must be flat arrays of strings.
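
The shape constraints listed above and in `README_TASK.md` can be checked mechanically before `audit_result.json` is written. The following is a sketch only, not part of the benchmark: `validate_audit_result` is a hypothetical helper, it assumes the real invoice IDs are the union of the three status lists, and it treats "alphabetical order" as plain string sorting.

```python
EXPECTED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids",
    "reject_invoice_ids", "ignored_document_ids",
    "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}
ALLOWED_WARNINGS = {"po_cancelled"}
ID_LISTS = ("approved_invoice_ids", "review_invoice_ids", "reject_invoice_ids")


def validate_audit_result(result):
    """Return a list of problems found; an empty list means none were found."""
    if set(result) != EXPECTED_KEYS:
        return ["top-level keys differ from the required set"]
    problems = []
    for key in ID_LISTS:
        if result[key] != sorted(result[key]):
            problems.append(f"{key} is not sorted ascending")
    # Assumption: real invoice IDs are the union of the three status lists.
    invoice_ids = set().union(*(result[key] for key in ID_LISTS))
    if set(result["warnings_by_invoice"]) != invoice_ids:
        problems.append("warnings_by_invoice does not cover every invoice ID")
    for invoice_id, warnings in result["warnings_by_invoice"].items():
        flat = isinstance(warnings, list) and all(isinstance(w, str) for w in warnings)
        if not flat:
            problems.append(f"{invoice_id}: warnings must be a flat list of strings")
        elif any(w not in ALLOWED_WARNINGS for w in warnings):
            problems.append(f"{invoice_id}: unknown warning code")
        elif warnings != sorted(warnings):
            problems.append(f"{invoice_id}: warnings are not sorted")
    if result["evidence"] != sorted(result["evidence"]):
        problems.append("evidence paths are not in alphabetical order")
    return problems
```

The check covers structure only. It does not verify `total_approved_gross_cents`, `proof_code`, or whether each invoice landed in the right list.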

purchase_orders.csv
po_id,vendor_id,limit_cents,status
PO-5600-A,V-640,35000,open
PO-5600-B,V-640,22000,cancelled
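
A sketch of how the `po_cancelled` rule could be applied to this table. It assumes the CSV already reflects the latest revision, as the README states, and that the PO cited by each invoice has been read from the scans. The function names are hypothetical, and an invoice with a missing or unknown PO is a separate, reject-level case that this sketch does not handle.

```python
import csv
import io

PURCHASE_ORDERS = """\
po_id,vendor_id,limit_cents,status
PO-5600-A,V-640,35000,open
PO-5600-B,V-640,22000,cancelled
"""


def po_status_by_id(csv_text):
    """Map each po_id to its lowercase status."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["po_id"]: row["status"].strip().lower() for row in reader}


def po_warnings(po_id, statuses):
    """Return ['po_cancelled'] when the cited PO is marked cancelled, else []."""
    if statuses.get(po_id) == "cancelled":
        return ["po_cancelled"]
    return []


statuses = po_status_by_id(PURCHASE_ORDERS)
print(po_warnings("PO-5600-A", statuses))  # []
print(po_warnings("PO-5600-B", statuses))  # ['po_cancelled']
```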

vendor_master.csv
vendor_id,name,tax_id,status
V-640,Delta Repair Group,DR-640,active
V-641,Delta Repair North,DR-641,active
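
The two vendor names above differ by one word, so a tax ID helps to tell them apart. The following sketch shows one way to apply the README rule of matching by visible name and tax ID when available. `match_vendor` is a hypothetical helper, and the normalization (collapsing whitespace and ignoring case) is an assumption, not something the task files specify.

```python
import csv
import io

VENDOR_MASTER = """\
vendor_id,name,tax_id,status
V-640,Delta Repair Group,DR-640,active
V-641,Delta Repair North,DR-641,active
"""


def _norm(text):
    """Collapse whitespace and ignore case (assumed normalization)."""
    return " ".join(text.split()).casefold()


def match_vendor(visible_name, visible_tax_id, csv_text=VENDOR_MASTER):
    """Return the single vendor row matching name (and tax ID if given), else None."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    candidates = [r for r in rows if _norm(r["name"]) == _norm(visible_name)]
    if visible_tax_id:
        candidates = [r for r in candidates if _norm(r["tax_id"]) == _norm(visible_tax_id)]
    return candidates[0] if len(candidates) == 1 else None
```

A name that matches one vendor paired with the other vendor's tax ID returns `None`, which flags the document for a closer look instead of guessing.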