P02
The Paperwork Trial
Generated invoice images
Credit Note And Vendor Hold
A scanned folder with a credit note, partial payment, missing PO, and inactive-vendor warning.
Transparency boundary
This page shows the files the model can inspect. It does not show the hidden oracle solution.
Excluded: ground_truth.json, expected_artifacts.json, manual readings, and calibration notes.
Generated Images
Scans and visual attachments visible to the model



Task And Source Files
No oracle solution files
README_TASK.md
# Paperwork Trial v3 Calibration Case 02

This is synthetic benchmark bookkeeping. It is not financial, tax, or legal advice.

Audit the scanned document folder and write `audit_result.json` only.

Case ID: `P3-GEN-02`

Output keys exactly: `case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids, ignored_document_ids, total_approved_gross_cents, warnings_by_invoice, evidence, proof_code`

Rules:
- Treat the files in `scans/` as the source documents.
- Ignore documents that are explicitly credit notes or not invoice payment requests.
- Match vendors by visible vendor name and tax ID when available.
- `payment_match` requires paid bank rows for the invoice whose summed amount equals the invoice gross.
- `payment_short` applies when the paid bank amount is lower than invoice gross.
- `missing_po` applies when the scanned invoice visibly has no valid PO number or says `MISSING PO`.
- `inactive_vendor` applies when the scanned invoice visibly has a vendor-hold/inactive-vendor stamp or vendor records mark the vendor inactive.
- Approved invoices have no warnings and paid amount equals gross.
- Review invoices have warnings but are not reject-level. `payment_short` is review-level.
- Reject invoices with `missing_po` or `inactive_vendor`.
- Sort all invoice-id arrays ascending.
- Allowed warning codes are exactly: `inactive_vendor`, `missing_po`, `payment_short`.
- `warnings_by_invoice` must include every real invoice ID and sorted warning arrays.
- Warning arrays must be flat arrays of lowercase strings, never nested arrays and never prose labels.
- `ignored_document_ids` must include visible document IDs from ignored non-invoice documents, not filenames. Example: use `CN-10032`, not `credit_note_10032_credit_applied.png`.
- `total_approved_gross_cents` is the sum of approved invoice gross totals only.
- `evidence` must list the relative source file paths used in stable alphabetical order, including folder prefixes such as `scans/`.
- Include the relevant CSV files and every scanned document inspected in `evidence`, including ignored credit-note scans.
- `proof_code = total_approved_gross_cents + sum(numeric parts of all real invoice IDs) + 97 * total_warning_count`.

Important: `Northwind Office Supply` is the customer, not the vendor.
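The classification rules and the `proof_code` formula in the README can be sketched roughly as below. This is an illustration, not the benchmark's reference implementation. The function names are hypothetical, and the invoice IDs and amounts are invented so that nothing here reflects this case's answer. Two assumptions are made: the "numeric part" of an ID is read as all of its digits taken as one integer, and an invoice with no warnings is taken to have a paid amount equal to gross.

```python
import re

# Warning codes the README treats as reject-level.
REJECT_LEVEL = {"inactive_vendor", "missing_po"}


def classify(warnings):
    """Return 'approved', 'review', or 'reject' for one invoice's warnings."""
    if not warnings:
        return "approved"
    if REJECT_LEVEL & set(warnings):
        return "reject"
    return "review"  # e.g. only payment_short


def proof_code(total_approved_gross_cents, warnings_by_invoice):
    """total_approved_gross_cents + sum(numeric ID parts) + 97 * warning count."""
    # Assumption: the numeric part of 'INV-100' is the integer 100.
    id_sum = sum(int(re.sub(r"\D", "", inv_id)) for inv_id in warnings_by_invoice)
    warning_count = sum(len(w) for w in warnings_by_invoice.values())
    return total_approved_gross_cents + id_sum + 97 * warning_count


# Hypothetical invoices, not taken from this case.
example = {
    "INV-100": [],
    "INV-200": ["payment_short"],
    "INV-300": ["inactive_vendor", "missing_po"],
}

buckets = {inv_id: classify(w) for inv_id, w in example.items()}
# Suppose INV-100 has a gross total of 5000 cents:
code = proof_code(5000, example)  # 5000 + (100 + 200 + 300) + 97 * 3 = 5891
```

Counting warnings across every real invoice, not only the approved ones, matters here because approved invoices contribute zero warnings by definition.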
bank_export.csv
date,description,invoice_id,amount_cents,status
2026-04-23,BrightPath Office Solutions,INV-82415,18737,paid
2026-05-02,BrightPath Office Solutions,INV-82478,10000,paid
2026-05-05,BrightPath Office Solutions,INV-82533,23794,pending
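The README's payment rules compare an invoice's gross total with the sum of its paid bank rows. A minimal sketch of that aggregation over the export above follows. The helper name `paid_totals` is hypothetical, and the sketch assumes that only rows whose status is exactly `paid` count, so `pending` rows are skipped.

```python
import csv
import io
from collections import defaultdict

# The rows of bank_export.csv as shown on this page.
BANK_EXPORT = """\
date,description,invoice_id,amount_cents,status
2026-04-23,BrightPath Office Solutions,INV-82415,18737,paid
2026-05-02,BrightPath Office Solutions,INV-82478,10000,paid
2026-05-05,BrightPath Office Solutions,INV-82533,23794,pending
"""


def paid_totals(csv_text):
    """Sum amount_cents per invoice_id over rows whose status is 'paid'."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"].strip().lower() == "paid":
            totals[row["invoice_id"]] += int(row["amount_cents"])
    return dict(totals)


totals = paid_totals(BANK_EXPORT)
```

Invoices with no paid rows are simply absent from the result. How such an invoice should be treated is left to the README's rules.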
document_extracts.csv
source_path,document_id,document_type,vendor_name,tax_id,po_id,net_cents,tax_cents,gross_total_cents,visible_marks,notes
scans/credit_note_10032_credit_applied.png,CN-10032,credit_note,BrightPath Office Solutions,BP-9200,PO-4488,,,-3725,CREDIT APPLIED,Original invoice INV-82210; credit note is not an invoice payment request
scans/inv_82415_paid.png,INV-82415,invoice,BrightPath Office Solutions,BP-9200,PO-4510,,,18737,paid stamp,Customer shown as Northwind Office Supply
scans/inv_82478_partial_payment.png,INV-82478,invoice,BrightPath Office Solutions,BP-9200,PO-4577,,,14144,"received stamp; handwritten: Partial payment of $100.00 received 5/02. Balance due: $41.44",Customer shown as Northwind Office Supply
scans/inv_82533_vendor_hold.png,INV-82533,invoice,BrightPath Office Solutions,BP-9200,MISSING PO,,,23794,"VENDOR HOLD; INACTIVE VENDOR; handwritten: Please provide PO or approval before processing",Customer shown as Northwind Office Supply
model_prompt.md
You are auditing a synthetic scanned paperwork folder.

Read `README_TASK.md`, inspect the files in `scans/`, and use `bank_export.csv`, `vendor_master.csv`, and `purchase_orders.csv`. Write `audit_result.json` only.

This is benchmark bookkeeping, not financial, tax, or legal advice.

Important:
- The scanned images are the source documents.
- `Northwind Office Supply` is the customer, not the vendor.
- Ignore credit notes and documents that are not invoice payment requests.
- Do not invent fields that are not supported by the files.
- Use visible document IDs for `ignored_document_ids`, not filenames.
- Use relative paths with folder prefixes in `evidence`, for example `scans/example.png`.
- Use only allowed lowercase warning codes from `README_TASK.md`.
- Warning arrays must be flat arrays of strings.
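The formatting constraints in the prompt and README (exact keys, sorted ID arrays, flat lowercase warning arrays, alphabetically ordered evidence) lend themselves to a self-check before writing `audit_result.json`. The sketch below is one possible checker, not the benchmark's grader. `check_result` is a hypothetical name, key order is assumed not to matter, "alphabetical" is approximated with Python's default string ordering, and the sample result is invented.

```python
REQUIRED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids",
    "reject_invoice_ids", "ignored_document_ids",
    "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}
ALLOWED_WARNINGS = {"inactive_vendor", "missing_po", "payment_short"}


def check_result(result):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if set(result) != REQUIRED_KEYS:
        problems.append("keys do not match the required set")
    for key in ("approved_invoice_ids", "review_invoice_ids", "reject_invoice_ids"):
        ids = result.get(key, [])
        if ids != sorted(ids):
            problems.append(f"{key} is not sorted ascending")
    for inv_id, warnings in result.get("warnings_by_invoice", {}).items():
        flat = isinstance(warnings, list) and all(isinstance(w, str) for w in warnings)
        if not flat:
            problems.append(f"{inv_id}: warnings must be a flat list of strings")
            continue
        if any(w not in ALLOWED_WARNINGS for w in warnings):
            problems.append(f"{inv_id}: unknown warning code")
        if warnings != sorted(warnings):
            problems.append(f"{inv_id}: warnings are not sorted")
    evidence = result.get("evidence", [])
    if evidence != sorted(evidence):
        problems.append("evidence is not in alphabetical order")
    return problems


# Hypothetical result, not the answer to this case.
good = {
    "case_id": "EXAMPLE-00",
    "approved_invoice_ids": ["INV-100"],
    "review_invoice_ids": [],
    "reject_invoice_ids": [],
    "ignored_document_ids": [],
    "total_approved_gross_cents": 5000,
    "warnings_by_invoice": {"INV-100": []},
    "evidence": ["bank_export.csv", "scans/example.png"],
    "proof_code": 5100,
}
nested = dict(good, warnings_by_invoice={"INV-100": [["missing_po"]]})
```

Here `check_result(good)` reports no problems, while `check_result(nested)` flags the nested warning array.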
purchase_orders.csv
po_id,vendor_id,limit_cents,status
PO-4510,V-BP9200,20000,open
PO-4577,V-BP9200,16000,open
PO-4488,V-BP9200,5000,closed
vendor_master.csv
vendor_id,name,tax_id,status
V-BP9200,BrightPath Office Solutions,BP-9200,active
V-NW001,Northwind Office Supply,NW-CUSTOMER,customer
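The README asks for vendors to be matched by visible vendor name and tax ID. A small lookup over the vendor master above might look like the following. The function name `vendor_status` and the case-insensitive, whitespace-trimmed name comparison are assumptions of this sketch. Note that the record status is only one of the two `inactive_vendor` triggers; a visible stamp on the scan is the other.

```python
import csv
import io

# The rows of vendor_master.csv as shown on this page.
VENDOR_MASTER = """\
vendor_id,name,tax_id,status
V-BP9200,BrightPath Office Solutions,BP-9200,active
V-NW001,Northwind Office Supply,NW-CUSTOMER,customer
"""


def vendor_status(csv_text, name, tax_id):
    """Return the status of the record matching both name and tax ID, else None."""
    wanted = (name.strip().casefold(), tax_id.strip())
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row["name"].strip().casefold(), row["tax_id"].strip()) == wanted:
            return row["status"]
    return None


status = vendor_status(VENDOR_MASTER, "BrightPath Office Solutions", "BP-9200")
```

Requiring both fields to match avoids treating the customer row, `Northwind Office Supply`, as a vendor merely because its name appears on an invoice.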