P03 The Paperwork Trial Generated invoice images

Duplicate Risk Mix

A larger mixed folder combining earlier-looking scans, a previous-invoices file, a credit note, a quote, and a duplicate-risk lookup.

Transparency boundary

This page shows the files the model can inspect. It does not show the hidden oracle solution.

Excluded: ground_truth.json, expected_artifacts.json, manual readings, and calibration notes.

What This Tests

duplicate-risk lookup, mixed folders, quote and credit note distractors, partial payment

Files Shown

8 generated image files and 6 text/CSV/task files.

All files are synthetic benchmark material.

Generated Images

Scans and visual attachments visible to the model
scans/case01_inv_7801_paid.png
scans/case01_inv_7802_short_paid.png
scans/case01_inv_8422_under_review.png
scans/case01_quote_6400_not_invoice.png
scans/case02_credit_note_10032_credit_applied.png
scans/case02_inv_82415_paid.png
scans/case02_inv_82478_partial_payment.png
scans/case02_inv_82533_vendor_hold.png

Task And Source Files

No oracle solution files
README_TASK.md
# Paperwork Trial v3 Calibration Case 03

This is synthetic benchmark bookkeeping. It is not financial, tax, or legal advice.

Audit the scanned document folder and write `audit_result.json` only.

Case ID: `P3-GEN-03`

Output keys exactly:

`case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids, ignored_document_ids, total_approved_gross_cents, warnings_by_invoice, evidence, proof_code`
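
Because the output keys must match exactly, a result can be checked against the list before it is written. The following is a minimal sketch and not part of the benchmark: `EXPECTED_KEYS` is copied from the line above, while `check_keys` and the sample dicts are hypothetical names and values used only for illustration.

```python
# Key list copied from the README; everything else here is illustrative.
EXPECTED_KEYS = [
    "case_id",
    "approved_invoice_ids",
    "review_invoice_ids",
    "reject_invoice_ids",
    "ignored_document_ids",
    "total_approved_gross_cents",
    "warnings_by_invoice",
    "evidence",
    "proof_code",
]


def check_keys(result):
    """Return (missing, unexpected) key lists for a candidate result dict."""
    missing = [key for key in EXPECTED_KEYS if key not in result]
    unexpected = [key for key in result if key not in EXPECTED_KEYS]
    return missing, unexpected


# A placeholder result with every expected key and no extras passes the check.
complete = {key: None for key in EXPECTED_KEYS}
# A hypothetical result with one extra key and most keys missing fails it.
partial = {"case_id": "P3-GEN-03", "notes": ""}
```

An empty pair of lists from `check_keys` means the key set matches; it says nothing about whether the values are correct.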

Rules:

- Treat the files in `scans/` as the source documents.
- Ignore documents that are explicitly quotes, credit notes, or not invoice payment requests.
- Match vendors by visible vendor name and tax ID when available.
- `payment_match` requires paid bank rows for the invoice whose summed amount equals the invoice gross.
- `duplicate_risk` applies when `previous_invoices.csv` has the same vendor_id and same gross amount as the current invoice.
- `payment_short` applies when the paid bank amount is lower than invoice gross.
- `under_review_stamp` applies when the scanned invoice visibly has an under-review stamp.
- `missing_po` applies when the scanned invoice visibly has no valid PO number or says `MISSING PO`.
- `inactive_vendor` applies when the scanned invoice visibly has a vendor-hold/inactive-vendor stamp or vendor records mark the vendor inactive.
- Approved invoices have no warnings and paid amount equals gross.
- Review invoices have warnings but are not reject-level. `duplicate_risk`, `payment_short`, and `under_review_stamp` are review-level.
- Reject invoices with `missing_po` or `inactive_vendor`.
- Sort all invoice-id arrays ascending.
- Allowed warning codes are exactly: `duplicate_risk`, `inactive_vendor`, `missing_po`, `payment_short`, `under_review_stamp`.
- `warnings_by_invoice` must include every real invoice ID and sorted warning arrays.
- Warning arrays must be flat arrays of lowercase strings, never nested arrays and never prose labels.
- `ignored_document_ids` must include visible document IDs from ignored non-invoice documents, not filenames.
- `total_approved_gross_cents` is the sum of approved invoice gross totals only.
- `evidence` must list the relative source file paths used in stable alphabetical order, including folder prefixes such as `scans/`.
- Include the relevant CSV files and every scanned document inspected in `evidence`, including `previous_invoices.csv` and ignored quote and credit-note scans.
- `proof_code = total_approved_gross_cents + sum(numeric parts of all real invoice IDs) + 97 * total_warning_count`.

Important: `Northwind Office Supply` is the customer, not the vendor.
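
The classification rules and the `proof_code` formula above can be sketched as follows. This is one possible reading, not the benchmark's reference implementation. It assumes that the "numeric part" of an invoice ID is its digits read as a single integer, and that any payment mismatch has already been recorded as a `payment_short` warning. The function names and all sample IDs and amounts are hypothetical and do not come from this case.

```python
ALLOWED_WARNINGS = {
    "duplicate_risk",
    "inactive_vendor",
    "missing_po",
    "payment_short",
    "under_review_stamp",
}
REJECT_LEVEL = {"missing_po", "inactive_vendor"}


def classify(warnings):
    """Map a flat list of warning codes to 'approve', 'review', or 'reject'."""
    unknown = set(warnings) - ALLOWED_WARNINGS
    if unknown:
        raise ValueError(f"unknown warning codes: {sorted(unknown)}")
    if set(warnings) & REJECT_LEVEL:
        return "reject"   # any reject-level warning wins
    if warnings:
        return "review"   # only review-level warnings remain
    return "approve"      # no warnings at all


def numeric_part(invoice_id):
    """Assumption: keep only the digits of the ID and read them as one integer."""
    return int("".join(ch for ch in invoice_id if ch.isdigit()))


def proof_code(total_approved_gross_cents, warnings_by_invoice):
    """Approved gross + sum of invoice ID numbers + 97 per warning."""
    id_sum = sum(numeric_part(inv_id) for inv_id in warnings_by_invoice)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return total_approved_gross_cents + id_sum + 97 * warning_count


# Hypothetical invoices, not taken from this case.
sample = {
    "INV-100": [],
    "INV-200": ["payment_short"],
    "INV-300": ["duplicate_risk", "missing_po"],
}
# Suppose INV-100 is the only approved invoice, with a gross of 5000 cents.
sample_proof = proof_code(5000, sample)  # 5000 + 600 + 97 * 3 = 5891
```

Deriving the warning count from `warnings_by_invoice` keeps `proof_code` consistent with the warnings actually reported.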
bank_export.csv
date,description,invoice_id,amount_cents,status
2026-04-23,BluePeak Distributors,INV-7801,18737,paid
2026-04-24,StationHub LLC,INV-7802,5000,paid
2026-04-27,Orion Field Services,INV-8422,42245,paid
2026-04-23,BrightPath Office Solutions,INV-82415,18737,paid
2026-05-02,BrightPath Office Solutions,INV-82478,10000,paid
2026-05-05,BrightPath Office Solutions,INV-82533,23794,pending
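
The `payment_match` and `payment_short` rules in `README_TASK.md` depend on the sum of paid rows per invoice. The sketch below shows one way to compute that sum from the rows above. The helper names are hypothetical, and the gross amount must be read from the scan, so the gross values in the usage lines are placeholders only. The README does not say how to treat a paid total above gross, so the sketch returns `None` in that case.

```python
import csv
import io

# Rows copied from bank_export.csv as shown on this page.
BANK_EXPORT = """date,description,invoice_id,amount_cents,status
2026-04-23,BluePeak Distributors,INV-7801,18737,paid
2026-04-24,StationHub LLC,INV-7802,5000,paid
2026-04-27,Orion Field Services,INV-8422,42245,paid
2026-04-23,BrightPath Office Solutions,INV-82415,18737,paid
2026-05-02,BrightPath Office Solutions,INV-82478,10000,paid
2026-05-05,BrightPath Office Solutions,INV-82533,23794,pending
"""


def paid_total_cents(bank_csv_text, invoice_id):
    """Sum amount_cents over rows for this invoice whose status is 'paid'."""
    reader = csv.DictReader(io.StringIO(bank_csv_text))
    return sum(
        int(row["amount_cents"])
        for row in reader
        if row["invoice_id"] == invoice_id and row["status"] == "paid"
    )


def payment_status(paid_cents, gross_cents):
    """Compare the paid total with the invoice gross read from the scan."""
    if paid_cents == gross_cents:
        return "payment_match"
    if paid_cents < gross_cents:
        return "payment_short"
    return None  # overpayment is not covered by the README rules


# Pending rows do not count toward the paid total.
pending_only = paid_total_cents(BANK_EXPORT, "INV-82533")
# Placeholder gross values, NOT read from any scan.
short_example = payment_status(5000, 6000)
match_example = payment_status(5000, 5000)
```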
model_prompt.md
You are auditing a synthetic scanned paperwork folder.

Read `README_TASK.md`, inspect the files in `scans/`, and use `bank_export.csv`, `vendor_master.csv`, `purchase_orders.csv`, and `previous_invoices.csv`.

Write `audit_result.json` only.

This is benchmark bookkeeping, not financial, tax, or legal advice.

Important:
- The scanned images are the source documents.
- `Northwind Office Supply` is the customer, not the vendor.
- Ignore quotes, credit notes, and documents that are not invoice payment requests.
- Do not invent fields that are not supported by the files.
- Use visible document IDs for `ignored_document_ids`, not filenames.
- Use relative paths with folder prefixes in `evidence`, for example `scans/example.png`.
- Use only allowed lowercase warning codes from `README_TASK.md`.
- Warning arrays must be flat arrays of strings.
previous_invoices.csv
invoice_id,vendor_id,gross_total_cents,paid_date
INV-7600,V-BP4471,18737,2026-03-11
INV-82210,V-BP9200,3725,2026-04-18
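
The `duplicate_risk` rule in `README_TASK.md` requires both the `vendor_id` and the gross amount to match a row in this file. The following sketch shows that lookup under a few assumptions: the helper names are hypothetical, the `vendor_id` of a scanned invoice would come from matching its visible vendor name and tax ID against `vendor_master.csv`, and the gross amount would come from the scan. The arguments in the usage lines are lookup probes against the CSV rows, not readings from any scan.

```python
import csv
import io

# Rows copied from previous_invoices.csv as shown on this page.
PREVIOUS_INVOICES = """invoice_id,vendor_id,gross_total_cents,paid_date
INV-7600,V-BP4471,18737,2026-03-11
INV-82210,V-BP9200,3725,2026-04-18
"""


def load_previous_keys(csv_text):
    """Build a set of (vendor_id, gross_total_cents) pairs for fast lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["vendor_id"], int(row["gross_total_cents"])) for row in reader}


def has_duplicate_risk(previous_keys, vendor_id, gross_cents):
    """True only when vendor_id AND gross amount both match a previous row."""
    return (vendor_id, gross_cents) in previous_keys


keys = load_previous_keys(PREVIOUS_INVOICES)
same_vendor_same_amount = has_duplicate_risk(keys, "V-BP4471", 18737)
other_vendor_same_amount = has_duplicate_risk(keys, "V-BP9200", 18737)
```

Keying the set on the pair rather than on the amount alone means an equal amount from a different vendor does not trigger the warning.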
purchase_orders.csv
po_id,vendor_id,limit_cents,status
PO-4510,V-BP4471,20000,open
PO-4510,V-BP9200,20000,open
PO-4510-B,V-SH9982,6000,open
PO-4577,V-BP9200,16000,open
PO-8422,V-OR1109,50000,open
PO-4488,V-BP9200,5000,closed
vendor_master.csv
vendor_id,name,tax_id,status
V-BP4471,BluePeak Distributors,BP-4471,active
V-SH9982,StationHub LLC,SH-9982,active
V-OR1109,Orion Field Services,OR-1109,active
V-BP9200,BrightPath Office Solutions,BP-9200,active
V-NW001,Northwind Office Supply,NW-CUSTOMER,customer