gemma-4-e2b
2026-05-14T19-44-19Z_google_gemma-4-e2b_paperwork_workflow_case06
0% Practical score
0/1 Resolved tasks
0/1 Core tasks
Verdict
Practical run result: failed
Core 0/1 · Visible 1/1 · Hidden 0/1
final_document_set_error, missing_or_wrong_evidence, normalized_text_error, proof_code_error, proof_txt_error
Prompt set: v3_paperwork_workflow_case06.json
Run type: local
Temperature: 0
Top P: 1
Missed tasks: 1/1
Task Results
| Task | Category | Score | Failed Checks | Failure Types | Prompt / Answer |
|---|---|---|---|---|---|
| remittance_split_case_06 | paperwork | 2/4 | core_oracle_pass, hidden_oracle_pass | final_document_set_error, missing_or_wrong_evidence, normalized_text_error, proof_code_error, proof_txt_error, total_calculation_error, warning_code_error | Prompt
Audit the synthetic remittance split intake folder in `incoming/`. Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/normalized/INV-3301.txt`
- `work/normalized/INV-3302.txt`
- `audit_result.json`
- `proof.txt`
Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, the AP note, bank exports, vendor master, and purchase orders.
Important:
- Identify the active bank export and ignore the draft export.
- Use remittance advice RA-771 to split the single bank payment across two final invoices.
- Do not treat the proforma estimate as an invoice.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.
Workspace files available to the model
## README_TASK.md
# Paperwork Workflow Case 06: Remittance Split
Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.
You must not modify anything under incoming/.
Case ID for audit_result.json: P3-WORK-06
Steps:
1. Identify active source files and ignore drafts, old exports, and non-invoice distractors.
2. Use the remittance advice to map the single bank payment to the final invoices.
3. Create work/attachment_index.json.
4. Create work/final_document_set.json.
5. Create normalized invoice text files under work/normalized/.
6. Create audit_result.json.
7. Create proof.txt containing only the final proof_code.
Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code
Required audit_result.json values and formats:
- case_id must be "P3-WORK-06"
- ignored_document_ids must be ["PRO-3303"]
- warnings_by_invoice must use short warning codes only, not prose sentences
- evidence must be an array of source path strings, not a nested object
Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the active remittance batch
+ 97 * total warning count
Use invoice IDs only for approved/review/reject invoice IDs. Do not include proforma or remittance document IDs in the invoice-ID sum.
Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}
Required work/final_document_set.json shape:
{
"active_bank_export": "...",
"active_remittance_batch": "...",
"final_invoice_ids": [...],
"ignored_document_ids": [...],
"ignored_source_files": [...],
"payment_allocations": [{"invoice_id":"...","gross_total_cents":123}]
}
## ap_notes.txt
AP intake note, May batch:
- Use bank_export_final.csv, not bank_export_draft.csv.
- Vendor Bright Coast Office Supplies is active.
- Remittance advice RA-771 explains payment BATCH-771.
- The proforma is a quote-style document and should not be treated as an invoice.
## bank_export_draft.csv
date,description,amount_cents
2026-05-07,DRAFT BATCH 771 DO NOT USE,30000
## bank_export_final.csv
date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
## purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-9001,V-BRIGHT,approved,INV-3301
PO-9002,V-BRIGHT,approved,INV-3302
PO-9003,V-BRIGHT,draft,PRO-3303
## vendor_master.csv
vendor_id,vendor_name,status
V-BRIGHT,Bright Coast Office Supplies,active
V-FAKE,Fake Sample Vendor,inactive
Agent actions
list_files -> read_file -> list_files -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> read_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> finish
Model answer
## audit_result.json
{
"case_id": "P3-WORK-06",
"approved_invoice_ids": ["INV-3301", "INV-3302"],
"review_invoice_ids": [],
"reject_invoice_ids": [],
"ignored_document_ids": ["PRO-3303"],
"total_approved_gross_cents":30090,
"warnings_by_invoice": {
"INV-3301": "Split",
"INV-3302": "Split"
},
"evidence": [
"incoming/attachments/invoice_3301.png",
"incoming/attachments/invoice_3302.png",
"incoming/bank_export_final.csv"
],
"proof_code": "30090"
}
## proof.txt
30090
## work/attachment_index.json
{"attachments":[{"attachment_path":"incoming/attachments/invoice_3301.png","document_id":"INV-3301","document_type":"Invoice","use":"Approved"},{"attachment_path":"incoming/attachments/invoice_3302.png","document_id":"INV-3302","document_type":"Invoice","use":"Approved"},{"attachment_path":"incoming/attachments/proforma_3303.png","document_id":"PRO-3303","document_type":"Proforma","use":"Ignored"},{"attachment_path":"incoming/attachments/remittance_advice.png","document_id":"RA-771","document_type":"Remittance Advice","use":"Active"}]}
## work/final_document_set.json
{
"active_bank_export": "incoming/bank_export_final.csv",
"active_remittance_batch": "BATCH-771",
"final_invoice_ids": ["INV-3301", "INV-3302"],
"ignored_document_ids": ["PRO-3303"],
"ignored_source_files": ["incoming/bank_export_draft.csv"],
"payment_allocations": [
{"invoice_id":"INV-3301","gross_total_cents":18490},
{"invoice_id":"INV-3302","gross_total_cents":1240}
]
}
## work/normalized/INV-3301.txt
Invoice ID: INV-3301
Vendor ID: V-BRIGHT
PO: PO-9001
Status: This is a proforma estimate.
Do not book as an invoice.
## work/normalized/INV-3302.txt
Invoice ID: INV-3302
Vendor ID: V-BRIGHT
PO: PO-9002
Status: This is a proforma estimate.
Do not book as an invoice.
## verify_output.json
{
"ok": false,
"core_ok": false,
"visible_ok": true,
"failure_types": [
"final_document_set_error",
"missing_or_wrong_evidence",
"normalized_text_error",
"proof_code_error",
"proof_txt_error",
"total_calculation_error",
"warning_code_error"
],
"checks": [
{
"check": "audit_result_exists",
"passed": true,
"message": "Expected final audit_result.json with exact keys."
},
{
"check": "visible_checks_pass",
"passed": true,
"message": "Expected required artifacts and unchanged incoming/ sources."
},
{
"check": "core_oracle_pass",
"passed": false,
"message": "Expected core audit fields to match the oracle."
},
{
"check": "hidden_oracle_pass",
"passed": false,
"message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
}
],
"audit_error": "",
"actual": {
"case_id": "P3-WORK-06",
"approved_invoice_ids": [
"INV-3301",
"INV-3302"
],
"review_invoice_ids": [],
"reject_invoice_ids": [],
"ignored_document_ids": [
"PRO-3303"
],
"total_approved_gross_cents": 30090,
"warnings_by_invoice": {
"INV-3301": "Split",
"INV-3302": "Split"
},
"evidence": [
"incoming/attachments/invoice_3301.png",
"incoming/attachments/invoice_3302.png",
"incoming/bank_export_final.csv"
],
"proof_code": "30090"
},
"expected": {
"approved_invoice_ids": [
"INV-3301",
"INV-3302"
],
"case_id": "P3-WORK-06",
"evidence": [
"incoming/attachments/invoice_3301.png",
"incoming/attachments/invoice_3302.png",
"incoming/attachments/remittance_advice.png",
"incoming/bank_export_final.csv",
"incoming/purchase_orders.csv",
"incoming/vendor_master.csv"
],
"ignored_document_ids": [
"PRO-3303"
],
"proof_code": "37104",
"reject_invoice_ids": [],
"review_invoice_ids": [],
"total_approved_gross_cents": 29730,
"warnings_by_invoice": {
"INV-3301": [],
"INV-3302": []
}
}
}
|
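
The expected proof_code of 37104 in verify_output.json can be reproduced from the proof code formula in README_TASK.md. The sketch below is an illustration, not the benchmark's verifier, which is not shown in this report. It assumes that the "numeric part" of an ID means its digits (so `INV-3301` gives 3301 and `BATCH-771` gives 771) and that warnings are counted as list entries per invoice, as in the expected output. The helper names `numeric_part` and `proof_code` are hypothetical. The last lines also sum the model's two payment allocations to show that they do not add up to the bank payment in bank_export_final.csv.

```python
import re

WARNING_WEIGHT = 97  # multiplier from the README proof code formula


def numeric_part(identifier: str) -> int:
    """Return the digits of an ID such as 'INV-3301' as an int (assumed meaning)."""
    digits = re.sub(r"\D", "", identifier)
    if not digits:
        raise ValueError(f"no numeric part in {identifier!r}")
    return int(digits)


def proof_code(total_cents, invoice_ids, remittance_batch, warnings_by_invoice):
    """Apply the README formula; warnings_by_invoice maps invoice ID to a list of codes."""
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return (
        total_cents
        + sum(numeric_part(invoice_id) for invoice_id in invoice_ids)
        + numeric_part(remittance_batch)
        + WARNING_WEIGHT * warning_count
    )


# Values taken from the "expected" block of verify_output.json and from
# the active remittance batch named in the model's final_document_set.json.
expected_code = proof_code(
    total_cents=29730,
    invoice_ids=["INV-3301", "INV-3302"],
    remittance_batch="BATCH-771",
    warnings_by_invoice={"INV-3301": [], "INV-3302": []},
)
print(expected_code)  # 37104

# The model's payment_allocations versus the single payment in bank_export_final.csv.
model_allocations_cents = [18490, 1240]
bank_payment_cents = 29730
allocated_cents = sum(model_allocations_cents)
print(allocated_cents, allocated_cents == bank_payment_cents)  # 19730 False
```

Under these assumptions the model's proof_code of 30090 is simply its own total_approved_gross_cents with no invoice or batch terms added, and that total matches neither the bank payment (29730) nor the sum of its own allocations (19730).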