ministral-3-14b
2026-05-15T13-45-01Z_ministral-3-14b_paperwork_workflow_case06
0% Practical score
0/1 Resolved tasks
0/1 Core tasks
Verdict
practical run result failed
Core 0/1 · Visible 0/1 · Hidden 0/1
attachment_index_error, final_document_set_error, no_output, normalized_text_error, proof_txt_error
Prompt set v3_paperwork_workflow_case06.json
Run type local
Temperature 0
Top P 1
Missed tasks 1/1
Task Results
| Task | Category | Score | Failed Checks | Failure Types | Prompt / Answer |
|---|---|---|---|---|---|
| remittance_split_case_06 | paperwork | 0/4 | audit_result_exists, visible_checks_pass, core_oracle_pass, hidden_oracle_pass | attachment_index_error, final_document_set_error, no_output, normalized_text_error, proof_txt_error, required_artifact_missing | Prompt
Audit the synthetic remittance split intake folder in `incoming/`. Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/normalized/INV-3301.txt`
- `work/normalized/INV-3302.txt`
- `audit_result.json`
- `proof.txt`
Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, the AP note, bank exports, vendor master, and purchase orders.
Important:
- Identify the active bank export and ignore the draft export.
- Use remittance advice RA-771 to split the single bank payment across two final invoices.
- Do not treat the proforma estimate as an invoice.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.
Workspace files available to the model
README_TASK.md
# Paperwork Workflow Case 06: Remittance Split
Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.
You must not modify anything under incoming/.
Case ID for audit_result.json: P3-WORK-06
Steps:
1. Identify active source files and ignore drafts, old exports, and non-invoice distractors.
2. Use the remittance advice to map the single bank payment to the final invoices.
3. Create work/attachment_index.json.
4. Create work/final_document_set.json.
5. Create normalized invoice text files under work/normalized/.
6. Create audit_result.json.
7. Create proof.txt containing only the final proof_code.
Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code
Required audit_result.json values and formats:
- case_id must be "P3-WORK-06"
- ignored_document_ids must be ["PRO-3303"]
- warnings_by_invoice must use short warning codes only, not prose sentences
- evidence must be an array of source path strings, not a nested object
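The key and format rules above can be checked mechanically. The following is a minimal sketch, not the benchmark's own verifier: `validate_audit_result` and the `WARNING_CODE` pattern are hypothetical names, and treating a "short warning code" as a single token of letters, digits, and underscores is an assumption.

```python
import re

# The nine keys the README requires in audit_result.json.
REQUIRED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids",
    "reject_invoice_ids", "ignored_document_ids",
    "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}

# Assumption: a short warning code is one token with no spaces or punctuation.
WARNING_CODE = re.compile(r"^[A-Za-z0-9_]+$")


def validate_audit_result(result: dict) -> list[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    problems = []
    if set(result) != REQUIRED_KEYS:
        problems.append("keys differ from the required set")
    if result.get("case_id") != "P3-WORK-06":
        problems.append("case_id must be P3-WORK-06")
    if result.get("ignored_document_ids") != ["PRO-3303"]:
        problems.append("ignored_document_ids must be ['PRO-3303']")

    # evidence must be a flat array of path strings, not a nested object.
    evidence = result.get("evidence")
    if not (isinstance(evidence, list) and all(isinstance(p, str) for p in evidence)):
        problems.append("evidence must be an array of strings")

    # warnings_by_invoice maps each invoice ID to a list of short codes.
    warnings = result.get("warnings_by_invoice")
    if not isinstance(warnings, dict):
        problems.append("warnings_by_invoice must be an object")
    else:
        for invoice_id, codes in warnings.items():
            ok = isinstance(codes, list) and all(
                isinstance(c, str) and WARNING_CODE.match(c) for c in codes
            )
            if not ok:
                problems.append(f"{invoice_id}: warnings must be short codes")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to see every format violation in a single pass.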
Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the active remittance batch
+ 97 * total warning count
Use invoice IDs only for approved/review/reject invoice IDs. Do not include proforma or remittance document IDs in the invoice-ID sum.
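The proof code formula above can be worked through in a few lines. This is a sketch under stated assumptions: `numeric_part` and `proof_code` are hypothetical helper names, and "numeric part" is read as the digits of an ID such as `INV-3301` or `BATCH-771`. The inputs are the values that appear elsewhere in this report (the expected audit fields and the active batch named in the AP note).

```python
import re


def numeric_part(identifier: str) -> int:
    """Extract the digits of an ID, e.g. 'INV-3301' -> 3301 (assumed reading)."""
    return int(re.sub(r"\D", "", identifier))


def proof_code(total_approved_gross_cents: int,
               invoice_ids: list[str],
               remittance_batch: str,
               warnings_by_invoice: dict[str, list[str]]) -> str:
    """Apply the README formula and return the code as a string."""
    # Only approved/review/reject invoice IDs count; proforma IDs are excluded.
    invoice_sum = sum(numeric_part(i) for i in invoice_ids)
    # Total warning count across all invoices.
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    total = (total_approved_gross_cents
             + invoice_sum
             + numeric_part(remittance_batch)
             + 97 * warning_count)
    return str(total)


code = proof_code(
    29730,                               # total_approved_gross_cents
    ["INV-3301", "INV-3302"],            # approved invoices; none in review or rejected
    "BATCH-771",                         # active remittance batch
    {"INV-3301": [], "INV-3302": []},    # no warnings
)
print(code)  # 29730 + 3301 + 3302 + 771 + 97 * 0 = 37104
```

The result agrees with the `proof_code` value in the verifier's expected output.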
Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}
Required work/final_document_set.json shape:
{
  "active_bank_export": "...",
  "active_remittance_batch": "...",
  "final_invoice_ids": [...],
  "ignored_document_ids": [...],
  "ignored_source_files": [...],
  "payment_allocations": [{"invoice_id":"...","gross_total_cents":123}]
}
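A consistency check on this shape might look like the sketch below. It is illustrative only: `check_final_document_set` is a hypothetical name, and the rule that the allocations must add up to the single bank payment is an assumption drawn from step 2 of the README, not a documented verifier check.

```python
# The six top-level fields in the required final_document_set.json shape.
REQUIRED_FIELDS = {
    "active_bank_export", "active_remittance_batch", "final_invoice_ids",
    "ignored_document_ids", "ignored_source_files", "payment_allocations",
}


def check_final_document_set(doc: dict, bank_amount_cents: int) -> list[str]:
    """Return a list of problems; an empty list means the document is consistent."""
    missing = REQUIRED_FIELDS - set(doc)
    if missing:
        return ["missing fields: " + ", ".join(sorted(missing))]

    problems = []
    allocations = doc["payment_allocations"]

    # Every final invoice should receive exactly one allocation.
    allocated_ids = sorted(a["invoice_id"] for a in allocations)
    if allocated_ids != sorted(doc["final_invoice_ids"]):
        problems.append("allocations do not cover exactly the final invoices")

    # Assumed rule: the split must add up to the single bank payment.
    allocated_total = sum(a["gross_total_cents"] for a in allocations)
    if allocated_total != bank_amount_cents:
        problems.append(
            f"allocations sum to {allocated_total}, bank payment is {bank_amount_cents}"
        )
    return problems
```

Here `bank_amount_cents` would come from the `amount_cents` column of the active bank export; the per-invoice amounts have to be read from the invoice images and the remittance advice.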
ap_notes.txt
AP intake note, May batch:
- Use bank_export_final.csv, not bank_export_draft.csv.
- Vendor Bright Coast Office Supplies is active.
- Remittance advice RA-771 explains payment BATCH-771.
- The proforma is a quote-style document and should not be treated as an invoice.
bank_export_draft.csv
date,description,amount_cents
2026-05-07,DRAFT BATCH 771 DO NOT USE,30000
bank_export_final.csv
date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-9001,V-BRIGHT,approved,INV-3301
PO-9002,V-BRIGHT,approved,INV-3302
PO-9003,V-BRIGHT,draft,PRO-3303
vendor_master.csv
vendor_id,vendor_name,status
V-BRIGHT,Bright Coast Office Supplies,active
V-FAKE,Fake Sample Vendor,inactive
Agent actions
read_file -> list_files -> read_file -> read_file -> read_file -> read_file
Model answer
## audit_result.json
[missing]
## proof.txt
[missing]
## work/attachment_index.json
[missing]
## work/final_document_set.json
[missing]
## work/normalized/INV-3301.txt
[missing]
## work/normalized/INV-3302.txt
[missing]
## verify_output.json
{
  "ok": false,
  "core_ok": false,
  "visible_ok": false,
  "failure_types": [
    "attachment_index_error",
    "final_document_set_error",
    "no_output",
    "normalized_text_error",
    "proof_txt_error",
    "required_artifact_missing"
  ],
  "checks": [
    {
      "check": "audit_result_exists",
      "passed": false,
      "message": "Expected final audit_result.json with exact keys."
    },
    {
      "check": "visible_checks_pass",
      "passed": false,
      "message": "Expected required artifacts and unchanged incoming/ sources."
    },
    {
      "check": "core_oracle_pass",
      "passed": false,
      "message": "Expected core audit fields to match the oracle."
    },
    {
      "check": "hidden_oracle_pass",
      "passed": false,
      "message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
    }
  ],
  "audit_error": "missing_audit_result",
  "actual": null,
  "expected": {
    "approved_invoice_ids": [
      "INV-3301",
      "INV-3302"
    ],
    "case_id": "P3-WORK-06",
    "evidence": [
      "incoming/attachments/invoice_3301.png",
      "incoming/attachments/invoice_3302.png",
      "incoming/attachments/remittance_advice.png",
      "incoming/bank_export_final.csv",
      "incoming/purchase_orders.csv",
      "incoming/vendor_master.csv"
    ],
    "ignored_document_ids": [
      "PRO-3303"
    ],
    "proof_code": "37104",
    "reject_invoice_ids": [],
    "review_invoice_ids": [],
    "total_approved_gross_cents": 29730,
    "warnings_by_invoice": {
      "INV-3301": [],
      "INV-3302": []
    }
  }
}
|