ministral-3-14b

2026-05-15T13-45-01Z_ministral-3-14b_paperwork_workflow_case06

0% Practical score
0/1 Resolved tasks
0/1 Core tasks

Verdict

Practical run result: failed

Core 0/1 · Visible 0/1 · Hidden 0/1

attachment_index_error, final_document_set_error, no_output, normalized_text_error, proof_txt_error
Prompt set: v3_paperwork_workflow_case06.json
Run type: local
Temperature: 0
Top P: 1
Missed tasks: 1/1

Task Results

| Task | Category | Score | Failed Checks | Failure Types |
| --- | --- | --- | --- | --- |
| remittance_split_case_06 | paperwork | 0/4 | audit_result_exists, visible_checks_pass, core_oracle_pass, hidden_oracle_pass | attachment_index_error, final_document_set_error, no_output, normalized_text_error, proof_txt_error, required_artifact_missing |

Prompt

Audit the synthetic remittance split intake folder in `incoming/`.

Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/normalized/INV-3301.txt`
- `work/normalized/INV-3302.txt`
- `audit_result.json`
- `proof.txt`

Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, the AP note, bank exports, vendor master, and purchase orders.

Important:
- Identify the active bank export and ignore the draft export.
- Use remittance advice RA-771 to split the single bank payment across two final invoices.
- Do not treat the proforma estimate as an invoice.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.

Workspace files available to the model

README_TASK.md
# Paperwork Workflow Case 06: Remittance Split

Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.

You must not modify anything under incoming/.

Case ID for audit_result.json: P3-WORK-06

Steps:
1. Identify active source files and ignore drafts, old exports, and non-invoice distractors.
2. Use the remittance advice to map the single bank payment to the final invoices.
3. Create work/attachment_index.json.
4. Create work/final_document_set.json.
5. Create normalized invoice text files under work/normalized/.
6. Create audit_result.json.
7. Create proof.txt containing only the final proof_code.

Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code

Required audit_result.json values and formats:
- case_id must be "P3-WORK-06"
- ignored_document_ids must be ["PRO-3303"]
- warnings_by_invoice must use short warning codes only, not prose sentences
- evidence must be an array of source path strings, not a nested object

Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the active remittance batch
+ 97 * total warning count

Use invoice IDs only for approved/review/reject invoice IDs. Do not include proforma or remittance document IDs in the invoice-ID sum.

Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}

Required work/final_document_set.json shape:
{
  "active_bank_export": "...",
  "active_remittance_batch": "...",
  "final_invoice_ids": [...],
  "ignored_document_ids": [...],
  "ignored_source_files": [...],
  "payment_allocations": [{"invoice_id":"...","gross_total_cents":123}]
}
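
Editorial note (not part of README_TASK.md): the required keys and formats for `audit_result.json` listed above can be checked mechanically. The following is a minimal sketch, not the benchmark's verifier. The function name `validate_audit_result` is hypothetical, and treating "short warning codes" as strings without whitespace is an assumption, since the README does not define the code format.

```python
# Hypothetical checker for the audit_result.json rules in README_TASK.md.
REQUIRED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids",
    "reject_invoice_ids", "ignored_document_ids",
    "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}


def validate_audit_result(result: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    if set(result) != REQUIRED_KEYS:
        problems.append("keys do not match the required key set")
    if result.get("case_id") != "P3-WORK-06":
        problems.append("case_id must be 'P3-WORK-06'")
    if result.get("ignored_document_ids") != ["PRO-3303"]:
        problems.append("ignored_document_ids must be ['PRO-3303']")
    evidence = result.get("evidence")
    if not isinstance(evidence, list) or not all(isinstance(p, str) for p in evidence):
        problems.append("evidence must be an array of source path strings")
    warnings = result.get("warnings_by_invoice")
    if not isinstance(warnings, dict):
        problems.append("warnings_by_invoice must be an object")
    else:
        for invoice_id, codes in warnings.items():
            if not isinstance(codes, list):
                problems.append(f"{invoice_id}: warnings must be a list")
                continue
            for code in codes:
                # Assumption: a "short code" is a string with no whitespace.
                if not isinstance(code, str) or any(ch.isspace() for ch in code):
                    problems.append(f"{invoice_id}: warning is not a short code")
    return problems


# A result with a wrong case_id and a prose warning is flagged twice.
bad_result = dict.fromkeys(REQUIRED_KEYS)
bad_result.update(
    case_id="WRONG",
    ignored_document_ids=["PRO-3303"],
    evidence=["incoming/bank_export_final.csv"],
    warnings_by_invoice={"INV-3301": ["amount does not match"]},
)
bad_problems = validate_audit_result(bad_result)
print(bad_problems)
```
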
ap_notes.txt
AP intake note, May batch:
- Use bank_export_final.csv, not bank_export_draft.csv.
- Vendor Bright Coast Office Supplies is active.
- Remittance advice RA-771 explains payment BATCH-771.
- The proforma is a quote-style document and should not be treated as an invoice.
bank_export_draft.csv
date,description,amount_cents
2026-05-07,DRAFT BATCH 771 DO NOT USE,30000
bank_export_final.csv
date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-9001,V-BRIGHT,approved,INV-3301
PO-9002,V-BRIGHT,approved,INV-3302
PO-9003,V-BRIGHT,draft,PRO-3303
vendor_master.csv
vendor_id,vendor_name,status
V-BRIGHT,Bright Coast Office Supplies,active
V-FAKE,Fake Sample Vendor,inactive
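
Editorial note (not part of the workspace files): step 2 of the README maps the single bank payment to the final invoices. The sketch below reads the BATCH-771 amount from the contents of `bank_export_final.csv` shown above and checks that a list of `payment_allocations` sums to it. The helper names `read_batch_payment` and `allocations_match_payment` are hypothetical, and matching the batch by substring in the description column is an assumption. The per-invoice amounts come from the image attachments and are not reproduced here.

```python
import csv
import io

# Contents of incoming/bank_export_final.csv as listed in the workspace files.
FINAL_EXPORT = """date,description,amount_cents
2026-05-11,BATCH-771 BRIGHT COAST REMITTANCE,29730
"""


def read_batch_payment(csv_text: str, batch_id: str) -> int:
    """Return the amount in cents of the single row whose description names batch_id."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    matches = [row for row in rows if batch_id in row["description"]]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one row for {batch_id}, found {len(matches)}")
    return int(matches[0]["amount_cents"])


def allocations_match_payment(allocations: list[dict], payment_cents: int) -> bool:
    """True when the allocated gross totals add up to the bank payment."""
    return sum(a["gross_total_cents"] for a in allocations) == payment_cents


payment_cents = read_batch_payment(FINAL_EXPORT, "BATCH-771")
print(payment_cents)  # 29730
```
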

Input images

  • invoice_3301.png
  • invoice_3302.png
  • proforma_3303.png
  • remittance_advice.png


Agent actions

read_file -> list_files -> read_file -> read_file -> read_file -> read_file

Model answer

## audit_result.json

[missing]

## proof.txt

[missing]

## work/attachment_index.json

[missing]

## work/final_document_set.json

[missing]

## work/normalized/INV-3301.txt

[missing]

## work/normalized/INV-3302.txt

[missing]

## verify_output.json

{
  "ok": false,
  "core_ok": false,
  "visible_ok": false,
  "failure_types": [
    "attachment_index_error",
    "final_document_set_error",
    "no_output",
    "normalized_text_error",
    "proof_txt_error",
    "required_artifact_missing"
  ],
  "checks": [
    {
      "check": "audit_result_exists",
      "passed": false,
      "message": "Expected final audit_result.json with exact keys."
    },
    {
      "check": "visible_checks_pass",
      "passed": false,
      "message": "Expected required artifacts and unchanged incoming/ sources."
    },
    {
      "check": "core_oracle_pass",
      "passed": false,
      "message": "Expected core audit fields to match the oracle."
    },
    {
      "check": "hidden_oracle_pass",
      "passed": false,
      "message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
    }
  ],
  "audit_error": "missing_audit_result",
  "actual": null,
  "expected": {
    "approved_invoice_ids": [
      "INV-3301",
      "INV-3302"
    ],
    "case_id": "P3-WORK-06",
    "evidence": [
      "incoming/attachments/invoice_3301.png",
      "incoming/attachments/invoice_3302.png",
      "incoming/attachments/remittance_advice.png",
      "incoming/bank_export_final.csv",
      "incoming/purchase_orders.csv",
      "incoming/vendor_master.csv"
    ],
    "ignored_document_ids": [
      "PRO-3303"
    ],
    "proof_code": "37104",
    "reject_invoice_ids": [],
    "review_invoice_ids": [],
    "total_approved_gross_cents": 29730,
    "warnings_by_invoice": {
      "INV-3301": [],
      "INV-3302": []
    }
  }
}
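
The expected `proof_code` can be reproduced from the proof code formula in README_TASK.md and the oracle values above. This is a minimal sketch: the helper names are hypothetical, "numeric part" is assumed to mean the first run of digits in an ID, and the active remittance batch is assumed to be `BATCH-771` as named in `ap_notes.txt` and `bank_export_final.csv`.

```python
import re


def numeric_part(identifier: str) -> int:
    """Return the first run of digits in an ID such as 'INV-3301' (assumed meaning)."""
    match = re.search(r"\d+", identifier)
    if match is None:
        raise ValueError(f"no digits in {identifier!r}")
    return int(match.group())


def proof_code(total_approved_gross_cents: int,
               invoice_ids: list[str],
               remittance_batch: str,
               warnings_by_invoice: dict[str, list[str]]) -> str:
    """Apply the README formula; invoice_ids covers approved, review, and reject IDs."""
    invoice_sum = sum(numeric_part(i) for i in invoice_ids)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    total = (total_approved_gross_cents
             + invoice_sum
             + numeric_part(remittance_batch)
             + 97 * warning_count)
    return str(total)


# 29730 + 3301 + 3302 + 771 + 97 * 0
code = proof_code(
    29730,
    ["INV-3301", "INV-3302"],
    "BATCH-771",
    {"INV-3301": [], "INV-3302": []},
)
print(code)  # 37104
```

The proforma `PRO-3303` and the remittance advice `RA-771` are left out of the invoice-ID sum, as the README requires.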