ministral-3-14b

2026-05-15T13-31-42Z_ministral-3-14b_paperwork_workflow_case04

Practical score: 0%
Resolved tasks: 0/1
Core tasks: 0/1

Verdict

Practical run result: failed

Core 0/1 · Visible 1/1 · Hidden 0/1

Failure types: invoice_classification_error, manifest_error, normalized_text_error, proof_code_error, proof_txt_error
Prompt set: v3_paperwork_workflow_case04.json
Run type: local
Temperature: 0
Top P: 1
Missed tasks: 1/1

Task Results

Task: messy_intake_workflow_case_04
Category: paperwork
Score: 2/4
Failed checks: core_oracle_pass, hidden_oracle_pass
Failure types: invoice_classification_error, manifest_error, normalized_text_error, proof_code_error, proof_txt_error, total_calculation_error, warning_code_error

Prompt

You are in a synthetic paperwork benchmark workspace.

Audit the messy intake folder in `incoming/`.

Create these final artifacts:
- `work/normalized_manifest.json`
- `work/document_index.json`
- `work/normalized/INV-9107.txt`
- `work/normalized/INV-9108.txt`
- `work/normalized/INV-9109.txt`
- `audit_result.json`
- `proof.txt`

Rules:
- Identify the active source files.
- Ignore drafts, old exports, duplicate vendor files, and non-invoice docs.
- Do not modify anything in `incoming/`.
- Return no advice. This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the final numeric `proof_code`.

Artifact schemas:
- `work/normalized_manifest.json` must be one JSON object with keys `case_id`, `active_files`, `ignored_files`, `normalized_files`.
- `work/document_index.json` must be one JSON object with keys `case_id`, `documents`.
- Each `documents` entry must use keys `document_id`, `document_type`, `source_path`, `normalized_path`.
- Normalized invoice text files must use exactly this line format:
  - `invoice_id=...`
  - `vendor_id=...`
  - `vendor_name=...`
  - `po_id=...`
  - `gross_total_cents=...`
- `warnings_by_invoice` must include every real invoice ID. Use an empty array when an invoice has no warnings.

Workspace files available to the model

README_TASK.md
# Paperwork Workflow Case 04: Messy Intake Folder

Case ID: P3-WORK-04

You are auditing a messy synthetic intake folder. Some files are active sources, some are stale drafts, and one scan is not an invoice.

Create:
- `work/normalized_manifest.json`
- `work/document_index.json`
- `work/normalized/INV-9107.txt`
- `work/normalized/INV-9108.txt`
- `work/normalized/INV-9109.txt`
- `audit_result.json`
- `proof.txt`

`work/normalized_manifest.json` schema:

```json
{
  "case_id": "P3-WORK-04",
  "active_files": [],
  "ignored_files": [],
  "normalized_files": []
}
```

`active_files` should list the active final sources used for the audit. `ignored_files` should list stale drafts, old exports, duplicate vendor files, and non-invoice scans. `normalized_files` should list the normalized invoice text files you create.

`work/document_index.json` schema:

```json
{
  "case_id": "P3-WORK-04",
  "documents": [
    {
      "document_id": "",
      "document_type": "",
      "source_path": "",
      "normalized_path": ""
    }
  ]
}
```

Use `document_type` values `invoice` or `credit_note`. For ignored non-invoice documents, set `normalized_path` to an empty string.

Each normalized invoice file must use exactly these five lines:

```text
invoice_id=...
vendor_id=...
vendor_name=...
po_id=...
gross_total_cents=...
```

Use the final bank export, the active vendor master, and the final purchase order list:
- final bank export: `incoming/bank-may-final.csv`
- active vendor master: `incoming/vendor_master.csv`
- final purchase orders: `incoming/purchase_orders.csv`

Ignore:
- `incoming/bank_export_old.csv`
- `incoming/vendors copy.csv`
- `incoming/po-list-draft.csv`
- the non-invoice document scan

The scan files contain the source document text. If your tool returns OCR text for an image, use it as the scan reading and cross-check it against the CSV files.

`audit_result.json` must contain exactly these keys:
- case_id
- approved_invoice_ids
- review_invoice_ids
- reject_invoice_ids
- ignored_document_ids
- total_approved_gross_cents
- warnings_by_invoice
- evidence
- proof_code

`warnings_by_invoice` must include every real invoice ID. Use an empty array when an invoice has no warnings.

Allowed warning codes:
- payment_short
- missing_payment
- missing_po
- inactive_vendor

Rules:
- Approve only active-vendor invoices with matching PO and exact final-bank payment.
- Put invoices with short payment, missing payment, missing PO, or inactive vendor into review.
- Reject only when a document is clearly invalid as an invoice. Non-invoice documents belong in `ignored_document_ids`.
- `total_approved_gross_cents` is the sum of approved invoice gross totals only.
- `evidence` must list the active source files used for the audit, with paths relative to the workspace.

Proof code formula:

`proof_code = total_approved_gross_cents + sum(numeric parts of all real invoice IDs) + 97 * total_warning_count`

`proof.txt` must contain only the proof code number and nothing else.
bank-may-final.csv
date,description,amount_cents,reference
2026-05-03,ALPHA DESK SYSTEMS INV-9107,-11900,INV-9107
2026-05-04,BETA OFFICE SUPPLY INV-9108,-23055,INV-9108
2026-05-06,ORION LEGACY PARTS PAYMENT,-6400,INV-9099

bank_export_old.csv
date,description,amount_cents,reference
2026-04-28,OLD TEST EXPORT DO NOT USE,-24855,INV-9108
2026-04-29,OLD ORION TEST EXPORT,-8740,INV-9109

notes_from_ap.txt
Use bank-may-final.csv, not the old export.
Use vendor_master.csv, not the copy.
Use purchase_orders.csv, not the draft list.
The credit note scan is not an invoice for this audit.
INV-9109 should not be approved while the vendor is inactive and the listed PO is cancelled.

po-list-draft.csv
po_id,vendor_id,gross_limit_cents,status
PO-7001,V-321,12500,open
PO-7002,V-654,25000,cancelled
PO-7003,V-777,9000,open

purchase_orders.csv
po_id,vendor_id,gross_limit_cents,status
PO-7001,V-321,12500,open
PO-7002,V-654,25000,open
PO-7003,V-777,9000,cancelled

vendor_master.csv
vendor_id,vendor_name,tax_id,status
V-321,Alpha Desk Systems,TX-321,active
V-654,Beta Office Supply,TX-654,active
V-777,Orion Legacy Parts,TX-777,inactive

vendors copy.csv
vendor_id,vendor_name,tax_id,status
V-321,Alpha Desk Systems,TX-321,active
V-654,Beta Office Supply,TX-654,inactive
V-777,Orion Legacy Parts,TX-777,active
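
The README's approval and review rules can be sketched against the three active CSVs listed above. This is a minimal illustration, not the benchmark's grader. The scan readings are not reproduced in this report, so the `INVOICES` values are assumptions: the vendor and PO IDs for INV-9107 and INV-9108 and the INV-9109 gross total are taken from the model's normalized files, PO-7003 for INV-9109 is inferred from the AP note about a cancelled PO, and the INV-9108 gross total of 24855 is a guess based on the old export. Treating a cancelled PO as `missing_po` is likewise an interpretation. The helper names `rows` and `classify` are hypothetical.

```python
import csv
import io

# Active source files, copied from the workspace listing above.
BANK_FINAL = """date,description,amount_cents,reference
2026-05-03,ALPHA DESK SYSTEMS INV-9107,-11900,INV-9107
2026-05-04,BETA OFFICE SUPPLY INV-9108,-23055,INV-9108
2026-05-06,ORION LEGACY PARTS PAYMENT,-6400,INV-9099
"""
VENDORS = """vendor_id,vendor_name,tax_id,status
V-321,Alpha Desk Systems,TX-321,active
V-654,Beta Office Supply,TX-654,active
V-777,Orion Legacy Parts,TX-777,inactive
"""
PURCHASE_ORDERS = """po_id,vendor_id,gross_limit_cents,status
PO-7001,V-321,12500,open
PO-7002,V-654,25000,open
PO-7003,V-777,9000,cancelled
"""

# ASSUMED scan readings (the scans are not shown in this report).
INVOICES = [
    {"invoice_id": "INV-9107", "vendor_id": "V-321", "po_id": "PO-7001", "gross_total_cents": 11900},
    {"invoice_id": "INV-9108", "vendor_id": "V-654", "po_id": "PO-7002", "gross_total_cents": 24855},
    {"invoice_id": "INV-9109", "vendor_id": "V-777", "po_id": "PO-7003", "gross_total_cents": 8740},
]


def rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def classify(invoices):
    """Return warnings per invoice; an empty list means approvable."""
    # Bank amounts are negative outflows, so flip the sign.
    paid = {r["reference"]: -int(r["amount_cents"]) for r in rows(BANK_FINAL)}
    vendor_status = {r["vendor_id"]: r["status"] for r in rows(VENDORS)}
    # Assumption: only an open PO for the same vendor counts as a matching PO.
    open_pos = {(r["po_id"], r["vendor_id"])
                for r in rows(PURCHASE_ORDERS) if r["status"] == "open"}
    warnings = {}
    for inv in invoices:
        found = []
        if vendor_status.get(inv["vendor_id"]) != "active":
            found.append("inactive_vendor")
        payment = paid.get(inv["invoice_id"])
        if payment is None:
            found.append("missing_payment")
        elif payment < inv["gross_total_cents"]:
            found.append("payment_short")
        # Overpayments are not handled; the README defines no code for them.
        if (inv["po_id"], inv["vendor_id"]) not in open_pos:
            found.append("missing_po")
        warnings[inv["invoice_id"]] = sorted(found)
    return warnings


if __name__ == "__main__":
    for invoice_id, codes in classify(INVOICES).items():
        print(invoice_id, "approve" if not codes else "review", codes)
```

Under these assumptions, INV-9107 carries no warnings, INV-9108 carries `payment_short`, and INV-9109 carries `inactive_vendor`, `missing_payment`, and `missing_po`.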

Input images

  • quote_or_credit_note.png
  • scan_001.png
  • scan_002.png
  • scan_003.png

Agent actions

list_files -> read_file -> read_file -> read_file -> read_file -> mkdir -> write_file -> write_file -> write_file -> run_checks -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> write_file -> run_checks -> finish

Model answer

## audit_result.json

{
  "case_id": "P3-WORK-04",
  "approved_invoice_ids": [],
  "review_invoice_ids": [
    "INV-9107",
    "INV-9108"
  ],
  "reject_invoice_ids": [
    "INV-9109"
  ],
  "ignored_document_ids": [
    "CN-7710"
  ],
  "total_approved_gross_cents": 0,
  "warnings_by_invoice": {
    "INV-9107": [
      "missing_payment"
    ],
    "INV-9108": [
      "payment_short"
    ]
  },
  "evidence": [
    {
      "file": "incoming/bank-may-final.csv",
      "type": "final_bank_export"
    },
    {
      "file": "incoming/vendor_master.csv",
      "type": "active_vendor_master"
    },
    {
      "file": "incoming/purchase_orders.csv",
      "type": "final_purchase_orders"
    }
  ],
  "proof_code": 30795
}

## proof.txt

30795

## work/document_index.json

{
  "case_id": "P3-WORK-04",
  "documents": [
    {
      "document_id": "INV-9107",
      "document_type": "invoice",
      "source_path": "incoming/scan_001.png",
      "normalized_path": "work/normalized/INV-9107.txt"
    },
    {
      "document_id": "INV-9108",
      "document_type": "invoice",
      "source_path": "incoming/scan_002.png",
      "normalized_path": "work/normalized/INV-9108.txt"
    },
    {
      "document_id": "INV-9109",
      "document_type": "invoice",
      "source_path": "incoming/scan_003.png",
      "normalized_path": "work/normalized/INV-9109.txt"
    },
    {
      "document_id": "CN-7710",
      "document_type": "credit_note",
      "source_path": "incoming/quote_or_credit_note.png",
      "normalized_path": ""
    }
  ]
}

## work/normalized/INV-9107.txt

invoice_id=INV-9107
vendor_id=V-321
vendor_name=Alpha Desk Systems
po_id=PO-7001
gross_total_cents=11900

## work/normalized/INV-9108.txt

invoice_id=INV-9108
vendor_id=V-654
vendor_name=Beta Office Supply
po_id=PO-7002
gross_total_cents=23055

## work/normalized/INV-9109.txt

invoice_id=INV-9109
vendor_id=V-777
vendor_name=Orion Legacy Parts
po_id=
missing_po=true
gross_total_cents=8740
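
The README requires each normalized file to use exactly five `key=value` lines, while the INV-9109 file above has six. A minimal format check along these lines would flag it. The function name `check_normalized` is hypothetical, and the check only covers key names and order; whether the grader applies the same test is an assumption.

```python
REQUIRED_KEYS = ["invoice_id", "vendor_id", "vendor_name", "po_id", "gross_total_cents"]


def check_normalized(text):
    """Return a list of format problems; an empty list means the file conforms."""
    lines = text.strip().splitlines()
    # The key is everything before the first "=" on each line.
    keys = [line.split("=", 1)[0] for line in lines]
    if keys != REQUIRED_KEYS:
        return [f"expected keys {REQUIRED_KEYS}, got {keys}"]
    return []


# The model's INV-9109 file, copied from the answer above.
inv_9109 = """invoice_id=INV-9109
vendor_id=V-777
vendor_name=Orion Legacy Parts
po_id=
missing_po=true
gross_total_cents=8740
"""

# The model's INV-9107 file, which follows the five-line layout.
inv_9107 = """invoice_id=INV-9107
vendor_id=V-321
vendor_name=Alpha Desk Systems
po_id=PO-7001
gross_total_cents=11900
"""

if __name__ == "__main__":
    print(check_normalized(inv_9107))  # no problems
    print(check_normalized(inv_9109))  # one problem: extra missing_po key
```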

## work/normalized_manifest.json

{
  "case_id": "P3-WORK-04",
  "active_files": [
    {
      "file": "incoming/bank-may-final.csv",
      "type": "final_bank_export"
    },
    {
      "file": "incoming/vendor_master.csv",
      "type": "active_vendor_master"
    },
    {
      "file": "incoming/purchase_orders.csv",
      "type": "final_purchase_orders"
    }
  ],
  "ignored_files": [
    {
      "file": "incoming/bank_export_old.csv",
      "reason": "stale_export"
    },
    {
      "file": "incoming/vendors copy.csv",
      "reason": "duplicate_vendor_file"
    },
    {
      "file": "incoming/po-list-draft.csv",
      "reason": "draft_po_list"
    }
  ],
  "normalized_files": [
    {
      "path": "work/normalized/INV-9107.txt",
      "invoice_id": "INV-9107"
    },
    {
      "path": "work/normalized/INV-9108.txt",
      "invoice_id": "INV-9108"
    },
    {
      "path": "work/normalized/INV-9109.txt",
      "invoice_id": "INV-9109"
    }
  ]
}

## verify_output.json

{
  "ok": false,
  "core_ok": false,
  "visible_ok": true,
  "failure_types": [
    "invoice_classification_error",
    "manifest_error",
    "normalized_text_error",
    "proof_code_error",
    "proof_txt_error",
    "total_calculation_error",
    "warning_code_error"
  ],
  "checks": [
    {
      "check": "audit_result_exists",
      "passed": true,
      "message": "Expected final audit_result.json with exact keys."
    },
    {
      "check": "visible_checks_pass",
      "passed": true,
      "message": "Expected required artifacts and unchanged incoming/ sources."
    },
    {
      "check": "core_oracle_pass",
      "passed": false,
      "message": "Expected core audit fields to match the oracle."
    },
    {
      "check": "hidden_oracle_pass",
      "passed": false,
      "message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
    }
  ],
  "audit_error": "",
  "actual": {
    "case_id": "P3-WORK-04",
    "approved_invoice_ids": [],
    "review_invoice_ids": [
      "INV-9107",
      "INV-9108"
    ],
    "reject_invoice_ids": [
      "INV-9109"
    ],
    "ignored_document_ids": [
      "CN-7710"
    ],
    "total_approved_gross_cents": 0,
    "warnings_by_invoice": {
      "INV-9107": [
        "missing_payment"
      ],
      "INV-9108": [
        "payment_short"
      ]
    },
    "evidence": [
      {
        "file": "incoming/bank-may-final.csv",
        "type": "final_bank_export"
      },
      {
        "file": "incoming/vendor_master.csv",
        "type": "active_vendor_master"
      },
      {
        "file": "incoming/purchase_orders.csv",
        "type": "final_purchase_orders"
      }
    ],
    "proof_code": 30795
  },
  "expected": {
    "case_id": "P3-WORK-04",
    "approved_invoice_ids": [
      "INV-9107"
    ],
    "review_invoice_ids": [
      "INV-9108",
      "INV-9109"
    ],
    "reject_invoice_ids": [],
    "ignored_document_ids": [
      "CN-7710"
    ],
    "total_approved_gross_cents": 11900,
    "warnings_by_invoice": {
      "INV-9107": [],
      "INV-9108": [
        "payment_short"
      ],
      "INV-9109": [
        "inactive_vendor",
        "missing_payment",
        "missing_po"
      ]
    },
    "evidence": [
      "incoming/scan_001.png",
      "incoming/scan_002.png",
      "incoming/scan_003.png",
      "incoming/bank-may-final.csv",
      "incoming/vendor_master.csv",
      "incoming/purchase_orders.csv"
    ],
    "proof_code": 39612
  }
}
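
As a worked check, the README's proof code formula can be applied to the `expected` and `actual` fields above. This is a small sketch rather than the grader's code. The function name `proof_code` is hypothetical, and it assumes the "numeric part" of an invoice ID is the digits after the hyphen.

```python
def proof_code(total_approved_gross_cents, invoice_ids, warnings_by_invoice):
    """proof_code = approved total + sum of numeric ID parts + 97 * warning count."""
    # Assumption: the numeric part is whatever follows the last hyphen.
    id_sum = sum(int(inv_id.rsplit("-", 1)[1]) for inv_id in invoice_ids)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return total_approved_gross_cents + id_sum + 97 * warning_count


REAL_INVOICE_IDS = ["INV-9107", "INV-9108", "INV-9109"]

# Values from the verifier's "expected" block.
expected_code = proof_code(
    11900,
    REAL_INVOICE_IDS,
    {
        "INV-9107": [],
        "INV-9108": ["payment_short"],
        "INV-9109": ["inactive_vendor", "missing_payment", "missing_po"],
    },
)

# Values from the model's own audit_result.json ("actual" block).
actual_code = proof_code(
    0,
    REAL_INVOICE_IDS,
    {"INV-9107": ["missing_payment"], "INV-9108": ["payment_short"]},
)

if __name__ == "__main__":
    print(expected_code)  # 39612 = 11900 + 27324 + 97 * 4
    print(actual_code)    # 27518 = 0 + 27324 + 97 * 2
```

The first result matches the oracle's `proof_code` of 39612. The second shows that, under this reading of the formula, the model's reported 30795 does not follow even from its own audit fields.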