ministral-3-3b

2026-05-15T09-37-23Z_mistralai_ministral-3-3b_paperwork_workflow_case07

0% Practical score
0/1 Resolved tasks
0/1 Core tasks

Verdict

practical run result
failed

Core 0/1 · Visible 0/1 · Hidden 0/1

attachment_index_error, final_document_set_error, no_output, normalized_text_error, payment_reconciliation_error
Prompt set: v3_paperwork_workflow_case07.json
Run type: local
Temperature: 0
Top P: 1
Missed tasks: 1/1

Task Results

Task: credit_offset_case_07
Category: paperwork
Score: 0/4
Failed checks: audit_result_exists, visible_checks_pass, core_oracle_pass, hidden_oracle_pass
Failure types: attachment_index_error, final_document_set_error, no_output, normalized_text_error, payment_reconciliation_error, proof_txt_error, required_artifact_missing

Prompt

Audit the synthetic credit offset intake folder in `incoming/`.

Write these files:
- `work/attachment_index.json`
- `work/final_document_set.json`
- `work/payment_reconciliation.json`
- `work/normalized/INV-4409.txt`
- `work/normalized/INV-4411.txt`
- `audit_result.json`
- `proof.txt`

Use `incoming/README_TASK.md` as the full task specification. Use the generated image attachments, AP note, bank export, vendor master, and purchase orders.

Important:
- Use credit memo CM-4410 as payment-offset evidence, not as an invoice.
- Ignore the duplicate scan of INV-4409.
- Reject INV-4411 because its vendor is inactive and its PO is cancelled.
- Do not modify anything under `incoming/`.
- This is benchmark bookkeeping, not financial, tax, or legal advice.
- `proof.txt` must contain only the numeric proof code.

Workspace files available to the model

README_TASK.md
# Paperwork Workflow Case 07: Credit Offset Packet

Audit this synthetic intake folder. This is benchmark bookkeeping, not financial or tax advice.

You must not modify anything under incoming/.

Case ID for audit_result.json: P3-WORK-07

Steps:
1. Identify active source files and ignore duplicate scans, statements, and credit memos as non-invoice documents.
2. Use the credit memo to explain why the bank payment is lower than the approved invoice gross total.
3. Reject invoices from inactive vendors or cancelled purchase orders.
4. Create work/attachment_index.json.
5. Create work/final_document_set.json.
6. Create work/payment_reconciliation.json.
7. Create normalized invoice text files under work/normalized/.
8. Create audit_result.json.
9. Create proof.txt containing only the final proof_code.

Required audit_result.json keys:
case_id, approved_invoice_ids, review_invoice_ids, reject_invoice_ids,
ignored_document_ids, total_approved_gross_cents, warnings_by_invoice,
evidence, proof_code

Required audit_result.json values and formats:
- case_id must be "P3-WORK-07"
- ignored_document_ids must be ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"]
- warnings_by_invoice must use only these short warning codes: inactive_vendor, cancelled_po
- evidence must be an array of source path strings, not a nested object
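
The key and format rules above can be checked mechanically. The following is a minimal sketch, not the benchmark's actual verifier: the function name `check_audit_result` and the wording of the problem messages are hypothetical, and only the rules stated in this README are encoded.

```python
# Hypothetical checker for the audit_result.json rules listed in the README.
REQUIRED_KEYS = {
    "case_id", "approved_invoice_ids", "review_invoice_ids",
    "reject_invoice_ids", "ignored_document_ids",
    "total_approved_gross_cents", "warnings_by_invoice",
    "evidence", "proof_code",
}
ALLOWED_WARNINGS = {"inactive_vendor", "cancelled_po"}
EXPECTED_IGNORED = ["CM-4410", "INV-4409-DUP", "STATEMENT-JUN"]


def check_audit_result(audit):
    """Return a list of problems; an empty list means the stated rules hold."""
    if not isinstance(audit, dict):
        return ["audit_result must be a JSON object"]
    problems = []
    if set(audit) != REQUIRED_KEYS:
        # Symmetric difference lists both missing and unexpected keys.
        problems.append(f"key mismatch: {sorted(set(audit) ^ REQUIRED_KEYS)}")
    if audit.get("case_id") != "P3-WORK-07":
        problems.append("case_id must be 'P3-WORK-07'")
    if audit.get("ignored_document_ids") != EXPECTED_IGNORED:
        problems.append("ignored_document_ids does not match the required list")
    warnings = audit.get("warnings_by_invoice")
    if not isinstance(warnings, dict):
        problems.append("warnings_by_invoice must be an object")
    else:
        for invoice_id, codes in warnings.items():
            bad = [c for c in codes if c not in ALLOWED_WARNINGS]
            if bad:
                problems.append(f"{invoice_id}: unknown warning codes {bad}")
    evidence = audit.get("evidence")
    if not (isinstance(evidence, list) and all(isinstance(p, str) for p in evidence)):
        problems.append("evidence must be an array of path strings")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every rule a candidate file breaks in a single pass.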

Proof code formula:
total_approved_gross_cents
+ numeric parts of approved/review/reject invoice IDs
+ numeric part of the credit memo ID
+ 97 * total warning count

Use invoice IDs only for approved/review/reject invoice IDs. Do not include duplicate scans, credit memos, or statement IDs in the invoice-ID sum.
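
As a worked illustration of the formula, the sketch below recomputes the proof code from the values in the `expected` block of this run's `verify_output.json`. It assumes "numeric part" means the digits embedded in an ID (for example 4409 for `INV-4409`); the helper names `numeric_part` and `proof_code` are hypothetical.

```python
import re


def numeric_part(doc_id):
    """Return the digits of an ID such as 'INV-4409' as an int (assumed reading)."""
    return int(re.sub(r"\D", "", doc_id))


def proof_code(total_approved_gross_cents, invoice_ids, credit_memo_id, warnings_by_invoice):
    """Apply the README formula to approved/review/reject invoice IDs only."""
    invoice_sum = sum(numeric_part(i) for i in invoice_ids)
    warning_count = sum(len(codes) for codes in warnings_by_invoice.values())
    return (
        total_approved_gross_cents
        + invoice_sum
        + numeric_part(credit_memo_id)
        + 97 * warning_count
    )


code = proof_code(
    total_approved_gross_cents=32100,
    invoice_ids=["INV-4409", "INV-4411"],  # approved + rejected; no review invoices
    credit_memo_id="CM-4410",
    warnings_by_invoice={
        "INV-4409": [],
        "INV-4411": ["inactive_vendor", "cancelled_po"],
    },
)
print(code)  # 32100 + (4409 + 4411) + 4410 + 97 * 2 = 45524
```

The result agrees with the `proof_code` of "45524" that the verifier expected for this case.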

Required work/attachment_index.json shape:
{"attachments":[{"attachment_path":"...","document_id":"...","document_type":"...","use":"..."}]}

Required work/final_document_set.json shape:
{
  "approved_invoice_ids": [...],
  "credit_document_ids": [...],
  "ignored_document_ids": [...],
  "rejected_invoice_ids": [...],
  "source_bank_export": "..."
}

Required work/payment_reconciliation.json shape:
{
  "bank_payment_cents": 123,
  "credit_offset_cents": 123,
  "gross_invoice_cents": 123,
  "matched_invoice_id": "...",
  "payment_reference": "..."
}
ap_notes.txt
AP note:
- INV-4409 is the final invoice for Flare Tooling.
- CM-4410 is a credit memo and reduces the payment, but it is not an invoice.
- invoice_4409_duplicate_scan.png is a duplicate scan of INV-4409.
- INV-4411 belongs to a vendor on hold and PO-7788 is cancelled.
bank_export_june.csv
date,description,amount_cents
2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000
purchase_orders.csv
po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411
vendor_master.csv
vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive
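
The rejection rule in step 3 of the README can be derived from the two CSV files above. This is a minimal sketch under two assumptions: that the `approved_invoice_id` column is what links an invoice to its purchase order and vendor, and that a vendor missing from the master counts as not active. The function name `warnings_by_invoice` mirrors the audit key but is otherwise hypothetical.

```python
import csv
import io

# File contents copied from the workspace listing above.
PURCHASE_ORDERS = """po_id,vendor_id,status,approved_invoice_id
PO-7701,V-FLARE,approved,INV-4409
PO-7788,V-OLD,cancelled,INV-4411
"""
VENDOR_MASTER = """vendor_id,vendor_name,status
V-FLARE,Flare Tooling,active
V-OLD,Old Gate Parts,inactive
"""


def warnings_by_invoice(po_csv, vendor_csv):
    """Map each invoice ID to its warning codes, in a fixed order."""
    vendor_status = {
        row["vendor_id"]: row["status"]
        for row in csv.DictReader(io.StringIO(vendor_csv))
    }
    result = {}
    for po in csv.DictReader(io.StringIO(po_csv)):
        codes = []
        # Assumption: an unknown vendor is treated the same as an inactive one.
        if vendor_status.get(po["vendor_id"]) != "active":
            codes.append("inactive_vendor")
        if po["status"] == "cancelled":
            codes.append("cancelled_po")
        result[po["approved_invoice_id"]] = codes
    return result


warnings = warnings_by_invoice(PURCHASE_ORDERS, VENDOR_MASTER)
rejected = sorted(inv for inv, codes in warnings.items() if codes)
print(warnings)
print(rejected)
```

Under these assumptions INV-4409 carries no warnings and INV-4411 carries both, which is consistent with the prompt's instruction to reject INV-4411.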

Input images

credit_memo_4410.png
invoice_4409.png
invoice_4409_duplicate_scan.png
invoice_4411_cancelled_po.png
statement_page.png

Agent actions

list_files -> read_file -> mkdir -> mkdir -> write_file -> read_file -> read_file -> write_file -> write_file

Model answer

## audit_result.json

[missing]

## proof.txt

[missing]

## work/attachment_index.json

[
  {
    "attachment_path": "incoming/attachments/invoice_4409.png",
    "document_id": "INV-4409",
    "document_type": "invoice",
    "use": "approved"
  },
  {
    "attachment_path": "incoming/attachments/credit_memo_4410.png",
    "document_id": "CM-4410",
    "document_type": "credit memo",
    "use": "offset evidence"
  },
  {
    "attachment_path": "incoming/attachments/invoice_4411_cancelled_po.png",
    "document_id": "INV-4411",
    "document_type": "invoice",
    "use": "rejected (cancelled PO)"
  }
]
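
The README asks for an object with an `attachments` key, whereas the answer above is a bare array. The verifier's own checks are not shown in this report, so the sketch below only tests the documented shape; the function name `check_attachment_index` is hypothetical, and nothing here confirms which detail actually triggered `attachment_index_error`.

```python
REQUIRED_FIELDS = {"attachment_path", "document_id", "document_type", "use"}


def check_attachment_index(doc):
    """Return shape problems for work/attachment_index.json; empty means OK."""
    if not isinstance(doc, dict) or not isinstance(doc.get("attachments"), list):
        return ["top level must be an object with an 'attachments' array"]
    problems = []
    for i, entry in enumerate(doc["attachments"]):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems


entry = {
    "attachment_path": "incoming/attachments/invoice_4409.png",
    "document_id": "INV-4409",
    "document_type": "invoice",
    "use": "approved",
}
bare_array = [entry]                  # the shape the model wrote
wrapped = {"attachments": [entry]}    # the shape the README requires

print(check_attachment_index(bare_array))
print(check_attachment_index(wrapped))
```

The bare array fails the top-level check, while the same entries wrapped in an `attachments` object pass it.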

## work/final_document_set.json

{
  "approved_invoice_ids": [
    "INV-4409"
  ],
  "credit_document_ids": [
    "CM-4410"
  ],
  "ignored_document_ids": [
    "INV-4409-DUP",
    "STATEMENT-JUN"
  ],
  "rejected_invoice_ids": [
    "INV-4411"
  ],
  "source_bank_export": "incoming/bank_export_june.csv"
}

## work/normalized/INV-4409.txt

[missing]

## work/normalized/INV-4411.txt

[missing]

## work/payment_reconciliation.json

{
  "bank_payment_cents": 28000,
  "credit_offset_cents": 4100,
  "gross_invoice_cents": 32100,
  "matched_invoice_id": "INV-4409",
  "payment_reference": "CM-4410"
}
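
The cent amounts in this file can be cross-checked against the bank export. The sketch below is an illustration only: `check_reconciliation` is a hypothetical helper, and it does not test `payment_reference`, because the value the verifier expected for that field is not shown in this report.

```python
import csv
import io

# Contents of incoming/bank_export_june.csv as listed in this report.
BANK_EXPORT = """date,description,amount_cents
2026-06-03,PAY FLARE INV-4409 LESS CM-4410,28000
"""


def check_reconciliation(recon, bank_csv):
    """Return arithmetic problems in a payment reconciliation; empty means OK."""
    problems = []
    net = recon["gross_invoice_cents"] - recon["credit_offset_cents"]
    if net != recon["bank_payment_cents"]:
        problems.append("gross minus credit offset does not equal the bank payment")
    bank_amounts = {
        int(row["amount_cents"]) for row in csv.DictReader(io.StringIO(bank_csv))
    }
    if recon["bank_payment_cents"] not in bank_amounts:
        problems.append("bank_payment_cents not found in the bank export")
    return problems


model_recon = {
    "bank_payment_cents": 28000,
    "credit_offset_cents": 4100,
    "gross_invoice_cents": 32100,
    "matched_invoice_id": "INV-4409",
    "payment_reference": "CM-4410",
}
print(check_reconciliation(model_recon, BANK_EXPORT))
```

The model's amounts are internally consistent (32100 minus 4100 is 28000, matching the export row), so the `payment_reconciliation_error` reported for this run presumably stems from something this sketch does not cover.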

## verify_output.json

{
  "ok": false,
  "core_ok": false,
  "visible_ok": false,
  "failure_types": [
    "attachment_index_error",
    "final_document_set_error",
    "no_output",
    "normalized_text_error",
    "payment_reconciliation_error",
    "proof_txt_error",
    "required_artifact_missing"
  ],
  "checks": [
    {
      "check": "audit_result_exists",
      "passed": false,
      "message": "Expected final audit_result.json with exact keys."
    },
    {
      "check": "visible_checks_pass",
      "passed": false,
      "message": "Expected required artifacts and unchanged incoming/ sources."
    },
    {
      "check": "core_oracle_pass",
      "passed": false,
      "message": "Expected core audit fields to match the oracle."
    },
    {
      "check": "hidden_oracle_pass",
      "passed": false,
      "message": "Expected exact audit, artifacts, proof.txt, and unchanged sources."
    }
  ],
  "audit_error": "missing_audit_result",
  "actual": null,
  "expected": {
    "approved_invoice_ids": [
      "INV-4409"
    ],
    "case_id": "P3-WORK-07",
    "evidence": [
      "incoming/attachments/invoice_4409.png",
      "incoming/attachments/credit_memo_4410.png",
      "incoming/attachments/invoice_4411_cancelled_po.png",
      "incoming/bank_export_june.csv",
      "incoming/purchase_orders.csv",
      "incoming/vendor_master.csv"
    ],
    "ignored_document_ids": [
      "CM-4410",
      "INV-4409-DUP",
      "STATEMENT-JUN"
    ],
    "proof_code": "45524",
    "reject_invoice_ids": [
      "INV-4411"
    ],
    "review_invoice_ids": [],
    "total_approved_gross_cents": 32100,
    "warnings_by_invoice": {
      "INV-4409": [],
      "INV-4411": [
        "inactive_vendor",
        "cancelled_po"
      ]
    }
  }
}