European Warranty Insurer
Vehicle Warranty Insurance · 84 claims in pilot

Automating Vehicle Warranty Claims with AI Document Intelligence

From 18% to 98% Decision Accuracy in 7 Days

MIRA · ContextBuilder · ClaimEval
7 days from baseline to 98% accuracy
  • 98% decision accuracy (from 18% baseline)
  • 7 days to results (full pipeline live)
  • 80-90% no-LLM resolution of line items
  • 3 languages (DE / FR / IT)

The Problem

A vehicle warranty claim lands on an adjuster's desk: five documents, three languages, a cost estimate with 30 line items, a policy with coverage tiers, mileage-dependent reimbursement rates and component exclusion lists.

Processing takes 15-30 minutes per claim. Multiply that by hundreds of claims per month.

76% of denied claims fail for a single reason: the part is not covered by the policy - a deterministic lookup, not a judgment call.

  • 84 claims analyzed in pilot
  • 50/50 approval-to-denial split
  • CHF 1,450 average approved payout
  • 23 vehicle brands represented
  • 14 distinct document types
  • 4.9 documents per claim on average

Why Existing Approaches Fall Short

Most document processing tools extract text from PDFs - that gets you 10% of the way. The real challenge is cross-document reasoning: a cost estimate means nothing without the policy, a mileage reading means nothing without the coverage cap.

Generic OCR pipelines lack domain depth

Swiss repair invoices contain line items in German with part numbers, labor codes, and carry-forward subtotals that span pages. Standard OCR extracts the text, but not the structured data that carries coverage implications.

Single-model approaches plateau early

Sending an entire claim to an LLM produces inconsistent results - hallucinated part names, invented coverage rules, unreliable financial calculations. Baseline LLM-only accuracy: 18%.

Rule-based systems cannot scale to vocabulary

Thousands of part names across multiple languages. A "Wasserpumpe" in German is a "pompe à eau" in French. Pure keyword matching breaks on the first synonym it has not seen.

The solution requires a hybrid architecture: deterministic rules where they work, keyword matching for known vocabulary and LLM reasoning as a calibrated fallback - each layer with explicit confidence scoring and provenance tracking.
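The tiered dispatch described above can be sketched as follows. This is a minimal illustration, not the production code: the function and tier names (`match_line_item`, `rule_engine`, `keyword_matcher`, `llm_fallback`) and the 0.70 keyword threshold are assumptions chosen to mirror the confidence bands stated in this case study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MatchResult:
    category: Optional[str]  # matched component category, None if unmatched
    confidence: float        # explicit confidence score, set per tier
    source: str              # provenance: which tier produced the decision

def match_line_item(description: str,
                    rule_engine, keyword_matcher, llm_fallback) -> MatchResult:
    """Dispatch one line item through the three tiers, cheapest first."""
    # Tier 1: deterministic rules (fees, known exclusions, consumables)
    category = rule_engine(description)
    if category is not None:
        return MatchResult(category, confidence=1.0, source="rule_engine")
    # Tier 2: keyword/synonym matching over known vocabulary
    category, score = keyword_matcher(description)
    if category is not None and score >= 0.70:
        return MatchResult(category, score, source="keyword_matcher")
    # Tier 3: LLM fallback, reserved for genuinely ambiguous items
    category, score = llm_fallback(description)
    return MatchResult(category, score, source="llm_fallback")
```

Because each result records its `source` and `confidence`, every decision stays auditable regardless of which tier produced it.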

The Architecture: A Multi-Stage Pipeline

An end-to-end claims processing pipeline moves raw documents through five stages, each with quality gates and audit logging.

Stage 1: Ingestion → Stage 2: Classification → Stage 3: Extraction → Stage 4: Screening → Stage 5: QA Review
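The staged flow with quality gates and audit logging can be sketched as a simple driver loop. This is an assumption-laden sketch, not the actual pipeline: the `run_pipeline` signature, the `needs_review` status flag, and the dict-shaped claim are all hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(claim: dict, stages: list[tuple[str, Stage]],
                 audit_log: list[dict]) -> dict:
    """Run a claim through named stages; every transition is audit-logged,
    and a failed quality gate short-circuits the claim to manual review."""
    for name, stage in stages:
        claim = stage(claim)
        audit_log.append({"stage": name, "claim_id": claim["id"],
                          "status": claim.get("status", "ok")})
        if claim.get("status") == "needs_review":
            break  # quality gate failed: stop and route to human QA
    return claim
```

The design choice worth noting is that the audit log is written at every stage boundary, so a reviewer can see exactly where a claim stopped and why.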

Coverage Analysis: Three-Tier Matching

Each line item goes through a three-tier matching pipeline, from fastest/highest-confidence to slowest/lowest-confidence:

Tier 1: Rule Engine
Confidence: 1.0 · 40-50% of items

Deterministic matches for fee items, known exclusions, consumables. Zero ambiguity, zero latency.

Tier 2: Keyword Matcher
Confidence: 0.70-0.90 · 30-40% of items

Maps German/French repair terms to 30+ component categories with synonyms and umlaut normalization.

Tier 3: LLM Fallback
Confidence: 0.60-0.85 · 10-20% of items

GPT-4o with structured prompts for genuinely ambiguous items. Concurrency-optimized with 10 parallel calls.
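The Tier 2 matcher's umlaut normalization and multilingual synonyms can be illustrated with Python's standard `unicodedata` module. The synonym table below is a tiny illustrative excerpt (the case study states 30+ categories); the fixed 0.85 score stands in for whatever scoring the real matcher uses.

```python
import unicodedata

# Illustrative excerpt; the real system maps 30+ component categories.
SYNONYMS = {
    "water_pump": ["Wasserpumpe", "pompe à eau", "pompa dell'acqua"],
    "turbocharger": ["Turbolader", "turbocompresseur"],
}

def normalize(text: str) -> str:
    """Lowercase and strip diacritics so 'pompe à eau' matches 'pompe a eau'
    and umlauts like 'Kühler' match 'kuhler'."""
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))

def keyword_match(description: str):
    """Return (category, confidence) or (None, 0.0) for a line-item text."""
    desc = normalize(description)
    for category, terms in SYNONYMS.items():
        for term in terms:
            if normalize(term) in desc:
                return category, 0.85  # within the tier-2 band 0.70-0.90
    return None, 0.0
```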

The Implementation Journey: From 18% to 98%

Days 1-3 · 18%

Baseline and Architecture

The initial LLM-only approach produced plausible-sounding but unreliable results: hallucinated part names, invented coverage rules, inconsistent calculations.

Key insight: LLMs are good at language understanding but bad at deterministic business rules. The architecture had to separate judgment from precision.
Days 3-5 · 76%

Screening + Coverage Pipeline

The 11-check screening pipeline and three-tier coverage matching drove the biggest single improvement. Deterministic checks alone caught most denials correctly.

Days 5-8 · 88% → 94% → 98%

Iteration and Refinement

60+ evaluation iterations solving increasingly subtle problems: substring matching bugs, labor demotion logic, and part-number normalization across OCR outputs.

Holdout Test · 76.7%

Unseen Data Validation

30 previously unseen claims revealed failure modes the development set did not cover: causal exclusion clauses, missing document scenarios and sub-component interpretation gaps.

The holdout gap (98% vs 77%) is the honest measure of generalization.
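Part-number normalization across OCR outputs, one of the refinement-phase fixes, can be sketched like this. The separator stripping and the O→0 / I→1 confusion table are illustrative assumptions, not the insurer's actual rules.

```python
import re

def normalize_part_number(raw: str) -> str:
    """Canonicalize an OCR'd part number: drop spaces/dots/dashes/slashes,
    uppercase, and undo common OCR confusions (letter O for zero, I for one).
    The confusion table is an illustrative assumption."""
    s = re.sub(r"[\s./-]", "", raw).upper()
    return s.replace("O", "0").replace("I", "1")
```

With this, the same part cited as "06J-115 403 Q" on the estimate and "o6j.115403q" in an OCR'd invoice resolves to one canonical key.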

Key Highlights

  • 14 document types classified at near-100% accuracy
  • 51 fields extracted per warranty policy with confidence scoring
  • Full provenance chain from decision back to source page and character position
  • Three-tier coverage matching: deterministic rules, keyword synonyms, LLM fallback
  • Multilingual keyword matching across German, French and Italian
  • 850+ unit tests covering the full pipeline
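The provenance chain in the highlights above can be modeled with a couple of small records. The class and field names here are hypothetical; the point is the shape of the data: every decision carries the document, page, and character span it was derived from.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceSpan:
    document: str    # source file name
    page: int        # page number within that document
    char_start: int  # character offsets into the extracted text
    char_end: int

@dataclass
class Decision:
    line_item: str
    verdict: str     # e.g. "covered" / "not_covered"
    confidence: float
    evidence: list[SourceSpan] = field(default_factory=list)

    def provenance(self) -> list[str]:
        """Render the audit trail from decision back to source positions."""
        return [f"{s.document} p.{s.page} chars {s.char_start}-{s.char_end}"
                for s in self.evidence]
```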

Tech Stack

Python · FastAPI · Pydantic · React 18 · TypeScript · Tailwind CSS · GPT-4o · Azure Doc Intelligence

Ready to Achieve Similar Results?

Start with a pilot on your closed claims and see the impact for yourself.

Request a Pilot