Automating Vehicle Warranty Claims with AI Document Intelligence
From 18% to 98% Decision Accuracy in 7 Days
The Problem
A vehicle warranty claim lands on an adjuster's desk: five documents, three languages, a cost estimate with 30 line items, a policy with coverage tiers, mileage-dependent reimbursement rates and component exclusion lists.
Processing takes 15-30 minutes per claim. Multiply that by hundreds of claims per month.
76% of denied claims fail for a single reason: the part is not covered by the policy - a deterministic lookup, not a judgment call.
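To illustrate why this is a lookup rather than a judgment call, here is a minimal sketch. The `Policy` class, its field names and the sample components are assumptions made for illustration and do not come from the actual system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy; real policies carry many more fields."""
    covered_components: frozenset[str]
    excluded_components: frozenset[str] = field(default_factory=frozenset)


def is_covered(component: str, policy: Policy) -> bool:
    """Pure set lookup: exclusions win, then the covered list decides."""
    key = component.strip().lower()
    if key in policy.excluded_components:
        return False
    return key in policy.covered_components


# Illustrative data only.
policy = Policy(
    covered_components=frozenset({"water pump", "turbocharger"}),
    excluded_components=frozenset({"brake pads"}),
)
```

Once a line item has been mapped to a component name, the decision is a set-membership test with no model involved.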
Why Existing Approaches Fall Short
Most document processing tools extract text from PDFs - that gets you 10% of the way. The real challenge is cross-document reasoning: a cost estimate means nothing without the policy, a mileage reading means nothing without the coverage cap.
Generic OCR pipelines lack domain depth
Swiss repair invoices contain line items in German with part numbers, labor codes and carry-forward subtotals across pages. Standard OCR extracts the text but not the structured data needed to decide coverage.
Single-model approaches plateau early
Sending an entire claim to an LLM produces inconsistent results - hallucinated part names, invented coverage rules, unreliable financial calculations. Baseline LLM-only accuracy: 18%.
Rule-based systems cannot scale with the vocabulary
There are thousands of part names across multiple languages: a "Wasserpumpe" in German is a "pompe à eau" in French. Pure keyword matching breaks on the first synonym it has not seen.
The solution requires a hybrid architecture: deterministic rules where they work, keyword matching for known vocabulary and LLM reasoning as a calibrated fallback - each layer with explicit confidence scoring and provenance tracking.
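A layered matcher of this kind might look roughly like the sketch below. The lookup tables, the `Match` fields, the confidence values (1.0 and 0.9) and the `llm` callable are all assumptions for illustration; the source does not describe its actual data structures.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical lookup tables; real ones would be far larger.
RULES = {"entsorgungspauschale": ("fee", False)}  # description -> (category, covered)
SYNONYMS = {"wasserpumpe": "water_pump", "pompe a eau": "water_pump"}
COVERED_CATEGORIES = {"water_pump"}


@dataclass
class Match:
    category: Optional[str]
    covered: Optional[bool]
    confidence: float  # assumed scale from 0.0 to 1.0
    source: str        # which layer decided: "rule", "keyword" or "llm"


def fold(text: str) -> str:
    """Lowercase and strip diacritics, so 'pompe à eau' becomes 'pompe a eau'.

    Note: this folds 'ü' to 'u'; a system that prefers 'ue' needs its own mapping.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def match_item(description: str, llm: Callable[[str], Match]) -> Match:
    """Try rules first, then whole-word keywords, then the LLM fallback."""
    key = fold(description)
    if key in RULES:
        category, covered = RULES[key]
        return Match(category, covered, 1.0, "rule")
    for term, category in SYNONYMS.items():
        if re.search(rf"\b{re.escape(term)}\b", key):
            return Match(category, category in COVERED_CATEGORIES, 0.9, "keyword")
    return llm(description)
```

Recording `source` and `confidence` on every result is what lets a reviewer see which layer produced a decision and how much weight to give it.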
The Architecture: A Multi-Stage Pipeline
An end-to-end claims processing pipeline moves raw documents through five stages, each with quality gates and audit logging.
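As a rough sketch of stages with quality gates and audit logging: the `Stage` type, the logger name and the two demo stages below are hypothetical, and the real pipeline's five stages are not reproduced here.

```python
import logging
from typing import Any, Callable, NamedTuple

audit_log = logging.getLogger("claims.audit")  # hypothetical logger name


class Stage(NamedTuple):
    name: str
    run: Callable[[dict[str, Any]], dict[str, Any]]
    gate: Callable[[dict[str, Any]], bool]  # quality gate on the stage output


class GateFailure(Exception):
    """Raised when a stage's output fails its quality gate."""


def run_pipeline(stages: list[Stage], claim: dict[str, Any]) -> dict[str, Any]:
    """Run stages in order, logging each gate result and stopping on failure."""
    state = dict(claim)
    for stage in stages:
        state = stage.run(state)
        passed = stage.gate(state)
        audit_log.info("stage=%s passed=%s", stage.name, passed)
        if not passed:
            raise GateFailure(stage.name)
    return state


# Two illustrative stages only; names and outputs are made up.
stages = [
    Stage("classify", lambda s: {**s, "doc_type": "cost_estimate"},
          lambda s: "doc_type" in s),
    Stage("extract", lambda s: {**s, "line_items": ["Wasserpumpe"]},
          lambda s: len(s["line_items"]) > 0),
]
result = run_pipeline(stages, {"claim_id": "demo-1"})
```

Stopping at the first failed gate keeps a bad extraction from silently feeding a coverage decision, and the log line per stage gives the audit trail.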
Coverage Analysis: Three-Tier Matching
Each line item goes through a three-tier matching pipeline, from fastest/highest-confidence to slowest/lowest-confidence:
Tier 1, deterministic rules: Deterministic matches for fee items, known exclusions, consumables. Zero ambiguity, zero latency.
Tier 2, keyword synonyms: Maps German/French repair terms to 30+ component categories with synonyms and umlaut normalization.
Tier 3, LLM fallback: GPT-4o with structured prompts for genuinely ambiguous items. Concurrency-optimized with 10 parallel calls.
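One common way to cap the LLM tier at 10 parallel calls is a semaphore around each request. The sketch below assumes an async `classify` callable and uses a sleeping stand-in (`fake_llm`) instead of a real model client; it is not the system's actual code.

```python
import asyncio
from typing import Awaitable, Callable

MAX_PARALLEL_CALLS = 10  # the figure quoted above


async def classify_all(
    items: list[str],
    classify: Callable[[str], Awaitable[str]],
    limit: int = MAX_PARALLEL_CALLS,
) -> list[str]:
    """Run `classify` over all items with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> str:
        async with semaphore:
            return await classify(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_llm(item: str) -> str:
    """Stand-in for a real model call; sleeps instead of hitting an API."""
    await asyncio.sleep(0.01)
    return f"category-for:{item}"
```

Calling `asyncio.run(classify_all(line_items, fake_llm))` processes a 30-item estimate in roughly three waves rather than 30 sequential round trips.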
The Implementation Journey: From 18% to 98%
Baseline and Architecture
The initial LLM-only approach produced plausible-sounding but unreliable results: hallucinated part names, invented coverage rules, inconsistent calculations.
Screening + Coverage Pipeline
The 11-check screening pipeline and three-tier coverage matching drove the biggest single improvement. Deterministic checks alone caught most denials correctly.
Iteration and Refinement
60+ evaluation iterations solving increasingly subtle problems: substring matching bugs, labor demotion logic, and part-number normalization across OCR outputs.
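Two of these problem classes are easy to show in miniature. The functions and sample strings below are illustrative assumptions, not the project's code: the first pair contrasts naive substring search with whole-word matching, and the last normalizes part numbers whose separators OCR renders inconsistently.

```python
import re


def naive_match(term: str, text: str) -> bool:
    """Buggy on purpose: plain substring search."""
    return term in text.lower()


def token_match(term: str, text: str) -> bool:
    """Match the term only as a whole word."""
    return re.search(rf"\b{re.escape(term)}\b", text.lower()) is not None


def normalize_part_number(raw: str) -> str:
    """Uppercase and drop separators (spaces, dots, dashes, underscores, slashes)."""
    return re.sub(r"[\s.\-_/]", "", raw).upper()


# "motor" (engine) appears inside "Motorhaube" (bonnet), a body part.
false_positive = naive_match("motor", "Motorhaube lackieren")
fixed = token_match("motor", "Motorhaube lackieren")
still_matches = token_match("motor", "Motor instandsetzen")

# Made-up part number in two OCR renderings.
same_part = normalize_part_number("ab 123-456.c") == normalize_part_number("AB123456C")
```

Whole-word matching trades recall on German compounds for precision, so compounds that should match need their own entries in the synonym table.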
Unseen Data Validation
30 previously unseen claims revealed failure modes the development set did not cover: causal exclusion clauses, missing document scenarios and sub-component interpretation gaps.
The holdout gap (98% on the development set vs 77% on unseen claims) is the honest measure of generalization.
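The arithmetic behind that gap is simple. This sketch assumes the 77% figure is measured over the 30 unseen claims, in which case 23 correct decisions is the only whole-number count that rounds to 77%.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of claims decided correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def generalization_gap(dev_accuracy: float, holdout_accuracy: float) -> float:
    """Gap in percentage points between development and holdout accuracy."""
    return round((dev_accuracy - holdout_accuracy) * 100, 1)


gap = generalization_gap(0.98, 0.77)           # 21.0 percentage points
holdout_pct = round(accuracy(23, 30) * 100)    # 77, under the assumption above
```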
Key Highlights
- 14 document types classified at near-100% accuracy
- 51 fields extracted per warranty policy with confidence scoring
- Full provenance chain from decision back to source page and character position
- Three-tier coverage matching: deterministic rules, keyword synonyms, LLM fallback
- Multilingual keyword matching across German, French and Italian
- 850+ unit tests covering the full pipeline
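A provenance chain from decision back to source page and character position could be modeled as below. The class names, fields and sample page text are hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    document: str
    page: int        # 1-based page number
    char_start: int  # offsets into that page's extracted text
    char_end: int


@dataclass(frozen=True)
class Decision:
    line_item: str
    covered: bool
    evidence: Provenance


def quote_evidence(decision: Decision, pages: dict[tuple[str, int], str]) -> str:
    """Return the exact source text a decision points back to."""
    ev = decision.evidence
    return pages[(ev.document, ev.page)][ev.char_start:ev.char_end]


# Illustrative page text and offsets.
page_text = "Covered components: water pump, turbocharger."
pages = {("policy.pdf", 3): page_text}
start = page_text.index("water pump")
decision = Decision(
    "Wasserpumpe", True,
    Provenance("policy.pdf", 3, start, start + len("water pump")),
)
```

With offsets stored alongside each decision, an adjuster can jump from "covered" straight to the sentence in the policy that justifies it.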
Ready to Achieve Similar Results?
Start with a pilot on your closed claims and see the impact for yourself.