European Warranty Provider
Vehicle Warranty Insurance

Automating Vehicle Warranty Claims with AI Document Intelligence

Multilingual, Auditable Warranty Adjudication

MIRA, ContextBuilder, ClaimEval
7 days from baseline to 98% accuracy
Decision Accuracy: 98% (on development set)
Time to Results: 7 days (full pipeline live)
No-LLM Resolution: 80-90% (of line items)
Languages: 3 (DE / FR / IT)

Executive Summary

A European vehicle warranty provider needed to automate the adjudication of multilingual warranty claims. Each claim involved up to five documents in three languages (German, French, Italian), with processing taking more than 30 minutes on average per claim due to manual cross-referencing of cost estimates, coverage tiers, mileage caps and component exclusion lists.

True Aim deployed the full product stack: ContextBuilder for document ingestion and structured extraction, MIRA for hybrid deterministic and AI coverage matching, and ClaimEval for quality assurance with full provenance tracking. A multi-tier matching architecture routes 80-90% of line items through deterministic rules and keyword matching with zero LLM cost, reserving AI reasoning for genuinely ambiguous cases. Within seven days and across multiple evaluation iterations, decision accuracy reached 98% on the development set; a separate holdout set of 30 previously unseen claims scored 77%.

98% decision accuracy in 7 days across multiple evaluation iterations
80-90% of line items resolved without LLM involvement
Three languages (DE/FR/IT) with no measurable language bias
Full provenance chain from every decision back to source document and page

Operational Context

A vehicle warranty claim lands on an adjuster's desk: five documents, three languages, a cost estimate with 30 line items, a policy with coverage tiers, mileage-dependent reimbursement rates and component exclusion lists.

Standard processing takes 30+ minutes per claim. Multiply that by thousands of claims per month.

76% of denied claims fail for a single reason: the part is not covered by the policy - a deterministic lookup, not a judgment call.

Approval-to-denial split: 50/50
Avg approved payout: CHF 1,450
Vehicle brands represented: 23
Distinct document types: 14
Avg documents per claim: 4.9

Architectural Approach

Most document processing tools extract text from PDFs - that gets you 10% of the way. The real challenge is cross-document reasoning: a cost estimate means nothing without the policy, a mileage reading means nothing without the coverage cap.

Generic OCR pipelines lack domain depth

Swiss repair invoices contain line items in German with part numbers, labor codes, and carry-forward subtotals across pages. Standard OCR extracts the text but not structured data with coverage implications.

Single-model approaches plateau early

Sending an entire claim to an LLM produces inconsistent results - hallucinated part names, invented coverage rules, unreliable financial calculations.

Rule-based systems cannot scale to vocabulary

Thousands of part names across multiple languages. A "Wasserpumpe" in German is a "pompe à eau" in French. Pure keyword matching breaks on the first synonym it has not seen.
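
The synonym problem described above can be sketched as a normalize-then-look-up step. This is a minimal illustration, not the production matcher: the `SYNONYMS` table, the category name `water_pump`, and the functions `normalize` and `lookup` are hypothetical names introduced here.

```python
import unicodedata
from typing import Optional

# Hypothetical synonym table: normalized surface form -> component category.
SYNONYMS = {
    "wasserpumpe": "water_pump",   # German
    "pompe a eau": "water_pump",   # French, stored without accents
}

def normalize(term: str) -> str:
    """Casefold, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())

def lookup(term: str) -> Optional[str]:
    """Return the component category, or None for vocabulary not seen before."""
    return SYNONYMS.get(normalize(term))
```

With this table, `lookup("Pompe à eau")` and `lookup("Wasserpumpe")` resolve to the same category, while an unlisted synonym such as `lookup("Kühlmittelpumpe")` returns `None`. That miss is the point at which a keyword-only system breaks and a fallback tier is needed.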

The solution requires a hybrid architecture: deterministic rules where they work, keyword matching for known vocabulary and LLM reasoning as a calibrated fallback - each layer with explicit confidence scoring and provenance tracking.

Multi-Stage Pipeline

An end-to-end claims processing pipeline moves raw documents through five stages, each with quality gates and audit logging.

Stage 1: Ingestion
Stage 2: Classification
Stage 3: Extraction
Stage 4: Screening
Stage 5: QA Review
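
A staged pipeline with quality gates and audit logging could be organized roughly as follows. This is a sketch under assumptions: the `Claim` class, the `run_pipeline` function, and the convention that each stage returns `True` when its gate passes are illustrative and do not describe the actual ContextBuilder, MIRA, or ClaimEval interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Claim:
    claim_id: str
    data: Dict[str, object] = field(default_factory=dict)
    audit_log: List[Dict[str, object]] = field(default_factory=list)

# Assumption: a stage mutates claim.data and returns True if its quality gate passed.
Stage = Callable[[Claim], bool]

def run_pipeline(claim: Claim, stages: List[Tuple[str, Stage]]) -> str:
    """Run stages in order, log every gate result, stop at the first failed gate."""
    for name, stage in stages:
        passed = stage(claim)
        claim.audit_log.append({"stage": name, "gate_passed": passed})
        if not passed:
            return "needs_review"
    return "completed"

def ingest(claim: Claim) -> bool:
    # Placeholder stage: a real one would load the claim's documents.
    claim.data.setdefault("documents", [])
    return True

STAGE_NAMES = ["ingestion", "classification", "extraction", "screening", "qa_review"]
```

Each stage leaves an entry in `audit_log` whether it passes or fails, so a claim that halts midway still records which gate stopped it.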

Coverage Analysis: Multi-Tier Matching

Each line item goes through a multi-tier matching pipeline, prioritizing deterministic approaches before falling back to AI:

Deterministic Rules (highest confidence)

Exact matches for known items, exclusions, and consumables. Zero ambiguity, instant resolution.

Keyword Matching (high confidence)

Maps multilingual repair terms to component categories with synonym normalization.

AI Fallback (moderate confidence)

LLM with structured prompts for genuinely ambiguous items that require contextual reasoning.
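
The three tiers above can be sketched as a cascade that returns as soon as a tier produces a match. Everything here is an assumption made for illustration: the `Match` record, the rule and keyword tables, the confidence values, and the `llm_fallback` callable standing in for the LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Match:
    category: Optional[str]
    covered: Optional[bool]
    method: str        # "rule", "keyword", or "llm"
    confidence: float

# Hypothetical tables; real ones would come from the policy and a curated vocabulary.
EXACT_RULES = {"motoröl": ("consumable", False)}
KEYWORDS = {"wasserpumpe": "cooling_system", "pompe à eau": "cooling_system"}
COVERED_CATEGORIES = {"cooling_system"}

def match_line_item(description: str, llm_fallback: Callable[[str], Match]) -> Match:
    text = description.casefold().strip()
    # Tier 1: exact deterministic rules.
    if text in EXACT_RULES:
        category, covered = EXACT_RULES[text]
        return Match(category, covered, "rule", 1.0)
    # Tier 2: whole-word keyword match against known vocabulary.
    for keyword, category in KEYWORDS.items():
        if re.search(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", text):
            return Match(category, category in COVERED_CATEGORIES, "keyword", 0.9)
    # Tier 3: only genuinely unmatched items reach the LLM.
    return llm_fallback(description)
```

Because the cascade returns early, the `llm_fallback` callable is invoked only for items the first two tiers cannot resolve, which is how the no-LLM share of line items stays high.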

Deployment and Iteration

Days 1-3

Baseline and Architecture

The initial LLM-only approach produced plausible-sounding but unreliable results: hallucinated part names, invented coverage rules, inconsistent calculations.

Key insight: LLMs are good at language understanding but bad at deterministic business rules. The architecture had to separate judgment from precision.

Days 3-5: 76%

Screening + Coverage Pipeline

The deterministic screening pipeline and multi-tier coverage matching drove the biggest single improvement. Rule-based checks alone caught most denials correctly.

Days 5-8: 88% → 94% → 98%

Iteration and Refinement

60+ evaluation iterations solving increasingly subtle problems: substring matching bugs, labor demotion logic, and part-number normalization across OCR outputs.
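
The source does not describe the specific bugs, so the following is only an illustration of the kind of issue involved: a naive substring test that fires inside a longer word, and part numbers whose separators differ between OCR outputs. The function names and the sample part number are hypothetical.

```python
import re

def normalize_part_number(raw: str) -> str:
    """Uppercase and drop separators (spaces, dots, dashes, underscores, slashes)."""
    return re.sub(r"[\s.\-_/]", "", raw).upper()

def contains_term(text: str, term: str) -> bool:
    """Whole-word match, so 'öl' does not match inside 'ölfilter'."""
    pattern = r"(?<!\w)" + re.escape(term.casefold()) + r"(?!\w)"
    return re.search(pattern, text.casefold()) is not None

# Naive substring matching treats an oil filter as engine oil:
naive_hit = "öl" in "Ölfilter wechseln".casefold()       # True (false positive)
strict_hit = contains_term("Ölfilter wechseln", "öl")    # False
```

Normalizing before comparison means `"06h 121-026 dd"` and `"06H.121.026.DD"` compare equal, while the whole-word check keeps a short keyword from matching inside an unrelated compound.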

Holdout Test: 77%

Unseen Data Validation

30 previously unseen claims revealed failure modes the development set did not cover: causal exclusion clauses, missing document scenarios and sub-component interpretation gaps.

The holdout gap (98% vs 77%) is the honest measure of generalization.

Measured Outcomes

Initial Evaluation Set

Decision accuracy: 98%
Approved claims correctly identified: 100%
Denied claims correctly identified: 96%
False reject rate: 0%
False approve rate: 4%

Holdout Set

Decision accuracy: 77%
Approved claims correctly identified: 73%
Denied claims correctly identified: 80%
False reject rate: 26.7%
False approve rate: 20.0%
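
The holdout rates can be reproduced from a small confusion table. The counts below are inferred rather than reported: the source gives only percentages, and they are consistent with the 30 holdout claims splitting into 15 approved and 15 denied. The function name and argument names are illustrative.

```python
def decision_metrics(approved_ok: int, false_rejects: int,
                     denied_ok: int, false_approves: int) -> dict:
    """Rates are computed per ground-truth class, matching the figures above."""
    approved_total = approved_ok + false_rejects   # claims the adjuster approved
    denied_total = denied_ok + false_approves      # claims the adjuster denied
    total = approved_total + denied_total
    return {
        "decision_accuracy": (approved_ok + denied_ok) / total,
        "approved_correct": approved_ok / approved_total,
        "denied_correct": denied_ok / denied_total,
        "false_reject_rate": false_rejects / approved_total,
        "false_approve_rate": false_approves / denied_total,
    }

# Assumed counts: 11 of 15 approvals and 12 of 15 denials decided correctly.
holdout = decision_metrics(approved_ok=11, false_rejects=4,
                           denied_ok=12, false_approves=3)
```

Under these assumed counts, decision accuracy is 23/30 (about 77%), the false reject rate is 4/15 (about 26.7%), and the false approve rate is 3/15 (20.0%).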

Multiple document types
Multi-brand support
Deterministic screening
Multi-tier matching
Comprehensive keywords
Full field extraction
3 languages supported
Extensive test coverage

Edge Cases and Operational Insights

Holdout Failure Analysis

Causal exclusion clauses: A turbocharger is covered, but if the turbo failed because the engine (not covered) failed first, the policy denies coverage. The system does not reason about failure chains.
Missing documents: One claim was denied because the policy premium was not paid. This information lives in the insurer's internal system, not in any processable document.
Sub-component interpretation gaps: A DPF sensor is functionally part of the exhaust system. The policy covers exhausts but only lists specific sub-components. The adjuster approved by interpreting the sensor as an exhaust sub-component.
Excluded-component dominance: A claim for a hose repair (excluded) was approved because a minor bolt (CHF 3.70, covered) was also on the invoice.

Key Lessons

1. Deterministic checks beat LLM judgment for structured rules.

76% of claim denials are "part not covered" - a lookup, not a judgment call. The screening pipeline handles these with 100% confidence and zero LLM cost.

2. The accuracy curve has diminishing returns.

76% accuracy took 3 days. 76% to 98% took 4 more days. The holdout gap (98% vs 77%) confirmed that real-world generalization requires significantly more data.

3. The payout formula is harder than the AI.

Getting the financial calculation to match the insurer's exact formula requires sitting with an adjuster and walking through calculations by hand. No amount of prompt engineering solves a business logic bug.

4. Provenance tracking is non-negotiable.

In a regulated industry, every extracted value and coverage decision must trace back to a source document. This is fundamental to trust and compliance.

Production Readiness

  1. Expanded the ground truth dataset
  2. Integrated with insurer's policy management API for real-time status checks
  3. Implemented causal exclusion reasoning for cascading-damage denials
  4. Deployed production triage: auto-reject high-confidence denials, auto-approve simple approvals, route edge cases for human review
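
The triage step in item 4 can be sketched as a small routing function. The threshold value, the `used_llm` flag as a proxy for "simple" cases, and the route names are assumptions made for illustration; the source does not state the actual criteria.

```python
def triage(decision: str, confidence: float, used_llm: bool,
           threshold: float = 0.95) -> str:
    """Route a claim decision to automation or to a human adjuster."""
    # Assumption: anything that needed the LLM tier is not "simple" enough to automate.
    if used_llm or confidence < threshold:
        return "human_review"
    if decision == "deny":
        return "auto_reject"
    if decision == "approve":
        return "auto_approve"
    return "human_review"   # unknown decision labels are never automated
```

For example, `triage("deny", 0.99, used_llm=False)` returns `"auto_reject"`, while the same decision at confidence 0.80 is routed to `"human_review"`.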

The goal is not to replace adjusters but to give them back their time on the 76% of denials that are straightforward lookups, so they can focus their expertise on the 24% that actually require judgment.

Technical Considerations

Multi-Language Processing

Language-aware classification and extraction across German, French and Italian. Comparable approval rates with no measurable language bias.

Auditability and Compliance

Full provenance chain: every extracted field links to its source document and page. Every coverage decision includes match method, confidence score and explanation.
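
One way to represent such a provenance chain is with immutable records that carry their source reference along with the value. This is a sketch only; the class and field names are hypothetical and are not taken from ClaimEval.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class SourceRef:
    document_id: str
    page: int

@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    source: SourceRef          # where the value was read from

@dataclass(frozen=True)
class CoverageDecision:
    line_item: str
    covered: bool
    match_method: str          # e.g. "rule", "keyword", "llm"
    confidence: float
    explanation: str
    evidence: Tuple[ExtractedField, ...]

def trace(decision: CoverageDecision) -> List[str]:
    """List each piece of evidence with its source document and page."""
    return [f"{f.name}={f.value} ({f.source.document_id}, p. {f.source.page})"
            for f in decision.evidence]
```

Keeping the records frozen means a decision cannot be altered after it is logged without creating a new record, which suits an audit trail.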

Payout Formula Precision

Order-of-operations errors in financial calculations can cause significant overpayment. Finding them requires walking through calculations claim by claim with adjusters.
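
A small worked example shows why the order of operations matters. The insurer's actual formula is not given in the source, so the reimbursement rate, the payout cap, and the amounts below are purely hypothetical.

```python
from decimal import Decimal

def rate_then_cap(amount: Decimal, rate: Decimal, cap: Decimal) -> Decimal:
    """Apply the reimbursement rate first, then limit the result to the cap."""
    return min(amount * rate, cap)

def cap_then_rate(amount: Decimal, rate: Decimal, cap: Decimal) -> Decimal:
    """Limit the eligible amount to the cap first, then apply the rate."""
    return min(amount, cap) * rate

# Hypothetical claim: CHF 3,000 eligible cost, 60% rate, CHF 2,000 cap.
amount, rate, cap = Decimal("3000.00"), Decimal("0.60"), Decimal("2000.00")
payout_a = rate_then_cap(amount, rate, cap)   # 1800
payout_b = cap_then_rate(amount, rate, cap)   # 1200
difference = payout_a - payout_b              # 600
```

Both functions use the same inputs and the same two operations, yet they differ by CHF 600 on this one claim. Only the adjuster's worked calculation can say which order is the correct one.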

Cost Control

Deterministic rules and keyword matching handle the majority of items with zero LLM cost. Token usage monitoring with configurable thresholds prevents runaway costs.
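
Token usage monitoring with configurable thresholds might look like the following. This is a minimal sketch: the `TokenBudget` class, its two thresholds, and the status strings are assumptions, not the deployed mechanism.

```python
class TokenBudget:
    """Track cumulative LLM token usage against a warning and a hard limit."""

    def __init__(self, warn_at: int, hard_limit: int) -> None:
        if not 0 < warn_at <= hard_limit:
            raise ValueError("require 0 < warn_at <= hard_limit")
        self.warn_at = warn_at
        self.hard_limit = hard_limit
        self.used = 0

    def record(self, tokens: int) -> str:
        """Add usage and report the budget status after this call."""
        self.used += tokens
        if self.used >= self.hard_limit:
            return "blocked"   # caller should stop sending items to the LLM
        if self.used >= self.warn_at:
            return "warn"
        return "ok"
```

A caller would check the returned status before each LLM fallback and route remaining items to human review once the budget reports `"blocked"`.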

"The business logic matters more than the AI. Matching the insurer's payout calculations requires working closely with adjusters and walking through real claims to ensure the logic reflects how decisions are actually made."
Lead Engineer, True Aim
