Turning messy HVAC supplier price books into clean, importable costbook data using a Bronze/Silver/Gold medallion architecture. Ingests Excel and PDF, normalizes with LLM-powered transformation, enriches with AHRI efficiency ratings, and exports import-ready files. Deployed on AWS EKS.
Context
FieldPulse serves HVAC contractors who install, repair, and maintain heating and cooling systems. Before a technician can price a job, they need to know what the parts cost. HVAC suppliers publish price books for this purpose, but every supplier does it differently -- some deliver Excel files with deeply nested columns, others send PDFs with embedded tables. Terminology varies across manufacturers, equipment ratings reference AHRI certification numbers that require separate lookups, and nothing maps cleanly to anything else. FieldPulse's Costbook feature lets contractors store and query these costs when building quotes, but the gap was getting from raw supplier files to data that could actually live in the system. Before this pipeline, it was a manual re-entry process.
Problem
This was not a simple format conversion. The challenge broke into three distinct dimensions that each required a different kind of solution.
Input diversity
Supplier price books arrive as Excel files with irregular layouts, PDFs with embedded tables, or both. No standard format exists. The pipeline needed to handle genuine structural messiness without human cleanup before ingestion.
Domain complexity
HVAC systems are component-based -- a complete system is an outdoor condensing unit paired with an indoor air handler, a furnace, a thermostat, and accessories. Price books list these individually. The pipeline needed to understand which components form a matched system, not just extract flat line items.
Efficiency enrichment
Many components carry AHRI certification numbers for independently tested efficiency ratings (SEER, EER, HSPF). These ratings were not in the supplier files at all -- the pipeline needed to look them up externally and attach them to produce complete Costbook entries.
Approach
The pipeline follows a Bronze, Silver, Gold medallion pattern where data fidelity and structure increase at each stage. Rather than building rigid schema parsers that would break on every new supplier format, the system uses Claude Sonnet as the transformation engine -- handling semantic variation across column names, units, and grouping conventions without per-supplier custom code.
Bronze Layer -- Raw Ingestion
Docling extracts embedded tables from PDFs into structured JSON. Pandas absorbs irregular Excel layouts -- merged cells, multi-row headers, blank separator rows -- into flat representations. The bronze output is a faithful replica of the original. Nothing is renamed, inferred, or dropped.
Silver Layer -- LLM Transform + System Assembly + AHRI Lookup
Claude Sonnet handles semantic normalization, mapping varied supplier column names to FieldPulse's internal taxonomy. System assembly groups individual components into matched HVAC systems. Firecrawl queries the AHRI Directory to pull certified efficiency ratings for each model number.
Gold Layer -- Validated Export
Generates import-ready Excel files built to FieldPulse's Costbook specification. Once exported, contractors import directly into the platform to reference accurate, supplier-current pricing when building quotes and estimates.
Architecture
Each layer operates independently and is queryable on its own. The principle: never lose the raw input, make transformation explicit and auditable, and validate against the import schema before anything reaches production.
Bronze → Silver → Gold medallion architecture
Outcome
Excel + PDF
Input Formats Handled
3
Pipeline Stages (B/S/G)
AHRI
Automated Efficiency Lookup
EKS
Production Deployment
Contractors onboarding with multiple supplier relationships no longer needed someone to re-key price book data by hand. They imported the supplier file, the pipeline ran, and the Costbook was populated with normalized, enriched, import-ready data.
Next Project
Customer Onboarding App →