Overview
Diet Decoder is a nutrition lookup tool that understands how people actually describe food. Type 'the leftover pasta thing I had for lunch' and it figures out what you mean. Under the hood it runs a 5-stage confidence-gated pipeline: alias matching, fuzzy search, embedding similarity, an LLM decomposer for complex dishes, and a USDA fallback if nothing else lands with enough confidence. Stack is FastAPI, React, Supabase for caching, HuggingFace for embeddings, and Groq running Llama 3.3 70B for the decomposer stage.
The Problem
Every nutrition tracker I tried assumed you already knew the official USDA name for whatever you ate. Type 'chicken with vegetables' and you'd get nothing, or worse, a random match. The tools forced people to learn their database instead of the other way around. None of the trackers I tried fixed this, so I built one that does.
My Approach
I built a confidence-gated pipeline: each stage produces its best match along with a confidence score, and the query moves on to the next stage only when that score falls below the stage's threshold. Simple alias lookups run first because they're fast and free. Fuzzy matching handles typos. Embedding similarity catches paraphrased descriptions. If a dish is too complex for any of that, Groq's Llama 3.3 70B decomposes it into individual ingredients and the pipeline runs again on each one. USDA is the safety net at the end. The result is a system that degrades gracefully instead of just failing.
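To make the gating concrete, here is a minimal sketch of the idea, not the project's actual code. The sample data, the threshold values (0.95 and 0.80), and every name in it (Match, resolve, decompose_stub, and so on) are assumptions made up for illustration. The embedding stage is left out to keep the example dependency-free, the fuzzy stage uses Python's standard difflib as a stand-in, and the LLM decomposer is replaced by a dictionary lookup.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Match:
    food: str
    confidence: float
    stage: str


# Hypothetical sample data; the real alias table and food list are not shown here.
ALIASES = {"pb": "peanut butter", "oj": "orange juice"}
FOODS = ["peanut butter", "orange juice", "chicken breast", "white rice", "black beans"]
DISHES = {"burrito bowl": ["white rice", "black beans", "chicken breast"]}


def alias_stage(query: str) -> Optional[Match]:
    """Exact alias lookup: either a certain hit or nothing."""
    food = ALIASES.get(query)
    return Match(food, 1.0, "alias") if food else None


def fuzzy_stage(query: str) -> Optional[Match]:
    """Best string-similarity match; the ratio doubles as the confidence."""
    scored = [(SequenceMatcher(None, query, f).ratio(), f) for f in FOODS]
    score, food = max(scored)
    return Match(food, score, "fuzzy")


def decompose_stub(query: str) -> List[str]:
    """Stand-in for the LLM decomposer: returns ingredient names, or []."""
    return DISHES.get(query, [])


# (stage, threshold) pairs in cheapest-first order. Thresholds are assumed values.
STAGES = [(alias_stage, 0.95), (fuzzy_stage, 0.80)]


def resolve(query: str, allow_decompose: bool = True) -> List[Match]:
    q = " ".join(query.lower().split())
    for stage, threshold in STAGES:
        match = stage(q)
        # Return early only when this stage is confident enough to stop.
        if match is not None and match.confidence >= threshold:
            return [match]
    if allow_decompose:
        parts = decompose_stub(q)
        if parts:
            # Re-run the cheap stages on each ingredient, without decomposing again.
            return [m for part in parts for m in resolve(part, allow_decompose=False)]
    # Nothing was confident enough: hand the query to the final fallback.
    return [Match(q, 0.0, "usda_fallback")]
```

In this sketch, resolve("PB") stops at the alias stage, resolve("chiken breast") stops at the fuzzy stage, and resolve("burrito bowl") falls through to the decomposer stub and resolves each ingredient separately. The allow_decompose flag is one simple way to keep the re-run from recursing indefinitely.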
Technical Implementation
FastAPI handles the backend. Each query hits the pipeline stages in order, with confidence thresholds gating whether we move to the next stage or return early. Supabase caches results so repeated queries don't re-run inference. HuggingFace sentence-transformers generate the embeddings for the similarity stage. The LLM decomposer uses Groq with Llama 3.3 70B because the latency is low enough that it doesn't blow up the response time even as a fallback. I also built out BMR (basal metabolic rate) profile support so users see their intake in the context of their own energy needs, not just raw numbers.
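The caching step can be sketched roughly like this. It is an illustration only: an in-memory dict stands in for the Supabase table, and the class name, the cache_key normalization, and the misses counter are all assumptions rather than the project's real schema or code.

```python
from typing import Any, Callable, Dict


class CachedResolver:
    """Wraps a resolver so repeated queries skip the expensive stages."""

    def __init__(self, resolve_fn: Callable[[str], Any]) -> None:
        self._resolve = resolve_fn
        self._cache: Dict[str, Any] = {}  # stand-in for a Supabase table
        self.misses = 0

    @staticmethod
    def cache_key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def lookup(self, query: str) -> Any:
        key = self.cache_key(query)
        if key in self._cache:
            return self._cache[key]
        # Cache miss: run the wrapped resolver once and store the result.
        self.misses += 1
        result = self._resolve(key)
        self._cache[key] = result
        return result
```

Normalizing the key before the lookup matters more than it looks: without it, 'Chicken with vegetables' and 'chicken  with vegetables ' would each pay for a full pipeline run.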
Results
The pipeline handles natural language descriptions that would completely break a keyword search. Dishes like 'the burrito bowl thing I had' resolve correctly because the decomposer stage pulls them apart. Supabase caching means common queries come straight from the cache without re-running the pipeline. BMR profiles make the output actually useful for people tracking goals, not just looking up data.
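As a rough picture of what a BMR profile adds, the sketch below estimates BMR with the Mifflin-St Jeor equation and expresses a meal as a share of it. This is an assumption for illustration: the write-up does not say which BMR formula the project uses, and Profile, bmr_mifflin_st_jeor, and share_of_bmr are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    weight_kg: float
    height_cm: float
    age_years: int
    sex: str  # "male" or "female"


def bmr_mifflin_st_jeor(p: Profile) -> float:
    """Estimated basal metabolic rate in kcal/day (Mifflin-St Jeor equation)."""
    base = 10 * p.weight_kg + 6.25 * p.height_cm - 5 * p.age_years
    if p.sex == "male":
        return base + 5
    if p.sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female' for this formula")


def share_of_bmr(meal_kcal: float, p: Profile) -> float:
    """Fraction of the profile's estimated daily BMR that a meal represents."""
    return meal_kcal / bmr_mifflin_st_jeor(p)
```

With a profile like this, a lookup can report a meal as a fraction of the user's estimated daily baseline instead of only a calorie count.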
What I Learned
The confidence-gating approach was the key insight. A lot of NLP pipelines try to do everything in one shot with one model. Building stages that know when to give up and escalate meant each stage could stay simple and fast. Also: Groq is genuinely fast enough for production use as a decomposer fallback, which I wasn't sure about going in.