How I Built an Intent Classifier to Route Messages Across Multiple LLMs
Source: DEV Community
Most AI chat apps make a quiet assumption that costs them a lot: one model is good enough for everything. It isn't. When I started building Chymera, I wanted to fix that. The idea was simple — instead of locking the user into a single LLM, the system should figure out what kind of question is being asked and send it to the model best suited to answer it. This is the story of how I built that routing layer, what I got wrong the first time, and what the working version actually looks like.

The Problem With Single-Model Architectures

Every major AI chat product — ChatGPT, Claude, Gemini — lets you switch models manually. But users don't think in terms of models. They just ask questions. The mental overhead of "hmm, should I use GPT-4o or o1 for this?" is friction that shouldn't exist.

Beyond UX, there's a real capability argument. Llama 3.3 70B via Groq is exceptional at code generation, while Qwen QwQ 32B has unusually strong multi-step reasoning. Gemini 2.5 Flash is fast and has native
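To make the routing idea concrete, here is a minimal keyword-based sketch of intent-to-model routing. This is not Chymera's actual classifier — the intent labels, hint lists, and model ID strings are illustrative assumptions, and a real system would use a learned classifier rather than keyword matching — but it shows the shape of the layer: classify the message, then dispatch to the model mapped to that intent.

```python
# Illustrative sketch only: intent labels, hint words, and model IDs are assumptions.
INTENT_MODELS = {
    "code": "llama-3.3-70b",       # strong at code generation (served via Groq)
    "reasoning": "qwen-qwq-32b",   # strong multi-step reasoning
    "general": "gemini-2.5-flash", # fast default for everything else
}

CODE_HINTS = ("function", "bug", "compile", "python", "error", "refactor")
REASONING_HINTS = ("prove", "step by step", "why", "logic", "puzzle")

def classify_intent(message: str) -> str:
    """Return a coarse intent label for the message (toy keyword heuristic)."""
    text = message.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "code"
    if any(hint in text for hint in REASONING_HINTS):
        return "reasoning"
    return "general"

def route(message: str) -> str:
    """Map a user message to the model ID best suited to answer it."""
    return INTENT_MODELS[classify_intent(message)]
```

Usage: `route("Fix this Python bug")` returns the code model, while an open-ended question falls through to the fast general model. The real version replaces `classify_intent` with a proper classifier, but the dispatch table stays the same shape.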