Not a leaderboard. Just the ones we actually ship.
Text classification / extraction
ModernBERT-large (English) and Albertina (Portuguese). Both are small and fast, and both fine-tune in a single afternoon. They dominate closed-label tasks.
General-purpose Portuguese chat
Llama 3.1 8B + our SFT on 30k Portuguese instructions. Fast enough for real-time, good enough for 90% of requests from Portuguese public bodies.
Reasoning-heavy tasks
DeepSeek-R1-Distill 14B. Punches above its weight on legal reasoning once grounded with RAG.
Code-adjacent work
Qwen 2.5 Coder 7B. We use it to extract and transform structured code and log data, not as an IDE assistant.
Vision-language
Qwen2-VL 7B. Document layout understanding at a cost that fits our budget.
Where we still call an API
Novel reasoning tasks for which we have no eval yet. We keep a frontier-model subscription for the cold-start week.
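
To make the list above concrete, here is a minimal sketch of how such a lineup could be written down as a routing table. The task keys, the `pick_model` helper, and the `frontier-api` placeholder are all hypothetical names chosen for illustration, and the rule "no eval means call the API" is our reading of the last item, not a description of production code.

```python
# Task keys are illustrative; model names are copied from the list above.
ROUTES = {
    "classification_en": "ModernBERT-large",
    "classification_pt": "Albertina",
    "chat_pt": "Llama 3.1 8B + Portuguese SFT",
    "reasoning": "DeepSeek-R1-Distill 14B",
    "code_extraction": "Qwen 2.5 Coder 7B",
    "vision_language": "Qwen2-VL 7B",
}

# Placeholder label for the hosted frontier model (not a real model name).
FRONTIER_API = "frontier-api"


def pick_model(task: str, has_eval: bool) -> str:
    """Return the local model for a task we can evaluate, else the API.

    Assumption: a task with no eval, or a task missing from ROUTES,
    is treated as a cold-start case and sent to the frontier API.
    """
    if not has_eval:
        return FRONTIER_API
    return ROUTES.get(task, FRONTIER_API)


if __name__ == "__main__":
    print(pick_model("chat_pt", has_eval=True))
    print(pick_model("reasoning", has_eval=False))
```

The fallback is deliberately conservative: anything we cannot yet measure goes to the API until an eval exists, at which point the task can earn an entry in the table.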