Not a leaderboard. Just the ones we actually ship.

Text classification / extraction

ModernBERT-large (English) and Albertina (Portuguese). Small, fast, and fine-tunable in a single afternoon. Dominant for closed-label tasks, where the set of output classes is fixed in advance.

General-purpose Portuguese chat

Llama 3.1 8B with our own SFT on 30k Portuguese instructions. Fast enough for real-time use, good enough for 90% of requests from Portuguese public bodies.

Reasoning-heavy tasks

DeepSeek-R1-Distill-Qwen-14B. Punches above its weight on legal reasoning once grounded with RAG.
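To make "grounded with RAG" concrete, here is a minimal sketch of the prompt-assembly step. It is not our production pipeline: the token-overlap retriever stands in for a real search index or embedding store, and every function name and the prompt wording are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (accented letters kept)."""
    return re.findall(r"\w+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Count the tokens shared by query and passage (multiset intersection)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score, best first.

    Toy stand-in for a real retriever; ties keep their original order.
    """
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_grounded_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    context = retrieve(question, passages, k)
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the numbered sources below. "
        "Cite the source number for each claim. "
        "If the sources do not cover the question, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\n"
    )
```

The resulting string is what gets sent to the reasoning model. The point of the numbered sources is that every claim in the answer can be traced back to a passage, which matters more in legal work than raw fluency.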

Code-adjacent work

Qwen 2.5 Coder 7B. We use it to extract and transform structured code and log data, not as an IDE assistant.

Vision-language

Qwen2-VL 7B. Document layout understanding at a cost that fits our budget.

Where we still call an API

Novel reasoning tasks for which we have no eval yet. We keep a frontier-model subscription for that cold-start week.
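Put together, the list reads as a routing rule. The sketch below is one way to write it down, under assumptions: the task keys, the `FRONTIER_API` placeholder, and the choice to send unknown task types to the API are all hypothetical, and the model names are shorthand for the ones listed in this post.

```python
# Shorthand names for the models listed in this post.
# The task keys are hypothetical labels, not an existing config schema.
LOCAL_MODELS = {
    "classification_en": "ModernBERT-large",
    "classification_pt": "Albertina",
    "chat_pt": "Llama 3.1 8B + SFT",
    "reasoning": "DeepSeek-R1-Distill 14B",
    "code_data": "Qwen 2.5 Coder 7B",
    "vision": "Qwen2-VL 7B",
}

# Placeholder for whichever hosted frontier model the subscription covers.
FRONTIER_API = "frontier-api"


def pick_model(task: str, has_eval: bool) -> str:
    """Pick a local model when the task is known and an eval exists.

    Anything without an eval goes to the hosted API. Sending unknown
    task types there as well is an assumption of this sketch.
    """
    if not has_eval or task not in LOCAL_MODELS:
        return FRONTIER_API
    return LOCAL_MODELS[task]
```

For example, `pick_model("reasoning", has_eval=False)` returns the API placeholder, and the same call with `has_eval=True` returns the local reasoning model.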