“Does it work for Portuguese?” is the question we get most often. Here are twelve benchmarks worth your time.
General language
- ASSIN 2 — semantic textual similarity and entailment.
- PTBR Hate — hate-speech classification for Brazilian Portuguese.
- BR-Wikicorpora — long-text QA.
Legal
- LEX-PT — four tasks over Portuguese legislation and case law.
- STJ Citation — Supreme Court citation prediction.
Clinical
- Port-Disease — disease mention extraction from Portuguese EHR.
- ClinPT-Bench — medication, dosage, and timeline extraction.
Public administration
- DRE-QA — question answering over the Diário da República.
- Public-Services-NER — entity extraction over municipal documents.
Multimodal
- PortVQA — Portuguese visual QA.
- OCR-PT — handwritten historical Portuguese documents.
Reasoning
- Exame-PT — Brazilian national exam reasoning benchmark.
Corrections and additions are welcome; this list is a living document.