Please check the Getting Started page for instructions on how to make a submission to this leaderboard.
Global
Only submissions evaluated on at least the 5 core tasks (Open-domain Dialogue, Dialogue Summarization, Intent Detection, Stance Classification, and Safety Detection) are shown in the Global Leaderboard.
CRoW Score [-MT] = average score across all tasks except Machine Translation
SA = Situational Accuracy
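As a rough illustration of the definitions above, the following sketch computes a CRoW-style score as an unweighted mean of per-task scores. The task keys, the function name `crow_score`, and the choice to count each MT language pair as its own task are assumptions made for this example; the official evaluation may aggregate differently.

```python
from statistics import mean

# Hypothetical task keys; the real submission format may use other names.
CORE_TASKS = ["dialogue", "summarization", "intent", "stance", "safety"]
MT_TASKS = ["mt_zh-en", "mt_en-de", "mt_en-fr", "mt_en-ru"]


def crow_score(task_scores, include_mt=False):
    """Return the unweighted mean score over the required tasks.

    task_scores maps a task key to a single metric value (macro-F1 or SA).
    Returns None when any required task is missing, mirroring the rule that
    incomplete submissions are not shown on the leaderboard.
    """
    # Assumption: with include_mt=True, each MT language pair counts as one task.
    required = CORE_TASKS + (MT_TASKS if include_mt else [])
    if any(task not in task_scores for task in required):
        return None
    return mean(task_scores[task] for task in required)
```

Calling `crow_score(scores)` on a dictionary of macro-F1 values would give the CRoW Score [-MT] (macro-F1) under these assumptions, and passing SA values instead would give the SA variant.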
Contributor | Model | Model Size | CRoW Score [-MT] (macro-F1) | CRoW Score [-MT] (SA) | CRoW Score (macro-F1) | CRoW Score (SA)
By Task
Open-domain Dialogue
Contributor | Model | Model Size | Date | Macro-F1 | SA
Dialogue Summarization
Contributor | Model | Model Size | Date | Macro-F1 | SA
Intent Detection
Contributor | Model | Model Size | Date | Macro-F1 | SA
Safety Detection
Contributor | Model | Model Size | Date | Macro-F1 | SA
Stance Classification
Contributor | Model | Model Size | Date | Macro-F1 | SA
Machine Translation
Only submissions evaluated on all MT tasks (zh-en, en-de, en-fr, en-ru) are shown in this leaderboard.
Contributor | Model | Model Size | Macro-F1 | SA
Machine Translation (zh-en)
Contributor | Model | Model Size | Date | Macro-F1 | SA
Machine Translation (en-de)
Contributor | Model | Model Size | Date | Macro-F1 | SA
Machine Translation (en-fr)
Contributor | Model | Model Size | Date | Macro-F1 | SA
Machine Translation (en-ru)
Contributor | Model | Model Size | Date | Macro-F1 | SA