Leaderboard

See the Getting Started page for instructions on submitting to this leaderboard.

Global

Only submissions evaluated on at least the five core tasks (Open-domain Dialogue, Dialogue Summarization, Intent Detection, Stance Classification, and Safety Detection) are shown in the Global Leaderboard.

CRoW Score [-MT] = Average score across all tasks except Machine Translation

SA = Situational Accuracy
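
As a rough illustration of the definitions above, the sketch below checks whether a submission qualifies for the Global Leaderboard and computes CRoW Score [-MT] as an unweighted mean over the five core tasks. The task keys, the dictionary layout, and the function names are hypothetical and are not part of the official evaluation code. The same function would be applied once to the macro-F1 scores and once to the SA scores.

```python
from statistics import mean

# Hypothetical task keys; the leaderboard only names the tasks in prose.
CORE_TASKS = ("dialogue", "summarization", "intent", "stance", "safety")


def is_globally_listed(scores):
    """Return True if the submission has a score for every core task."""
    return all(task in scores for task in CORE_TASKS)


def crow_score_without_mt(scores):
    """Unweighted mean over the core tasks, ignoring any MT entries.

    Assumes each task contributes equally; the page does not say otherwise.
    """
    if not is_globally_listed(scores):
        raise ValueError("submission is missing at least one core task")
    return mean(scores[task] for task in CORE_TASKS)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real leaderboard results.
    macro_f1 = {
        "dialogue": 0.5,
        "summarization": 0.6,
        "intent": 0.7,
        "stance": 0.8,
        "safety": 0.9,
        "mt_zh-en": 0.4,  # present but excluded from the [-MT] score
    }
    print(round(crow_score_without_mt(macro_f1), 3))
```

How the Machine Translation language pairs are weighted in the full CRoW Score is not stated here, so the sketch deliberately covers only the [-MT] variant.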

Contributor	Model	Model Size	CRoW Score [-MT] (macro-F1)	CRoW Score [-MT] (SA)	CRoW Score (macro-F1)	CRoW Score (SA)

By Task

Open-domain Dialogue

Contributor	Model	Model Size	Date	Macro-F1	SA

Dialogue Summarization

Contributor	Model	Model Size	Date	Macro-F1	SA

Intent Detection

Contributor	Model	Model Size	Date	Macro-F1	SA

Safety Detection

Contributor	Model	Model Size	Date	Macro-F1	SA

Stance Classification

Contributor	Model	Model Size	Date	Macro-F1	SA

Machine Translation

Only submissions evaluated on all MT tasks (zh-en, en-de, en-fr, en-ru) are shown in this leaderboard.

Contributor	Model	Model Size	Macro-F1	SA

Machine Translation (zh-en)

Contributor	Model	Model Size	Date	Macro-F1	SA

Machine Translation (en-de)

Contributor	Model	Model Size	Date	Macro-F1	SA

Machine Translation (en-fr)

Contributor	Model	Model Size	Date	Macro-F1	SA

Machine Translation (en-ru)

Contributor	Model	Model Size	Date	Macro-F1	SA