Welcome to BOUQuET 💐, Benchmark and Open-initiative for Universal Quality Evaluation in Translation.

Let’s make machine translation available for any written language!

Please take part in shaping the future of machine translation. Your help will be greatly appreciated.

We invite everyone to contribute to BOUQuET 💐, a project aimed at building an open-source evaluation dataset for massively multilingual text-to-text machine translation systems.

You are very welcome to provide translations into your language, working from the source language you feel most comfortable with, including English, Egyptian Arabic, Mandarin Chinese, German, French, Hindi, Indonesian, Russian, or Spanish. Please take a look at the Contributor guidelines, which explain how to proceed.

You can find more details on the scientific context and purpose of BOUQuET 💐 in the BOUQuET paper. An extensive example of using it for benchmarking can be found in the Omnilingual MT paper.

Dataset

The dataset is accessible at https://huggingface.co/datasets/facebook/bouquet. We are going to update it regularly, as the contributions in new languages are completed and validated.
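
As a minimal sketch of how the dataset might be loaded programmatically, assuming the Hugging Face `datasets` library is installed and the repository is publicly readable: the helper name `load_first_config` is hypothetical, and configuration names are discovered at run time rather than hard-coded, since they are not listed here and may change as new languages are added.

```python
DATASET_ID = "facebook/bouquet"  # repository name taken from the URL above


def load_first_config(dataset_id: str = DATASET_ID):
    """Load the first available configuration of the dataset.

    Hypothetical helper; requires `pip install datasets` and network access.
    """
    # Imported lazily so this file can be imported without the library.
    from datasets import get_dataset_config_names, load_dataset

    # Discover the configurations instead of assuming their names.
    configs = get_dataset_config_names(dataset_id)
    print(f"{len(configs)} configuration(s) available, e.g. {configs[:5]}")

    # Without a `split` argument this returns a DatasetDict keyed by split.
    return load_dataset(dataset_id, configs[0])


if __name__ == "__main__":
    data = load_first_config()
    print(data)
```

Because the dataset is updated regularly, listing the configurations before loading avoids depending on a fixed set of language names.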

Leaderboard

To see how the various translation systems perform on BOUQuET, refer to the "Leaderboard" tab!

If you want another system evaluated, please open a discussion in the "Community" tab or evaluate it on your own using the code in https://github.com/facebookresearch/bouquet.

Contribute

If you want to contribute dataset translations for a new language or validate existing translations, check out our crowdsourcing system: https://bouquet.metademolab.com.

License

The dataset collected by the BOUQuET initiative and your contributions to this dataset will be released under the Creative Commons Attribution 4.0 license. Full text: https://choosealicense.com/licenses/cc-by-4.0/.

Reference