Thanks for stopping by! This website helps researchers evaluate multilingual word embeddings on a range of evaluation tasks. To evaluate your embeddings, follow these steps:
  1. Use the training data to estimate multilingual embeddings.
  2. Write the estimated embeddings to a plain-text, UTF-8-encoded file. Each line should begin with a lowercased surface form, prefixed by the 2-letter ISO 639-1 code of the language and a colon (e.g., "en:school" or "fr:école"), followed by the floating-point value of each dimension in the word embedding. All fields must be delimited with a single space, and each line must end with "\n" (as opposed to "\r\n").
  3. The embeddings file can be large. You can reduce the size by removing words outside this vocabulary, and/or compressing the file using gzip.
  4. Select the embeddings file using the "Choose File" button, then click "Upload vectors".
  5. After the embeddings file is uploaded, you will be redirected to another page with basic statistics about your embeddings and a table of available evaluation tasks. By default, only the dev tasks are displayed. The dev tasks are meant to provide feedback that helps us develop better estimation methods. When it is time to report final results in a publication, toggle the dev/test switch to see the test tasks. A description of the evaluation tasks, with links to download the evaluation datasets, is also available.
  6. In order to evaluate your embeddings on a particular task, click the corresponding "Evaluate" button.
  7. The evaluation scripts may take anywhere from a few seconds to more than an hour to run to completion on the server, depending on which ones you choose (classification and parsing are the slowest). The results will appear in the "Scores" table at the bottom of the page once they are ready (refresh the page every few minutes to check).
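
The file format described in steps 2 and 3 can be produced with a short script. What follows is only a minimal sketch, not the site's own tooling. The function names `format_line` and `write_embeddings`, the in-memory layout (a dict keyed by `(language, word)` pairs), the six-decimal precision, and the output file name are all assumptions made for illustration. It also assumes that surface forms contain no spaces, since a space would be read as a field delimiter.

```python
import gzip


def format_line(lang, word, vector):
    """Build one line: 'xx:word v1 v2 ... vn' terminated by '\\n'.

    lang is assumed to be a 2-letter ISO 639-1 code. Six decimal places
    is an arbitrary choice made for this sketch.
    """
    fields = [f"{lang}:{word.lower()}"] + [f"{x:.6f}" for x in vector]
    return " ".join(fields) + "\n"


def write_embeddings(path, embeddings, vocabulary=None):
    """Write embeddings to a UTF-8 text file, gzipped if path ends in '.gz'.

    embeddings: dict mapping (lang, word) -> sequence of floats (assumed layout).
    vocabulary: optional set of 'xx:word' keys; other words are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # newline="\n" prevents "\n" from being translated to "\r\n" on Windows.
    with opener(path, "wt", encoding="utf-8", newline="\n") as out:
        for (lang, word), vector in embeddings.items():
            key = f"{lang}:{word.lower()}"
            if vocabulary is not None and key not in vocabulary:
                continue
            out.write(format_line(lang, word, vector))


if __name__ == "__main__":
    # Toy two-dimensional vectors, purely illustrative.
    toy = {
        ("en", "school"): [0.12, -0.53],
        ("fr", "école"): [0.10, -0.49],
    }
    write_embeddings("vectors.txt.gz", toy)
```

Passing a set of `xx:word` keys as `vocabulary` drops every other word before writing, which corresponds to the size reduction mentioned in step 3. Opening the file in text mode with an explicit encoding and newline keeps the output UTF-8 with "\n" line endings regardless of platform.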