Baselight

High-Quality Multilingual Translation Data

13 Languages for Machine Learning

@kaggle.thedevastator_high_quality_multilingual_translation_data


About this Dataset

By Huggingface Hub [source]


This collection of multilingual translation data is a resource for machine learning research. Its language pairs cover both English and non-English combinations (for example ca-en and de-fr), with roughly one thousand to more than 137,000 records per pair. Each file for a given language pair contains two columns: id, an identification number for each translation record, and translation, the corresponding translation text. This consistent structure makes the files straightforward to use when training and evaluating machine learning models.

How to use the dataset

This multilingual translation data can be used for a variety of tasks, from training machine learning models to understanding language nuances. Below are some steps to get you started using this dataset:

  • Select the language pair that you would like to work with (e.g. English-Spanish). The pair is given in the filename (e.g. en-es_train).
  • Download and extract the file containing your selected language pair from this Kaggle dataset. You will find two files in this folder, one for training and one for testing: Training_File and Test_File.
  • Open your chosen file in a spreadsheet program such as Microsoft Excel or Google Sheets to explore its contents. You will find two columns: id (a unique identifier for each translation pair) and translation, which contains the text in both languages of the pair side by side.
  • With these files you can then build machine learning models in order to apply natural language processing techniques, or simply explore cross-lingual correlations between languages, among other research applications.
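As an alternative to a spreadsheet, the steps above can be sketched in Python with only the standard library. This is a minimal sketch, not a definitive loader: the helper name load_pair and the two sample rows are hypothetical, and the layout of the translation cell (a quoted string holding both languages) is an assumption based on the column description rather than something confirmed by the files themselves.

```python
import csv
import io

# Hypothetical stand-in for a file such as en-es_train.csv. The rows are
# invented for illustration and are not taken from the dataset.
SAMPLE_CSV = '''id,translation
0,"{'en': 'Hello', 'es': 'Hola'}"
1,"{'en': 'Thank you', 'es': 'Gracias'}"
'''


def load_pair(handle):
    """Read one language-pair file and return its rows as dicts.

    Raises ValueError if the two documented columns are not both present.
    """
    reader = csv.DictReader(handle)
    missing = {"id", "translation"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return [
        {"id": int(row["id"]), "translation": row["translation"]}
        for row in reader
    ]


rows = load_pair(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows; first row:", rows[0])
```

With a downloaded file, the same helper could be called as `with open("en-es_train.csv", newline="", encoding="utf-8") as f: rows = load_pair(f)`, assuming that filename matches the extracted file.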

Research Ideas

  • Developing machine translation models: This dataset can be used to train and evaluate a variety of different machine translation models. The data could be used to optimize existing algorithms as well as train entirely new models tailored specifically for multilingual applications.
  • Improving natural language understanding: This corpus could be used to help build better artificial intelligence systems with an enhanced ability to process natural language inputs, thus allowing them to rapidly translate and respond accurately in multiple languages.
  • Translating web content dynamically: This dataset can be leveraged by web developers who want their websites and applications to automatically detect a visitor's language and generate translations instantly for the correct language pair. The rapid response time would spare users from having to switch between languages manually.
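For the first idea above, training and evaluating a translation model usually calls for a held-out set. The following is a minimal sketch of a reproducible split, assuming the records have already been turned into (source, target) text pairs; the function name train_validation_split, the default fraction, and the placeholder sentences are all hypothetical choices for illustration.

```python
import random


def train_validation_split(records, validation_fraction=0.1, seed=0):
    """Shuffle records reproducibly and split off a validation subset."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example when any records exist.
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder pairs standing in for real (source, target) sentences.
pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(20)]
train, validation = train_validation_split(pairs, validation_fraction=0.2, seed=42)
print(len(train), "training pairs,", len(validation), "validation pairs")
```

Fixing the seed keeps the split stable across runs, which makes evaluation scores comparable between experiments.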

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: ca-en_train.csv

Column name    Description
translation    Contains both English and non-English translations side by side. (String)

File: en-fi_train.csv

Column name    Description
translation    Contains both English and non-English translations side by side. (String)

File: en-es_train.csv

Column name    Description
translation    Contains both English and non-English translations side by side. (String)
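Because the translation column holds both languages in a single string, it usually has to be split before use. The sketch below assumes each cell is a Python-style dict literal keyed by language code, such as "{'ca': '...', 'en': '...'}". That format is an assumption and should be checked against a real row; the helper name split_translation and the example sentence are hypothetical.

```python
import ast


def split_translation(cell):
    """Parse one `translation` cell into a {language_code: text} dict.

    Assumes the cell is a Python-style dict literal with two entries.
    """
    parsed = ast.literal_eval(cell)
    if not isinstance(parsed, dict) or len(parsed) != 2:
        raise ValueError("expected a mapping with exactly two language codes")
    return parsed


# Invented example cell, not taken from the dataset.
example = "{'ca': 'Bon dia', 'en': 'Good morning'}"
pair = split_translation(example)
print(pair["ca"], "->", pair["en"])
```

If the cells turn out to be JSON (double-quoted keys and values), json.loads from the standard library would be the closer fit than ast.literal_eval.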

Acknowledgements

If you use this dataset in your research, please credit Huggingface Hub.

Tables

Ca De Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.ca_de_train
  • 632.56 KB
  • 4445 rows
  • 2 columns

CREATE TABLE ca_de_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Ca En Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.ca_en_train
  • 601.01 KB
  • 4605 rows
  • 2 columns

CREATE TABLE ca_en_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Ca Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.ca_hu_train
  • 629.31 KB
  • 4463 rows
  • 2 columns

CREATE TABLE ca_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Ca Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.ca_nl_train
  • 615.74 KB
  • 4329 rows
  • 2 columns

CREATE TABLE ca_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De En Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_en_train
  • 8.84 MB
  • 51467 rows
  • 2 columns

CREATE TABLE de_en_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Eo Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_eo_train
  • 265.26 KB
  • 1363 rows
  • 2 columns

CREATE TABLE de_eo_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Es Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_es_train
  • 4.9 MB
  • 27526 rows
  • 2 columns

CREATE TABLE de_es_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_fr_train
  • 6.2 MB
  • 34916 rows
  • 2 columns

CREATE TABLE de_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_hu_train
  • 8.89 MB
  • 51780 rows
  • 2 columns

CREATE TABLE de_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_it_train
  • 4.95 MB
  • 27381 rows
  • 2 columns

CREATE TABLE de_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_nl_train
  • 2.28 MB
  • 15622 rows
  • 2 columns

CREATE TABLE de_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_pt_train
  • 204.4 KB
  • 1102 rows
  • 2 columns

CREATE TABLE de_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

De Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.de_ru_train
  • 3.31 MB
  • 17373 rows
  • 2 columns

CREATE TABLE de_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

El En Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.el_en_train
  • 329.76 KB
  • 1285 rows
  • 2 columns

CREATE TABLE el_en_train (
  "id" BIGINT,
  "translation" VARCHAR
);

El Es Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.el_es_train
  • 316.25 KB
  • 1096 rows
  • 2 columns

CREATE TABLE el_es_train (
  "id" BIGINT,
  "translation" VARCHAR
);

El Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.el_fr_train
  • 320.54 KB
  • 1237 rows
  • 2 columns

CREATE TABLE el_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

El Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.el_hu_train
  • 331.2 KB
  • 1090 rows
  • 2 columns

CREATE TABLE el_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Eo Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_eo_train
  • 257.08 KB
  • 1562 rows
  • 2 columns

CREATE TABLE en_eo_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Es Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_es_train
  • 16.15 MB
  • 93470 rows
  • 2 columns

CREATE TABLE en_es_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Fi Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_fi_train
  • 486.02 KB
  • 3645 rows
  • 2 columns

CREATE TABLE en_fi_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_fr_train
  • 20.91 MB
  • 127085 rows
  • 2 columns

CREATE TABLE en_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_hu_train
  • 23.29 MB
  • 137151 rows
  • 2 columns

CREATE TABLE en_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_it_train
  • 5.76 MB
  • 32332 rows
  • 2 columns

CREATE TABLE en_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_nl_train
  • 6.49 MB
  • 38652 rows
  • 2 columns

CREATE TABLE en_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En No Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_no_train
  • 443.09 KB
  • 3499 rows
  • 2 columns

CREATE TABLE en_no_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Pl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_pl_train
  • 406.13 KB
  • 2831 rows
  • 2 columns

CREATE TABLE en_pl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_pt_train
  • 199.36 KB
  • 1404 rows
  • 2 columns

CREATE TABLE en_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_ru_train
  • 2.96 MB
  • 17496 rows
  • 2 columns

CREATE TABLE en_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

En Sv Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.en_sv_train
  • 530.42 KB
  • 3095 rows
  • 2 columns

CREATE TABLE en_sv_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Eo Es Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.eo_es_train
  • 273.75 KB
  • 1677 rows
  • 2 columns

CREATE TABLE eo_es_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Eo Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.eo_fr_train
  • 271.21 KB
  • 1588 rows
  • 2 columns

CREATE TABLE eo_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Eo Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.eo_hu_train
  • 267.7 KB
  • 1636 rows
  • 2 columns

CREATE TABLE eo_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Eo It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.eo_it_train
  • 259.73 KB
  • 1453 rows
  • 2 columns

CREATE TABLE eo_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Eo Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.eo_pt_train
  • 204.24 KB
  • 1259 rows
  • 2 columns

CREATE TABLE eo_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es Fi Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_fi_train
  • 485.6 KB
  • 3344 rows
  • 2 columns

CREATE TABLE es_fi_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_fr_train
  • 9.13 MB
  • 56319 rows
  • 2 columns

CREATE TABLE es_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_it_train
  • 5 MB
  • 28868 rows
  • 2 columns

CREATE TABLE es_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_nl_train
  • 5.75 MB
  • 32247 rows
  • 2 columns

CREATE TABLE es_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es No Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_no_train
  • 489.83 KB
  • 3585 rows
  • 2 columns

CREATE TABLE es_no_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_pt_train
  • 207.54 KB
  • 1327 rows
  • 2 columns

CREATE TABLE es_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Es Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.es_ru_train
  • 3.05 MB
  • 16793 rows
  • 2 columns

CREATE TABLE es_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fi Fr Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fi_fr_train
  • 505.14 KB
  • 3537 rows
  • 2 columns

CREATE TABLE fi_fr_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fi No Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fi_no_train
  • 463.96 KB
  • 3414 rows
  • 2 columns

CREATE TABLE fi_no_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fi Pl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fi_pl_train
  • 426.79 KB
  • 2814 rows
  • 2 columns

CREATE TABLE fi_pl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Hu Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_hu_train
  • 14.9 MB
  • 89337 rows
  • 2 columns

CREATE TABLE fr_hu_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_it_train
  • 3.07 MB
  • 14692 rows
  • 2 columns

CREATE TABLE fr_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_nl_train
  • 6.58 MB
  • 40017 rows
  • 2 columns

CREATE TABLE fr_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr No Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_no_train
  • 460.74 KB
  • 3449 rows
  • 2 columns

CREATE TABLE fr_no_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Pl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_pl_train
  • 424.31 KB
  • 2825 rows
  • 2 columns

CREATE TABLE fr_pl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_pt_train
  • 207.16 KB
  • 1263 rows
  • 2 columns

CREATE TABLE fr_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_ru_train
  • 1.46 MB
  • 8197 rows
  • 2 columns

CREATE TABLE fr_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Fr Sv Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.fr_sv_train
  • 563.91 KB
  • 3002 rows
  • 2 columns

CREATE TABLE fr_sv_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu It Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_it_train
  • 5.55 MB
  • 30949 rows
  • 2 columns

CREATE TABLE hu_it_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_nl_train
  • 7.04 MB
  • 43428 rows
  • 2 columns

CREATE TABLE hu_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu No Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_no_train
  • 481.27 KB
  • 3410 rows
  • 2 columns

CREATE TABLE hu_no_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu Pl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_pl_train
  • 441.15 KB
  • 2859 rows
  • 2 columns

CREATE TABLE hu_pl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_pt_train
  • 201.85 KB
  • 1184 rows
  • 2 columns

CREATE TABLE hu_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

Hu Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.hu_ru_train
  • 4.58 MB
  • 26127 rows
  • 2 columns

CREATE TABLE hu_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

It Nl Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.it_nl_train
  • 874.13 KB
  • 2359 rows
  • 2 columns

CREATE TABLE it_nl_train (
  "id" BIGINT,
  "translation" VARCHAR
);

It Pt Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.it_pt_train
  • 195.98 KB
  • 1163 rows
  • 2 columns

CREATE TABLE it_pt_train (
  "id" BIGINT,
  "translation" VARCHAR
);

It Ru Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.it_ru_train
  • 3.04 MB
  • 17906 rows
  • 2 columns

CREATE TABLE it_ru_train (
  "id" BIGINT,
  "translation" VARCHAR
);

It Sv Train

@kaggle.thedevastator_high_quality_multilingual_translation_data.it_sv_train
  • 548 KB
  • 2998 rows
  • 2 columns

CREATE TABLE it_sv_train (
  "id" BIGINT,
  "translation" VARCHAR
);
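Every table listed above shares the same two-column schema, so a query written for one table carries over to the others by changing the table name. As a rough local sketch, the ca_de_train schema can be recreated in an in-memory SQLite database using Python's built-in sqlite3 module. SQLite is a stand-in chosen for illustration and is not necessarily the engine behind the hosted tables, and the two inserted rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column names and declared types as the ca_de_train schema above.
conn.execute('CREATE TABLE ca_de_train ("id" BIGINT, "translation" VARCHAR)')

# Invented rows for illustration only; not taken from the dataset.
conn.executemany(
    "INSERT INTO ca_de_train VALUES (?, ?)",
    [
        (0, "{'ca': 'Gràcies', 'de': 'Danke'}"),
        (1, "{'ca': 'Bon dia', 'de': 'Guten Morgen'}"),
    ],
)

# Count the rows, then fetch the record with the lowest id.
row_count = conn.execute("SELECT COUNT(*) FROM ca_de_train").fetchone()[0]
first = conn.execute(
    'SELECT "id", "translation" FROM ca_de_train ORDER BY "id" LIMIT 1'
).fetchone()
print(row_count, first)

conn.close()
```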
