Lists Of Words In 30 European Languages
Spoken Languages Data for NLP
@kaggle.jacekpardyak_languages_of_europe
Most of the NLP material on Kaggle deals with the analysis of the English language. With these collections of words from other spoken languages, you can solve the same problems and encounter new, language-specific ones.
This collection contains word lists in the following languages:
'Albanian', 'Belarusian', 'Bosnian', 'Bulgarian', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'French', 'German', 'Greek', 'Hungarian', 'Icelandic', 'Italian', 'Latvian', 'Lithuanian', 'Norwegian (Bokmål and Nynorsk)', 'Polish', 'Portuguese', 'Romanian', 'Russian', 'Serbian', 'Slovak', 'Slovenian', 'Spanish', 'Swedish', 'Turkish', 'Ukrainian'.
The separate file, languages, lists the encoding of each word-list file. I had no problems reading the files in Python. In R, if base::read.csv fails for a given encoding, readr::read_csv works.
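As a rough illustration of the encoding lookup described above, here is a minimal Python sketch. It assumes the languages file is a UTF-8 CSV with "code" and "encoding" columns (as in the schema listed further down) and that each word list holds one word per line. The function names and the file names in the usage comment are hypothetical, not part of the dataset.

```python
import csv


def load_encodings(languages_csv):
    """Map each language code to the encoding listed for it.

    Assumes the languages file itself is UTF-8 and has
    'code' and 'encoding' columns.
    """
    with open(languages_csv, newline="", encoding="utf-8") as fh:
        return {row["code"]: row["encoding"] for row in csv.DictReader(fh)}


def read_words(path, encoding):
    """Read one word per line with the given encoding, skipping blank lines."""
    with open(path, encoding=encoding) as fh:
        return [line.strip() for line in fh if line.strip()]


# Hypothetical usage; adjust the paths and the code to the actual files:
# encodings = load_encodings("languages.csv")
# words = read_words("pl_pl.csv", encodings["pl_PL"])
```

Reading the files as plain lines rather than through a CSV parser avoids treating the first word as a column header, which is what appears to have happened in some of the tables in the schema.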
These collections are based on https://github.com/LibreOffice/dictionaries
Any form of contact with the language we learn brings us closer to our goal. Working with a language that we know helps us understand it better. Have fun!
CREATE TABLE be_by (
    "n" VARCHAR -- А
);

CREATE TABLE bg_bg (
    "n" VARCHAR -- Абаджиев
);

CREATE TABLE languages (
    "unnamed_0" BIGINT, -- Unnamed: 0
    "index" BIGINT,
    "language" VARCHAR,
    "code" VARCHAR,
    "folder" VARCHAR,
    "sub_folder" VARCHAR,
    "encoding" VARCHAR
);

CREATE TABLE ro_ro (
    "a" VARCHAR,
    "unnamed_1" VARCHAR, -- Unnamed: 1
    "a_1" VARCHAR
);

CREATE TABLE ru_ru (
    "n" VARCHAR -- ЧПУ
);

CREATE TABLE sq_al (
    "ab" VARCHAR,
    "unnamed_1" VARCHAR -- Unnamed: 1
);

CREATE TABLE sr_sr (
    "unnamed_0" VARCHAR, -- Unnamed: 0
    "unnamed_1" VARCHAR -- Unnamed: 1
);

CREATE TABLE uk_ua (
    "n" VARCHAR -- А
);