Baselight

Lists Of Words In 30 European Languages

Spoken Languages Data for NLP

@kaggle.jacekpardyak_languages_of_europe

About this Dataset

Context

Most of the NLP material on Kaggle deals with English. With these word lists from other spoken languages, you can tackle the same problems and also run into new, language-specific ones.

Content

This collection contains word lists in the following languages:

'Albanian', 'Belarusian', 'Bosnian', 'Bulgarian', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'French', 'German', 'Greek', 'Hungarian', 'Icelandic', 'Italian', 'Latvian', 'Lithuanian', 'Norwegian (Bokmål and Nynorsk)', 'Polish', 'Portuguese', 'Romanian', 'Russian', 'Serbian', 'Slovak', 'Slovenian', 'Spanish', 'Swedish', 'Turkish', 'Ukrainian'.

A separate file, languages, lists the encoding of each word-list file. I had no problems reading the files in Python. In R, if base::read.csv fails for some encoding, readr::read_csv works.
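A minimal Python sketch of reading one word list with the encoding named in the languages file. It assumes one word per line; the function names `decode_word_list` and `read_word_list` are hypothetical and not part of the dataset. It reads raw lines rather than parsing CSV, so the first line is kept as a word. Column names such as "n", "a" and "ab" in the schemas below may be first words that were read as headers; if so, reading raw lines avoids losing them.

```python
from pathlib import Path


def decode_word_list(data: bytes, encoding: str = "utf-8") -> list[str]:
    """Decode raw bytes into a list of words, one per non-empty line.

    Assumption: each file holds one word per line. `encoding` is a codec
    name taken from the languages file; Python raises LookupError if the
    name is unknown and UnicodeDecodeError if the bytes do not match it.
    """
    text = data.decode(encoding)
    # splitlines() handles both "\n" and "\r\n" line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_word_list(path: str, encoding: str = "utf-8") -> list[str]:
    """Read a word-list file using the encoding listed for it."""
    return decode_word_list(Path(path).read_bytes(), encoding)
```

For example, `read_word_list("pl_PL.txt", "iso-8859-2")` would read a hypothetical Latin-2 file named pl_PL.txt; the file name and encoding here are illustrative only.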

Acknowledgements

These collections are based on https://github.com/LibreOffice/dictionaries

Inspiration

Any contact with a language we are learning brings us closer to our goal, and working with a language we already know helps us understand it better. Have fun!

Tables

Be By (Belarusian)

@kaggle.jacekpardyak_languages_of_europe.be_by
  • 712.5 KB
  • 83434 rows
  • 1 column

CREATE TABLE be_by (
  "n" VARCHAR
);

Bg Bg (Bulgarian)

@kaggle.jacekpardyak_languages_of_europe.bg_bg
  • 642.64 KB
  • 78237 rows
  • 1 column

CREATE TABLE bg_bg (
  "n" VARCHAR
);

Languages

@kaggle.jacekpardyak_languages_of_europe.languages
  • 6.31 KB
  • 31 rows
  • 7 columns

CREATE TABLE languages (
  "unnamed_0" BIGINT,
  "index" BIGINT,
  "language" VARCHAR,
  "code" VARCHAR,
  "folder" VARCHAR,
  "sub_folder" VARCHAR,
  "encoding" VARCHAR
);
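A minimal Python sketch of turning this table into a lookup from language code to encoding, using only the standard library's csv module. It assumes the raw CSV header uses the same column names as the schema above (`code`, `encoding`); the function name `encoding_by_code` is hypothetical. If a code appears in more than one row, the later row wins.

```python
import csv
import io


def encoding_by_code(csv_text: str) -> dict[str, str]:
    """Map each language code to the encoding listed for it.

    Assumption: the header row contains "code" and "encoding" columns,
    as in the schema above; adjust the keys if the raw file differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Later rows overwrite earlier ones if a code is repeated.
    return {row["code"]: row["encoding"] for row in reader}
```

The resulting dictionary can supply the `encoding` argument when opening each word-list file.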

Ro Ro (Romanian)

@kaggle.jacekpardyak_languages_of_europe.ro_ro
  • 1.19 MB
  • 180884 rows
  • 3 columns

CREATE TABLE ro_ro (
  "a" VARCHAR,
  "unnamed_1" VARCHAR,
  "a_1" VARCHAR
);

Ru Ru (Russian)

@kaggle.jacekpardyak_languages_of_europe.ru_ru
  • 1.16 MB
  • 146268 rows
  • 1 column

CREATE TABLE ru_ru (
  "n" VARCHAR
);

Sq Al (Albanian)

@kaggle.jacekpardyak_languages_of_europe.sq_al
  • 1.34 MB
  • 229504 rows
  • 2 columns

CREATE TABLE sq_al (
  "ab" VARCHAR,
  "unnamed_1" VARCHAR
);

Sr Sr (Serbian)

@kaggle.jacekpardyak_languages_of_europe.sr_sr
  • 1.65 MB
  • 251548 rows
  • 2 columns

CREATE TABLE sr_sr (
  "unnamed_0" VARCHAR,
  "unnamed_1" VARCHAR
);

Uk Ua (Ukrainian)

@kaggle.jacekpardyak_languages_of_europe.uk_ua
  • 959.59 KB
  • 111401 rows
  • 1 column

CREATE TABLE uk_ua (
  "n" VARCHAR
);