Baselight

AslgPc12 (English-ASL Gloss Parallel Corpus 2012)

Synthetic English-ASL Gloss Parallel Corpus 2012

@kaggle.thedevastator_unlocking_the_power_of_cross_cultural_language_i


About this Dataset

By Huggingface Hub [source]


This dataset provides a synthetic English-ASL gloss parallel corpus, generated in 2012, that aims to bridge English and American Sign Language (ASL). It consists of two columns: gloss (a representation of a sign in English) and text (the translated text of the sign). Researchers can use the paired data to study cross-language interoperability and to develop and evaluate machine translation models between English and ASL gloss.

How to use the dataset

The dataset consists of two columns: gloss and text. The “gloss” column contains English representations of ASL signs, which helps users relate written English to ASL signs. The “text” column provides a written English translation or interpretation for each corresponding entry in the gloss column.
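As a rough illustration of the two-column layout described above, the sketch below parses rows into (gloss, text) pairs using only the Python standard library. The helper name `read_pairs` and the sample row are made up for illustration and are not taken from the dataset; the file name `train.csv` comes from the Columns section of this page, and its local path is an assumption.

```python
import csv
from typing import Iterable, List, Tuple


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse CSV lines with 'gloss' and 'text' headers into (gloss, text) pairs."""
    reader = csv.DictReader(lines)
    return [(row["gloss"].strip(), row["text"].strip()) for row in reader]


# Hypothetical sample in the same two-column shape (not real dataset rows).
sample_lines = ["gloss,text\n", "HELLO WORLD,hello world\n"]
sample_pairs = read_pairs(sample_lines)

# With the real file, assuming train.csv sits in the working directory:
# with open("train.csv", newline="", encoding="utf-8") as f:
#     pairs = read_pairs(f)
```

Because `csv.DictReader` accepts any iterable of strings, the same helper works on an open file handle or on an in-memory list of lines.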

Using this dataset, users can build scenarios that emulate everyday conversation topics, such as greetings, family activities, and home chores, by pairing English words and phrases with their ASL gloss equivalents. With practice, this can help users become more comfortable moving between written English and signed languages such as American Sign Language (ASL). Predictive models developed from this corpus could also help address linguistic problems around cross-cultural communication barriers.

Research Ideas

  • Developing generative ASL-English bilingual chat bots
  • Benchmarking different translation models to measure accuracy
  • Using the parallel data to compare translation techniques and determine which is most successful at translating from English to ASL
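One way the benchmarking idea above could be approached is with a simple token-overlap score between a model's predicted gloss and the reference gloss. The sketch below is a minimal stand-in, not an established benchmark for this corpus: it computes a whitespace-token F1, and the function names `token_f1` and `mean_token_f1` are hypothetical. Published work on translation more commonly reports metrics such as BLEU, which this sketch does not implement.

```python
from collections import Counter
from typing import Sequence


def token_f1(prediction: str, reference: str) -> float:
    """F1 over whitespace tokens, counting repeated tokens at most as often as they match."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Average token F1 over aligned prediction/reference lists."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Running several models over the same held-out pairs and comparing their `mean_token_f1` values gives a coarse ranking; a token-overlap score ignores word order, so it should be read alongside other measures.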

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

Column name    Description
gloss          The ASL gloss representation, in context, of a keyword or phrase signed in ASL. (String)
text           The written English translation or interpretation of the corresponding gloss. (String)

Acknowledgements

If you use this dataset in your research, please credit Huggingface Hub.

Tables

Train

@kaggle.thedevastator_unlocking_the_power_of_cross_cultural_language_i.train
  • 6.85 MB
  • 87710 rows
  • 2 columns

CREATE TABLE train (
  "gloss" VARCHAR,
  "text" VARCHAR
);
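To show how a table with this schema might be queried, the sketch below recreates it in an in-memory SQLite database from Python. SQLite is only a stand-in here, since the page does not state which SQL engine hosts the table; the helper `build_train_table` and the two demo rows are made up for illustration.

```python
import sqlite3
from typing import Iterable, Tuple


def build_train_table(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory 'train' table matching the schema above and load rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE train ("gloss" VARCHAR, "text" VARCHAR)')
    conn.executemany("INSERT INTO train (gloss, text) VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical rows in the same shape as the dataset (not real dataset content).
demo_rows = [("HELLO WORLD", "hello world"), ("THANK YOU", "thank you")]
conn = build_train_table(demo_rows)

# Count the rows and find the gloss with the most characters.
row_count = conn.execute("SELECT COUNT(*) FROM train").fetchone()[0]
longest_gloss = conn.execute(
    "SELECT gloss FROM train ORDER BY LENGTH(gloss) DESC LIMIT 1"
).fetchone()[0]
```

The same `SELECT` statements should carry over to other SQL engines with little or no change, since they use only standard column references and `LENGTH`.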
