Name: AslgPc12 (English-ASL Gloss Parallel Corpus 2012)
Creator: Kaggle
License: https://creativecommons.org/publicdomain/zero/1.0/

Synthetic English-ASL Gloss Parallel Corpus 2012

By Huggingface Hub

About this dataset

This dataset is a synthetic English-ASL gloss parallel corpus generated in 2012. It consists of two columns: gloss (a representation of ASL signs written with English words) and text (the corresponding English text). Researchers can use it to develop and evaluate machine translation models between English and American Sign Language (ASL), and to explore new approaches to bridging the two languages.

How to use the dataset

The dataset consists of two columns: gloss and text. The “gloss” column contains the ASL gloss, a representation of ASL signs written with English words, which shows how written English relates to ASL signs. The “text” column contains the corresponding written English for each entry in the gloss column.
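As a rough illustration of the two-column layout described above, the following sketch reads (gloss, text) pairs using Python's standard csv module. It assumes the file has a header row naming the columns gloss and text; the function name read_pairs and the sample rows are made up for this example and are not taken from the corpus.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (gloss, text) pairs from CSV lines that start with a header row."""
    reader = csv.DictReader(lines)
    pairs = []
    for row in reader:
        # Missing fields come back as None, so fall back to an empty string.
        gloss = (row.get("gloss") or "").strip()
        text = (row.get("text") or "").strip()
        if gloss and text:  # skip rows where either field is empty
            pairs.append((gloss, text))
    return pairs


# In-memory stand-in for train.csv. These rows are invented for illustration.
sample = io.StringIO(
    "gloss,text\n"
    "ME HAPPY,i am happy\n"
    "YOU GO STORE,you go to the store\n"
)
pairs = read_pairs(sample)
print(len(pairs))
```

To read the real file instead, the same function can be given an open file handle, for example open("train.csv", newline="", encoding="utf-8"), assuming the file is saved under that name in the working directory.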

Using this dataset, users can build examples around common everyday conversation topics, such as greetings, family activities, and home chores, by pairing English text with the corresponding ASL gloss. Such paired examples may also help learners practice moving between a spoken language and a signed language such as American Sign Language (ASL). Furthermore, predictive models developed from this corpus could help address linguistic problems around cross-cultural communication barriers.

Research Ideas

Developing generative ASL-English bilingual chat bots

Benchmarking different translation models to measure accuracy

Using the parallel data to compare translation techniques and determine which is most successful at translating from English to ASL
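One possible way to measure accuracy when benchmarking translation models is word error rate (WER): the word-level edit distance between a model's output and the reference, divided by the number of reference words. The sketch below is only one option among many metrics, and the function names and example glosses are hypothetical, not part of the dataset.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # token missing from hyp
                            curr[j - 1] + 1,      # extra token in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        total_edits += edit_distance(ref_tokens, hyp.split())
        total_words += len(ref_tokens)
    return total_edits / total_words if total_words else 0.0


# Invented reference glosses and model outputs, for illustration only.
references = ["ME GO STORE", "YOU HAPPY"]
hypotheses = ["ME GO STORE", "YOU SAD"]
wer = word_error_rate(references, hypotheses)
print(wer)
```

Lower values are better; a value of 0.0 means every output matched its reference exactly after whitespace tokenization.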

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

Column name	Description
gloss	The ASL gloss representation of the keyword or phrase as signed in ASL. (String)
text	The corresponding written English text for the gloss. (String)

