AslgPc12 (English-ASL Gloss Parallel Corpus 2012)
Synthetic English-ASL Gloss Parallel Corpus 2012
By Huggingface Hub [source]
About this dataset
This dataset is a synthetic English-ASL gloss parallel corpus generated in 2012, intended to help bridge English and American Sign Language (ASL). It consists of two columns: gloss (a representation of a sign written with English words) and text (the English text corresponding to the sign). Researchers can use these sentence pairs to build and study machine translation models that map between English and ASL gloss.
How to use the dataset
The data set consists of two columns: gloss and text. The “gloss” column contains English representations of an ASL sign, helping users better understand the correlation between written English and ASL signs. The “text” column provides a written translation or interpretation in English for each corresponding ASL sign within the gloss column.
Using this data set, users can build scenarios that emulate common everyday conversation topics, such as greetings, family activities, and home chores, by pairing English words and phrases with their ASL gloss equivalents. With practice, users may become more proficient at holding conversations that draw on both spoken languages and signed languages such as American Sign Language (ASL). Furthermore, predictive models developed from this corpus could help address complex linguistic problems around cross-cultural communication barriers.
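As a starting point, the two-column layout described above can be read into (gloss, text) pairs with Python's standard csv module. This is a minimal sketch, assuming train.csv has a header row with the column names gloss and text. The helper name read_pairs and the sample rows are invented for illustration and are not taken from the corpus.

```python
import csv
import io


def read_pairs(handle):
    """Return a list of (gloss, text) tuples from an open CSV file handle.

    Assumes a header row naming the columns "gloss" and "text".
    Rows where either field is missing or empty are skipped.
    """
    pairs = []
    for row in csv.DictReader(handle):
        gloss = (row.get("gloss") or "").strip()
        text = (row.get("text") or "").strip()
        if gloss and text:
            pairs.append((gloss, text))
    return pairs


# Invented sample rows, used only so the sketch runs without the real file.
SAMPLE = 'gloss,text\nHELLO,hello\n"YOU NAME WHAT",what is your name\n,\n'

pairs = read_pairs(io.StringIO(SAMPLE))
for gloss, text in pairs:
    print(f"{gloss!r} <-> {text!r}")
```

To read the actual file, open it with open("train.csv", newline="", encoding="utf-8") and pass the handle to read_pairs. The encoding is an assumption and may need adjusting.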
Research Ideas
- Developing generative ASL-English bilingual chatbots
- Benchmarking different translation models to measure accuracy
- Using the parallel data to compare translation techniques and determine which is most successful at translating from English to ASL
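For the benchmarking ideas above, one simple way to compare model outputs against reference glosses is word error rate (WER): the word-level edit distance divided by the number of reference words. WER is a choice made here for illustration; the dataset description does not prescribe a metric, and others (such as BLEU) may suit the task better. The function names and example strings below are hypothetical.

```python
def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, 1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        total_edits += edit_distance(ref_tokens, hyp.split())
        total_words += len(ref_tokens)
    return total_edits / total_words if total_words else 0.0


# Invented reference and model output, for illustration only.
references = ["YOU NAME WHAT NOW"]
hypotheses = ["YOU CALL WHAT"]
print(word_error_rate(references, hypotheses))
```

A lower score means the output is closer to the reference. Evaluating on a held-out portion of train.csv, rather than on rows the model was trained on, keeps the comparison between techniques fair.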
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: train.csv
Column name | Description
gloss | This column contains the ASL gloss representation in a given context for any keyword or phrase spoken in ASL. (String)
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.