NLP on UFO Sighting Reports
Dataset Description
Context
In NLP, you sometimes want data in which many people write about the same topic, where that topic is highly distinctive, the English is clean (free of slang), and the text is schema-less (grammatically unstructured). I believe UFO sighting descriptions meet all of these criteria.
Content
The UFO sighting data already posted on Kaggle [1] focuses on geospatial patterns and includes only roughly the first 300 characters of each report's description. I therefore scraped all 90K reports from their original source [2], keeping the full description of every sighting, for use in NLP.
Acknowledgements
I thank the National UFO Reporting Center (NUFORC) for providing the data and for making it easy to scrape.
Inspiration
I am very curious to see different NLP techniques applied to this data, especially unsupervised approaches using BERT embeddings. If merged with the Kaggle dataset [1], it could also be used to classify the "Shape" of the sighted UFO. Another interesting experiment would be a supervised model that predicts the location of a sighting (State, City, County, etc.), which could, for instance, help confirm that two different descriptions reported on the same day refer to the same UFO.
References
[1] https://www.kaggle.com/NUFORC/ufo-sightings
[2] http://www.nuforc.org/webreports/ndxevent.html