Context
NLP work sometimes calls for a corpus in which many people write about the same topic, where that topic is distinctive and easy to tell apart from others, and where the English is clean (slang-free) yet schema-less (free-form, with no imposed structure). I believe that UFO sighting descriptions written by the public cover all of these points.
Content
I found that the UFO sighting data posted on Kaggle [1] focuses on geospatial patterns and includes only about the first 300 characters of each report description. I therefore scraped all 90K reports from their source [2], keeping the full description of every sighting, for use in NLP.
Acknowledgements
I thank the National UFO Reporting Center (NUFORC) for providing the data and for making it easy to scrape.
Inspiration
I am very curious to see different NLP techniques applied to this data, especially unsupervised ones using BERT embeddings. If merged with the dataset posted on Kaggle [1], it could also be used to classify the "Shape" of the sighted UFO. Furthermore, a supervised experiment that predicts the location of a sighting (State/City/County, etc.) would be very interesting: for instance, it could help confirm that two different descriptions from the same day were in fact of the same UFO.
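As a rough illustration of the merge idea, here is a minimal sketch that attaches a shape label to each full description by matching it against the truncated text of the other dataset. It rests on assumptions: the field names `comments` and `shape`, the helper names, and the toy records are all hypothetical, and it presumes each truncated description is a literal prefix of the full one, which may not hold for every report.

```python
def normalize(text):
    """Lowercase and collapse whitespace so minor formatting differences do not block a match."""
    return " ".join(text.lower().split())


def attach_shape(full_reports, truncated_reports, prefix_len=30):
    """Pair each full description with the shape of the truncated record sharing its prefix.

    Assumes every description is at least `prefix_len` characters long after
    normalization; shorter texts simply will not match.
    """
    # Index the labelled (truncated) records by the start of their text.
    shape_by_prefix = {}
    for record in truncated_reports:
        key = normalize(record["comments"])[:prefix_len]
        shape_by_prefix[key] = record["shape"]

    # Look up each full description by the same prefix; None means no match.
    labelled = []
    for text in full_reports:
        key = normalize(text)[:prefix_len]
        labelled.append((text, shape_by_prefix.get(key)))
    return labelled


# Toy records, invented for illustration only (not taken from either dataset).
full_reports = [
    "A bright triangular object hovered silently over the lake for ten minutes before vanishing.",
    "Three orange lights moved in formation across the sky and then faded one by one.",
]
truncated_reports = [
    {"comments": "A bright triangular object hovered silently over the la", "shape": "triangle"},
]

for text, shape in attach_shape(full_reports, truncated_reports):
    print(shape, "|", text[:40])
```

Prefix matching is only one possible join key. If both sources expose a reliable date and location for each report, joining on those fields would likely be more robust than comparing text.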
References
[1] https://www.kaggle.com/NUFORC/ufo-sightings
[2] http://www.nuforc.org/webreports/ndxevent.html