Baselight

UFO Sighting Full Description

NLP on UFO-Sighting reports

@kaggle.emohamed_ufo_sighting_full_description


About this Dataset

Context

In NLP, you sometimes want a corpus in which many people talk about the same topic, where the topic is unique and easily distinguishable, and the English is clean (slang-free) and schema-less (grammatically unstructured). I believe that people's descriptions of UFO sightings meet all of these criteria.

Content

I found that the UFO sighting datasets posted on Kaggle [1] all focus on geospatial patterns and include only the first 300 characters or so of each report's description. I therefore scraped all 90K reports from their source [2], with the full description of every sighting, for use in NLP.

Acknowledgements

I thank the National UFO Reporting Center (NUFORC) for providing the data and making it easy to scrape too.

Inspiration

I am very curious to see different NLP techniques applied to this data, especially unsupervised ones using BERT embeddings. If merged with the dataset posted on Kaggle [1], it could also be used to classify the "Shape" of the sighted UFO. A supervised experiment that predicts the location of a sighting (State/City/County, etc.) on the same day would also be very interesting, for instance to confirm that two different descriptions in fact refer to the same UFO.
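
As a minimal sketch of the last idea, the snippet below scores how similar two descriptions are using bag-of-words cosine similarity. This is a deliberately simple stand-in for the BERT embeddings mentioned above, not a method taken from this dataset or its author. The function names and the three report texts are hypothetical and were invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words that appear in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text with no tokens is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical report texts, invented for illustration only.
report_1 = "Bright orange triangle hovering silently over the lake"
report_2 = "Silent orange triangle seen hovering above the lake"
report_3 = "Fast white disc crossed the highway at noon"

same_event = cosine_similarity(report_1, report_2)
different_event = cosine_similarity(report_1, report_3)
```

In this toy example `same_event` comes out higher than `different_event`, which is the signal one would threshold on. Word counts ignore synonyms and word order, so embeddings would likely separate real reports better.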

References

[1] https://www.kaggle.com/NUFORC/ufo-sightings
[2] http://www.nuforc.org/webreports/ndxevent.html

Tables

@kaggle.emohamed_ufo_sighting_full_description.reports_links
  • 572.42 KB
  • 94566 rows
  • 2 columns

CREATE TABLE reports_links (
  "link" VARCHAR,
  "unnamed_1" VARCHAR
);

UFO Sighting Description Annotated

@kaggle.emohamed_ufo_sighting_full_description.ufo_sightingdescription_annotated
  • 11.11 MB
  • 95716 rows
  • 4 columns

CREATE TABLE ufo_sightingdescription_annotated (
  "n_0" BIGINT,
  "n__translucent_cylindrical_silent_aircraft_seen_near_a_38038068" VARCHAR,
  "aircraft" VARCHAR,
  "negative" VARCHAR
);
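
The column names in this schema look like values taken from a first data row, which suggests (as an assumption, not something the source confirms) that the original file has no header row. Under that assumption, a minimal sketch for reading such a file with explicit column names could look like the following. The names in `COLUMNS` are hypothetical, since the source does not document what the four columns mean, and the sample rows are invented.

```python
import csv
import io

# Hypothetical column names; the source does not say what each column holds.
COLUMNS = ["row_id", "description", "label", "sentiment"]

# Tiny in-memory stand-in for the real file; these rows are invented.
sample_csv = io.StringIO(
    '0,"Silent cylinder seen near the highway",aircraft,negative\n'
    '1,"Three lights moving in formation",lights,positive\n'
)

# Passing fieldnames explicitly makes DictReader treat the first line as data
# instead of consuming it as a header.
reader = csv.DictReader(sample_csv, fieldnames=COLUMNS)
rows = [dict(row) for row in reader]
```

With pandas, the equivalent would be `pandas.read_csv(path, header=None, names=COLUMNS)`, where `path` points at the downloaded file.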
