Baselight

NASA Astronomy Picture Of The Day (APOD) Extended

Enhanced data about APOD daily publications (June 1995 - Dec. 2023)

@kaggle.thomasanquetil_nasa_astronomy_picture_of_the_day_apod_extended

About this Dataset

Every day since June 1995, NASA has featured a different image of our universe on its Astronomy Picture of the Day website, with a brief explanation written by a professional astronomer.

The metadata associated with each image can be accessed via the dedicated API available on the NASA Open APIs portal.
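As a rough illustration of how that API can be queried, the sketch below builds a request URL for a date range and shows how the JSON response could be fetched. The endpoint path, the `start_date` / `end_date` parameters and the rate-limited `DEMO_KEY` come from NASA's public APOD API rather than from this page, and the helper names `build_apod_url` and `fetch_apod_metadata` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public APOD endpoint of the NASA Open APIs portal.
APOD_ENDPOINT = "https://api.nasa.gov/planetary/apod"


def build_apod_url(start_date, end_date, api_key="DEMO_KEY"):
    """Build the request URL for all APOD entries between two ISO dates."""
    params = {
        "api_key": api_key,        # DEMO_KEY is heavily rate-limited
        "start_date": start_date,  # format: YYYY-MM-DD
        "end_date": end_date,
    }
    return f"{APOD_ENDPOINT}?{urlencode(params)}"


def fetch_apod_metadata(start_date, end_date, api_key="DEMO_KEY"):
    """Download the metadata records for a date range (needs network access)."""
    with urlopen(build_apod_url(start_date, end_date, api_key)) as response:
        return json.load(response)


# Example (not executed here because it requires network access):
# records = fetch_apod_metadata("2023-12-01", "2023-12-31")
```

Requesting a date range returns one record per day, which avoids issuing a separate call for each of the 10K+ entries.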

This dataset was built from these data for training purposes. The raw data retrieved from the API have been enhanced with information about and characteristics of the images (using PIL), and with named entities and keywords extracted from the explanation text (using spaCy).
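To give an idea of what the image-side enhancement might look like, here is a minimal sketch that reads the image characteristics with PIL (Pillow). It is not the code used to build the dataset: the function name `describe_image` is hypothetical, and mapping the camera and software columns to the standard EXIF tags Make (271), Model (272) and Software (305) is an assumption.

```python
import io

from PIL import Image

# Assumption: the camera/software columns come from these standard EXIF tags.
EXIF_MAKE, EXIF_MODEL, EXIF_SOFTWARE = 271, 272, 305


def describe_image(source):
    """Return image characteristics for a file path or binary file object."""
    with Image.open(source) as img:
        exif = img.getexif()  # empty mapping when the file has no EXIF data
        width, height = img.size
        return {
            "img_format": img.format,   # e.g. "JPEG", "GIF", "PNG"
            "img_mode": img.mode,       # e.g. "RGB", "L", "P"
            "img_width_px": width,
            "img_height_px": height,
            "camera_make": exif.get(EXIF_MAKE),
            "camera_model": exif.get(EXIF_MODEL),
            "software": exif.get(EXIF_SOFTWARE),
        }


# Demo on a tiny in-memory image so the sketch runs without downloading anything.
buffer = io.BytesIO()
Image.new("RGB", (4, 3)).save(buffer, format="PNG")
buffer.seek(0)
demo_info = describe_image(buffer)
```

Images without EXIF data simply yield `None` for the three EXIF-derived fields, which would correspond to blank values in those columns.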

The code I used to create this dataset is available here.

File and Data Field Descriptions

⚠ Please note that no images are included; the dataset provides only links to the images, as a feature.

The dataset consists of one CSV file (nasa-apod-dataset.csv) with 15 variables separated by semicolons. It contains 10K+ records from June 1995 to the end of December 2023.

  • date : The date of the APOD image.
  • title : The title of the APOD image.
  • copyright : The copyright of the image. If copyright is blank, the image is public domain.
  • explanation : The explanation of the image written by a professional astronomer.
  • keywords : A list-like string of the top 20 keywords extracted from explanation, separated by commas.
  • named_entities : A list-like string of the named entities extracted from explanation, separated by commas.
  • media_type : The type of media. Usually image, but sometimes video or other.
  • media_url : HD URL of the media. If there is no HD version, the SD URL is used.
  • img_format : The format of the image (GIF, JPEG, etc.).
  • img_mode : The type and depth of a pixel in the image.
  • img_width_px : The width of the image in pixels.
  • img_height_px : The height of the image in pixels.
  • camera_make : The brand of the camera used to take the image.
  • camera_model : The model of the camera used to take the image.
  • software : The software, and sometimes OS, used to save/edit the image.
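
The fields above can be loaded with Python's standard csv module, as in the sketch below. It assumes that keywords and named_entities are stored as plain comma-separated strings; if the file wraps them in brackets or quotes, the splitting step would need adjusting. The function name `read_apod_rows` and the sample row are made up for illustration.

```python
import csv
import io

LIST_COLUMNS = ("keywords", "named_entities")


def read_apod_rows(handle):
    """Yield one dict per record, splitting the list-like columns into lists."""
    reader = csv.DictReader(handle, delimiter=";")  # file is semicolon-separated
    for row in reader:
        for column in LIST_COLUMNS:
            raw = row.get(column) or ""
            # Assumption: items are separated by commas with optional spaces.
            row[column] = [item.strip() for item in raw.split(",") if item.strip()]
        yield row


# Made-up sample with a subset of the columns, standing in for the real file.
sample = io.StringIO(
    "date;title;keywords;named_entities\n"
    "2023-12-31;Example title;galaxy, star, dust;Hubble, NASA\n"
)
rows = list(read_apod_rows(sample))

# With the real file (the UTF-8 encoding is an assumption):
# with open("nasa-apod-dataset.csv", newline="", encoding="utf-8") as f:
#     rows = list(read_apod_rows(f))
```

Passing the `;` delimiter explicitly matters here, because the explanation text and the list-like columns themselves contain commas.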