Baselight

Laion-Pop Image Classification Dataset

Accurately Predicting and Classifying Images with Alt Texts and NSFW Predictions

@kaggle.thedevastator_laion_pop_image_classification_dataset

About this Dataset

By Huggingface Hub [source]


This dataset provides a collection of image records with accompanying alternative texts (alt texts) and NSFW prediction labels for image classification and prediction tasks. Each record carries a SHA-256 hash, a URL, an automatically generated caption, a predicted NSFW label, an alt-text similarity score, image dimensions, and EXIF data, which together support a variety of image classification tasks. It is a useful resource for projects that rely on classifying and detecting images.

How to use the dataset

This dataset provides image data with alt texts and NSFW predictions for the purpose of classifying images. To use it, first review the columns and familiarize yourself with their contents: key is a unique identifier for each image, sha256 is the SHA-256 hash of the image, url is a link to where the image can be accessed online, and llava_caption is an automatically generated caption describing the image's contents.

nsfw_prediction signals whether an image may contain content, such as violence or nudity, that would make it unsuitable for certain audiences. alt_txt contains the alternative text associated with the image, and alt_txt_similarity describes how closely that alternative text matches the automatically generated caption. height and original_height give the height of the image and its original height, respectively. Lastly, exif holds the image's EXIF (Exchangeable Image File Format) data, the metadata that digital cameras and other devices attach to pictures.

With this information in mind, you will be able to explore your data efficiently and classify images according to your own specifications.
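
As a minimal sketch of that first inspection step, the snippet below reads rows with Python's standard csv module and converts the numeric columns to floats, following the types in the train table schema. The `load_rows` helper, the sample rows, and the assumption that the file is a comma-separated export with a header row are all illustrative and not taken from the dataset itself.

```python
import csv
import io

# Numeric columns, following the DOUBLE types in the train table schema.
FLOAT_COLUMNS = (
    "nsfw_prediction", "alt_txt_similarity",
    "width", "height", "original_width", "original_height",
)


def load_rows(handle):
    """Read CSV rows from a file-like object and convert numeric fields.

    Empty numeric cells become None instead of raising an error.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for name in FLOAT_COLUMNS:
            value = row.get(name)
            row[name] = float(value) if value not in (None, "") else None
        rows.append(row)
    return rows


# Hypothetical sample standing in for train.csv; every value is invented.
SAMPLE_CSV = """key,sha256,url,llava_caption,nsfw_prediction,alt_txt,alt_txt_similarity,width,height,original_width,original_height,exif
1,abc,https://example.com/a.jpg,A red bicycle,0.02,Red bike,0.31,768,1024,1536,2048,{}
2,def,https://example.com/b.jpg,A mountain lake,0.01,,,512,512,1024,1024,{}
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(sorted(rows[0].keys()))                             # column names
print(rows[0]["height"], rows[1]["alt_txt_similarity"])   # converted values
```

To run the same helper on the real file, pass it a handle opened with `open("train.csv", newline="", encoding="utf-8")`; the file name comes from the Columns section, while the encoding is an assumption.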

Research Ideas

  • The dataset can be used for image recognition and classification, by training machine learning models that predict the class of an image from its alt text and NSFW prediction.
  • This dataset allows developers to create tools for filtering NSFW images out of user-generated content.
  • This dataset can also be used to create AI-assisted applications that enrich users' images with captions describing what appears in the picture, providing a more immersive experience.
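
The NSFW filtering idea above could be sketched roughly as follows. This assumes that nsfw_prediction is a score where higher values mean the image is more likely NSFW, which the page does not confirm; the `split_by_nsfw` helper, the 0.5 threshold, and the sample rows are hypothetical.

```python
def split_by_nsfw(rows, threshold=0.5):
    """Partition rows into (safe, flagged) lists by nsfw_prediction.

    Assumption: a higher score means "more likely NSFW".
    Rows with a missing score are flagged, which is the conservative choice.
    """
    safe, flagged = [], []
    for row in rows:
        score = row.get("nsfw_prediction")
        if score is not None and score < threshold:
            safe.append(row)
        else:
            flagged.append(row)
    return safe, flagged


# Invented example rows; only the column name comes from the dataset.
examples = [
    {"key": 1, "nsfw_prediction": 0.03},
    {"key": 2, "nsfw_prediction": 0.91},
    {"key": 3, "nsfw_prediction": None},
]

safe, flagged = split_by_nsfw(examples, threshold=0.5)
print([row["key"] for row in safe])      # rows kept
print([row["key"] for row in flagged])   # rows held back for review
```

The threshold is a tuning choice: a lower value holds back more images, so it is worth checking against a hand-labelled sample before relying on it.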

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

Column name          Type      Description
index                BIGINT    Row index.
key                  BIGINT    Unique identifier for each image.
sha256               VARCHAR   SHA-256 hash of the image.
url                  VARCHAR   URL of the image.
llava_caption        VARCHAR   Automatically generated caption for the image.
nsfw_prediction      DOUBLE    Prediction of whether the image is NSFW or not.
alt_txt              VARCHAR   Alternative text for the image.
alt_txt_similarity   DOUBLE    Similarity score between the automatically generated caption and the alternative text.
width                DOUBLE    Width of the image.
height               DOUBLE    Height of the image.
original_width       DOUBLE    Original width of the image.
original_height      DOUBLE    Original height of the image.
exif                 VARCHAR   EXIF data of the image.

Tables

Train

@kaggle.thedevastator_laion_pop_image_classification_dataset.train
  • 284.68 MB
  • 591882 rows
  • 13 columns

CREATE TABLE train (
  "index" BIGINT,
  "key" BIGINT,
  "sha256" VARCHAR,
  "url" VARCHAR,
  "llava_caption" VARCHAR,
  "nsfw_prediction" DOUBLE,
  "alt_txt" VARCHAR,
  "alt_txt_similarity" DOUBLE,
  "width" DOUBLE,
  "height" DOUBLE,
  "original_width" DOUBLE,
  "original_height" DOUBLE,
  "exif" VARCHAR
);
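
To try queries against this schema locally, one option is an in-memory SQLite database driven from Python, as sketched below. SQLite is only a stand-in here (it accepts the same CREATE TABLE statement through its type-affinity rules) and is not necessarily the engine behind this page; the three sample rows are invented.

```python
import sqlite3

# The schema as listed for the train table.
SCHEMA = """
CREATE TABLE train (
  "index" BIGINT,
  "key" BIGINT,
  "sha256" VARCHAR,
  "url" VARCHAR,
  "llava_caption" VARCHAR,
  "nsfw_prediction" DOUBLE,
  "alt_txt" VARCHAR,
  "alt_txt_similarity" DOUBLE,
  "width" DOUBLE,
  "height" DOUBLE,
  "original_width" DOUBLE,
  "original_height" DOUBLE,
  "exif" VARCHAR
);
"""

# Invented rows, one tuple per record, in schema column order.
SAMPLE_ROWS = [
    (0, 101, "aa", "https://example.com/a.jpg", "A red bicycle",
     0.02, "Red bike", 0.31, 768.0, 1024.0, 1536.0, 2048.0, "{}"),
    (1, 102, "bb", "https://example.com/b.jpg", "A mountain lake",
     0.01, "Lake view", 0.27, 512.0, 512.0, 512.0, 512.0, "{}"),
    (2, 103, "cc", "https://example.com/c.jpg", "A city street",
     0.85, None, None, 640.0, 480.0, 1280.0, 960.0, "{}"),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
placeholders = ", ".join("?" for _ in range(13))
conn.executemany(f"INSERT INTO train VALUES ({placeholders})", SAMPLE_ROWS)

# Count all rows and the rows whose stored height is below the original height.
total, downscaled = conn.execute(
    'SELECT COUNT(*), SUM("height" < "original_height") FROM train'
).fetchone()

# Find the row whose alt text is most similar to the generated caption.
(best_key,) = conn.execute(
    'SELECT "key" FROM train '
    'WHERE "alt_txt_similarity" IS NOT NULL '
    'ORDER BY "alt_txt_similarity" DESC LIMIT 1'
).fetchone()

print(total, downscaled, best_key)
conn.close()
```

Quoting "index" and "key" in queries mirrors the schema and avoids clashes with SQL keywords.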
