Laion-Pop Image Classification Dataset
Accurately Predicting and Classifying Images with Alt Texts and NSFW Predictions
By Huggingface Hub
About this dataset
This dataset provides a collection of images, referenced by URL, with accompanying alternative (alt) texts and NSFW prediction labels, intended for image classification and prediction tasks. Each record includes a SHA-256 hash, URL, automatically generated caption, predicted NSFW label, alt text similarity score, image dimensions, and EXIF data. Together these fields support a variety of image classification and content detection projects.
How to use the dataset
This dataset provides image data with alt texts and NSFW predictions for classifying images. To use it, first review the columns and familiarize yourself with their contents. key is a unique identifier for each image, sha256 is the SHA-256 hash of the image, url is a link to where the image can be accessed online, and llava_caption is an automatically generated caption describing the image's contents.
nsfw_prediction signals whether an image may contain content, such as violence or nudity, that makes it unsuitable for certain audiences. alt_txt contains the alternative text associated with the image. alt_txt_similarity is a score describing how closely that alternative text matches the automatically generated caption. height and original_height record the height of the image and its original height, respectively. Lastly, exif holds the image's EXIF (Exchangeable Image File Format) data, which is metadata embedded in the file, typically by the camera or software that produced it.
With this information in mind, you will be able to explore your data efficiently and classify images according to your own specifications.
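The column walkthrough above can be sketched in code. This is a minimal sketch using only the Python standard library. It assumes train.csv is a comma-separated file with a header row and that nsfw_prediction is stored as the text True or False; neither is confirmed by this page. The SAMPLE_CSV rows and the helper names parse_row and load_rows are hypothetical and exist only so the example runs without the real file.

```python
import csv
import io

# Hypothetical two-row sample that mimics the documented train.csv columns.
# The values are invented for illustration; they are not real dataset rows.
SAMPLE_CSV = """key,sha256,url,llava_caption,nsfw_prediction,alt_txt,alt_txt_similarity,height,original_height,exif
0001,abc123,https://example.com/a.jpg,A red bicycle leaning on a wall,False,red bike,0.31,512,1024,{}
0002,def456,https://example.com/b.jpg,A bowl of fruit on a table,True,fruit bowl,0.27,768,768,{}
"""


def parse_row(row):
    """Convert the text fields of one CSV row to the documented types."""
    parsed = dict(row)
    # Assumption: booleans are serialized as the strings "True" / "False".
    parsed["nsfw_prediction"] = row["nsfw_prediction"].strip().lower() == "true"
    parsed["alt_txt_similarity"] = float(row["alt_txt_similarity"])
    parsed["height"] = int(row["height"])
    parsed["original_height"] = int(row["original_height"])
    return parsed


def load_rows(handle):
    """Read every row from an open CSV file handle into a list of dicts."""
    return [parse_row(row) for row in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # Show a few columns to get a feel for the data.
    print(row["key"], row["nsfw_prediction"], row["height"], row["llava_caption"])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("train.csv", newline="", encoding="utf-8")`.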
Research Ideas
- The dataset can be used for image recognition and classification, for example by training machine learning models that predict an image's class from its alt text and NSFW prediction.
- This dataset allows developers to build tools that filter NSFW images out of user-generated content.
- This dataset can also be used to create AI-assisted applications that enrich users' images with captions describing what the picture shows, providing a more immersive experience.
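As a starting point for the NSFW filtering idea above, here is a minimal sketch that drops rows flagged as NSFW and, optionally, rows whose alt text similarity falls below a threshold. It assumes each row is a dict whose nsfw_prediction is already a bool and whose alt_txt_similarity is a float. The function name filter_rows, the example rows, and the 0.2 threshold are all hypothetical, not values recommended by the dataset authors.

```python
def filter_rows(rows, min_similarity=0.0, allow_nsfw=False):
    """Return the rows that pass the NSFW and similarity checks."""
    kept = []
    for row in rows:
        # Skip rows predicted as NSFW unless the caller opts in.
        if row["nsfw_prediction"] and not allow_nsfw:
            continue
        # Skip rows whose alt text is only weakly related to the caption.
        if row["alt_txt_similarity"] < min_similarity:
            continue
        kept.append(row)
    return kept


# Invented rows for illustration; they are not real dataset records.
example_rows = [
    {"key": "0001", "nsfw_prediction": False, "alt_txt_similarity": 0.31},
    {"key": "0002", "nsfw_prediction": True, "alt_txt_similarity": 0.45},
    {"key": "0003", "nsfw_prediction": False, "alt_txt_similarity": 0.12},
]

safe_rows = filter_rows(example_rows, min_similarity=0.2)
print([row["key"] for row in safe_rows])
```

Because nsfw_prediction is itself a model output, a filter like this inherits that model's mistakes, so a production tool would likely need additional review on top of it.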
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: train.csv
| Column name | Description |
|---|---|
| key | Unique identifier for each image. (String) |
| sha256 | SHA-256 hash of the image. (String) |
| url | URL of the image. (String) |
| llava_caption | Automatically generated caption for the image. (String) |
| nsfw_prediction | Prediction of whether the image is NSFW or not. (Boolean) |
| alt_txt | Alternative text for the image. (String) |
| alt_txt_similarity | Similarity score between the automatically generated caption and the alternative text. (Float) |
| height | Height of the image. (Integer) |
| original_height | Original height of the image. (Integer) |
| exif | EXIF data of the image. (String) |
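Since the sha256 column records a hash of each image, it can be used to check that a file downloaded from url is the one the dataset describes. The sketch below assumes the column stores the hexadecimal SHA-256 digest of the raw image bytes, which this page does not confirm. The helper names sha256_hex and matches_recorded_hash are hypothetical, and the bytes used here are a stand-in for a downloaded image.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_hash(data: bytes, recorded: str) -> bool:
    """Compare downloaded bytes against the value from the sha256 column."""
    # Normalize case and whitespace in case the CSV value is padded or uppercase.
    return sha256_hex(data) == recorded.strip().lower()


# Stand-in for image bytes fetched from the url column.
downloaded = b"abc"
recorded_value = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

print(matches_recorded_hash(downloaded, recorded_value))
```

A mismatch may mean the image at that URL has been replaced or re-encoded since the dataset was built, so such rows are worth skipping or re-checking.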
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.