
Multiple Machine Learning Datasets

Kaggle

@kaggle.ericamohadjei_trending_public_datasets


Data Sets for Machine Learning & Data Science Practice

Dataset Description

Trending Public Datasets Overview

This repository contains a diverse collection of datasets intended for machine learning research and practice. Each dataset is curated to support a different type of machine learning task, including classification, regression, and clustering. The datasets available in this repository are listed below, with a description, the file names, and, where available, the source.

Available Datasets

Iris Dataset

Description: This classic dataset includes measurements for 150 iris flowers from three different species. It includes four features: sepal length, sepal width, petal length, and petal width.
Source: Iris Dataset Source
Files: iris.csv
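
As a quick first look at a file such as iris.csv, the sketch below computes the mean of each feature per class using only the Python standard library. It assumes the CSV has a header row, that every column other than the label is numeric, and that the label column name is passed in by the caller; the function name `class_means` and the column names shown in the usage comment are hypothetical, since the actual headers are not stated here.

```python
import csv
from collections import defaultdict
from typing import Dict, TextIO


def class_means(handle: TextIO, label_column: str) -> Dict[str, Dict[str, float]]:
    """Return the mean of every non-label column, grouped by the label column.

    Assumes a header row and that all non-label columns parse as floats.
    """
    reader = csv.DictReader(handle)
    sums = defaultdict(lambda: defaultdict(float))  # label -> column -> running sum
    counts = defaultdict(int)                       # label -> number of rows
    for row in reader:
        label = row[label_column]
        counts[label] += 1
        for name, value in row.items():
            if name != label_column:
                sums[label][name] += float(value)
    return {
        label: {name: total / counts[label] for name, total in columns.items()}
        for label, columns in sums.items()
    }


# Usage (the label column name "species" is an assumption; check the file header):
# with open("iris.csv", newline="") as f:
#     for label, means in class_means(f, "species").items():
#         print(label, means)
```

The function takes an open file handle rather than a path, so the same code can be tried on an in-memory `io.StringIO` sample before pointing it at the real file.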

DHFR Dataset

Description: Contains data for 325 molecules with biological activity against the DHFR enzyme, relevant in anti-malarial drug research. It includes 228 molecular descriptors as features.
Source: DHFR Dataset Source
Files: dhfr.csv

Heart Disease Dataset (Cleveland)

Description: Comprises diagnostic measurements from 303 patients tested for heart disease at the Cleveland Clinic. It features 13 clinical attributes.
Source: UCI Machine Learning Repository
Files: heart-disease-cleveland.csv

HCV Data

Description: Detailed datasets related to Hepatitis C Virus (HCV) progression, with features for classification and regression tasks.
Files: HCV_NS5B_Curated.csv, hcv_classification.csv, hcv_regression.arff

NBA Seasons Stats

Description: Player statistics from the NBA 2020 and 2021 seasons for detailed sports analytics.
Files: NBA_2020.csv, NBA_2021.csv

Boston Housing Dataset

Description: Data concerning housing values in the suburbs of Boston, suitable for regression analysis.
Files: BostonHousing.csv, BostonHousing_train.csv, BostonHousing_test.csv

Acetylcholinesterase Inhibitor Bioactivity

Description: Chemical bioactivity data against acetylcholinesterase, a target relevant to Alzheimer's research. It includes raw and processed formats with chemical fingerprints.
Files: acetylcholinesterase_01_bioactivity_data_raw.csv to acetylcholinesterase_07_bioactivity_data_2class_pIC50_pubchem_fp.csv

California Housing Dataset

Description: Data aimed at predicting median house prices in California districts.
Files: california_housing_train.csv, california_housing_test.csv

Virtual Reality Experiences Data

Description: Data from user experiences with various virtual reality setups to study user engagement and satisfaction.
Files: Virtual Reality Experiences-data.csv

Fast-Food Chains in USA

Description: Overview of various fast-food chains operating in the USA, their locations, and popularity.
Files: Fast-Food Chains in USA.csv

Contributing
We welcome contributions to this dataset repository. If you have a dataset that you believe would be beneficial for the machine learning community, please see our contribution guidelines in CONTRIBUTING.md.

License
This dataset is available under the MIT License.

