Baselight

Multiple Machine Learning Datasets

Data Sets for Machine Learning & Data Science Practice

@kaggle.ericamohadjei_trending_public_datasets

About this Dataset

Trending Public Datasets Overview

This repository contains a diverse collection of datasets intended for machine learning research and practice. Each dataset is curated to support a different type of machine learning task, including classification, regression, and clustering. The datasets are listed below with their descriptions, file names, and, where available, their sources.

Available Datasets

Iris Dataset

Description: This classic dataset includes measurements for 150 iris flowers from three different species. It includes four features: sepal length, sepal width, petal length, and petal width.
Source: Iris Dataset Source
Files: iris.csv

DHFR Dataset

Description: Contains data for 325 molecules with biological activity against the DHFR enzyme, relevant in anti-malarial drug research. It includes 228 molecular descriptors as features.
Source: DHFR Dataset Source
Files: dhfr.csv

Heart Disease Dataset (Cleveland)

Description: Comprises diagnostic measurements from 303 patients tested for heart disease at the Cleveland Clinic. It features 13 clinical attributes.
Source: UCI Machine Learning Repository
Files: heart-disease-cleveland.csv

HCV Data

Description: Detailed datasets related to Hepatitis C Virus (HCV) progression, with features for classification and regression tasks.
Files: HCV_NS5B_Curated.csv, hcv_classification.csv, hcv_regression.arff

NBA Seasons Stats

Description: Player statistics from the NBA 2020 and 2021 seasons for detailed sports analytics.
Files: NBA_2020.csv, NBA_2021.csv

Boston Housing Dataset

Description: Data concerning housing values in the suburbs of Boston, suitable for regression analysis.
Files: BostonHousing.csv, BostonHousing_train.csv, BostonHousing_test.csv

Acetylcholinesterase Inhibitor Bioactivity

Description: Chemical bioactivity data against acetylcholinesterase, a target relevant to Alzheimer's research. It includes raw and processed formats with chemical fingerprints.
Files: acetylcholinesterase_01_bioactivity_data_raw.csv to acetylcholinesterase_07_bioactivity_data_2class_pIC50_pubchem_fp.csv

California Housing Dataset

Description: Data aimed at predicting median house prices in California districts.
Files: california_housing_train.csv, california_housing_test.csv

Virtual Reality Experiences Data

Description: Data from user experiences with various virtual reality setups to study user engagement and satisfaction.
Files: Virtual Reality Experiences-data.csv

Fast-Food Chains in USA

Description: Overview of fast-food chains operating in the USA, including their locations and popularity.
Files: Fast-Food Chains in USA.csv

Contributing

We welcome contributions to this dataset repository. If you have a dataset that you believe would benefit the machine learning community, please see our contribution guidelines in CONTRIBUTING.md.

License

This dataset is available under the MIT License.

Tables

NBA 2021

@kaggle.ericamohadjei_trending_public_datasets.nba_2021
  • 59.2 kB
  • 705 rows
  • 29 columns
CREATE TABLE nba_2021 (
  "player" VARCHAR,
  "pos" VARCHAR,
  "age" BIGINT,
  "tm" VARCHAR,
  "g" BIGINT,
  "gs" BIGINT,
  "mp" DOUBLE,
  "fg" DOUBLE,
  "fga" DOUBLE,
  "fg_51ab2d" DOUBLE,  -- FG%
  "n_3p" DOUBLE,  -- 3P
  "n_3pa" DOUBLE,  -- 3PA
  "n_3p_d87a85" DOUBLE,  -- 3P%
  "n_2p" DOUBLE,  -- 2P
  "n_2pa" DOUBLE,  -- 2PA
  "n_2p_c8658d" DOUBLE,  -- 2P%
  "efg" DOUBLE,  -- eFG%
  "ft" DOUBLE,
  "fta" DOUBLE,
  "ft_ba25d5" DOUBLE,  -- FT%
  "orb" DOUBLE,
  "drb" DOUBLE,
  "trb" DOUBLE,
  "ast" DOUBLE,
  "stl" DOUBLE,
  "blk" DOUBLE,
  "tov" DOUBLE,
  "pf" DOUBLE,
  "pts" DOUBLE
);
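
The percentage columns in this schema can be cross-checked against the counting columns. The following is a minimal sketch, assuming the standard basketball definition eFG% = (FG + 0.5 × 3P) / FGA and assuming that `fg`, `n_3p`, and `fga` are recorded on the same basis (all per-game or all totals). The function name and the sample stat line are illustrative and are not taken from the table.

```python
from typing import Optional


def effective_fg_pct(fg: float, n_3p: float, fga: float) -> Optional[float]:
    """Effective field-goal percentage: (FG + 0.5 * 3P) / FGA.

    Returns None when there are no attempts, mirroring a NULL in the table.
    """
    if fga == 0:
        return None
    return (fg + 0.5 * n_3p) / fga


# Made-up stat line for illustration: 8 makes, 2 of them threes, 16 attempts.
print(effective_fg_pct(8.0, 2.0, 16.0))  # 0.5625
```

Comparing the result with the stored `efg` value for a few rows is a quick way to confirm how the column was derived.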

NBA Player Stats 2019

@kaggle.ericamohadjei_trending_public_datasets.nba_player_stats_2019
  • 58.53 kB
  • 708 rows
  • 29 columns
CREATE TABLE nba_player_stats_2019 (
  "player" VARCHAR,
  "pos" VARCHAR,
  "age" BIGINT,
  "tm" VARCHAR,
  "g" BIGINT,
  "gs" BIGINT,
  "mp" DOUBLE,
  "fg" DOUBLE,
  "fga" DOUBLE,
  "fg_51ab2d" DOUBLE,  -- FG%
  "n_3p" DOUBLE,  -- 3P
  "n_3pa" DOUBLE,  -- 3PA
  "n_3p_d87a85" DOUBLE,  -- 3P%
  "n_2p" DOUBLE,  -- 2P
  "n_2pa" DOUBLE,  -- 2PA
  "n_2p_c8658d" DOUBLE,  -- 2P%
  "efg" DOUBLE,  -- eFG%
  "ft" DOUBLE,
  "fta" DOUBLE,
  "ft_ba25d5" DOUBLE,  -- FT%
  "orb" DOUBLE,
  "drb" DOUBLE,
  "trb" DOUBLE,
  "ast" DOUBLE,
  "stl" DOUBLE,
  "blk" DOUBLE,
  "tov" DOUBLE,
  "pf" DOUBLE,
  "pts" DOUBLE
);

Penguins Cleaned

@kaggle.ericamohadjei_trending_public_datasets.penguins_cleaned
  • 8.37 kB
  • 333 rows
  • 7 columns
CREATE TABLE penguins_cleaned (
  "species" VARCHAR,
  "island" VARCHAR,
  "bill_length_mm" DOUBLE,
  "bill_depth_mm" DOUBLE,
  "flipper_length_mm" BIGINT,
  "body_mass_g" BIGINT,
  "sex" VARCHAR
);

Penguins Example

@kaggle.ericamohadjei_trending_public_datasets.penguins_example
  • 4.82 kB
  • 1 row
  • 6 columns
CREATE TABLE penguins_example (
  "island" VARCHAR,
  "bill_length_mm" DOUBLE,
  "bill_depth_mm" DOUBLE,
  "flipper_length_mm" DOUBLE,
  "body_mass_g" DOUBLE,
  "sex" VARCHAR
);

Penguins Size

@kaggle.ericamohadjei_trending_public_datasets.penguins_size
  • 8.48 kB
  • 344 rows
  • 7 columns
CREATE TABLE penguins_size (
  "species" VARCHAR,
  "island" VARCHAR,
  "culmen_length_mm" DOUBLE,
  "culmen_depth_mm" DOUBLE,
  "flipper_length_mm" DOUBLE,
  "body_mass_g" DOUBLE,
  "sex" VARCHAR
);
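
This table names the bill measurements `culmen_*`, while penguins_cleaned uses `bill_*`. The sketch below shows one way to bring rows from this table into the penguins_cleaned naming and keep only complete rows. The row counts (344 here, 333 in penguins_cleaned) are consistent with the cleaned table being the complete cases, but the page does not confirm that. The rename mapping, the missing-value markers, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]

# Hypothetical mapping from penguins_size names to penguins_cleaned names.
RENAMES = {
    "culmen_length_mm": "bill_length_mm",
    "culmen_depth_mm": "bill_depth_mm",
}
MISSING = {None, "", "NA"}  # assumed missing-value markers


def harmonize(rows: List[Row]) -> List[Row]:
    """Rename culmen_* columns to bill_* and keep only complete rows."""
    cleaned = []
    for row in rows:
        renamed = {RENAMES.get(key, key): value for key, value in row.items()}
        if all(value not in MISSING for value in renamed.values()):
            cleaned.append(renamed)
    return cleaned


# Illustrative rows only; the second one has missing measurements.
sample = [
    {"species": "Adelie", "island": "Torgersen", "culmen_length_mm": 39.1,
     "culmen_depth_mm": 18.7, "flipper_length_mm": 181.0,
     "body_mass_g": 3750.0, "sex": "MALE"},
    {"species": "Adelie", "island": "Torgersen", "culmen_length_mm": None,
     "culmen_depth_mm": None, "flipper_length_mm": None,
     "body_mass_g": None, "sex": None},
]
result = harmonize(sample)
print(len(result), sorted(result[0]))
```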

Stock Price

@kaggle.ericamohadjei_trending_public_datasets.stock_price
  • 2.52 kB
  • 24 rows
  • 2 columns
CREATE TABLE stock_price (
  "date" TIMESTAMP,
  "close" DOUBLE
);
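
A two-column date/close series like this one is typically turned into period-over-period returns before modelling. The sketch below assumes each row is one observation of a single instrument. The function name and the prices are made up for illustration and do not come from the table.

```python
from datetime import datetime
from typing import List, Tuple

Point = Tuple[datetime, float]


def simple_returns(series: List[Point]) -> List[Point]:
    """Period-over-period simple returns: close_t / close_{t-1} - 1."""
    ordered = sorted(series)  # sort by timestamp so input order does not matter
    return [
        (later_date, later_close / earlier_close - 1.0)
        for (_, earlier_close), (later_date, later_close) in zip(ordered, ordered[1:])
    ]


# Made-up prices for illustration only.
prices = [
    (datetime(2020, 1, 1), 100.0),
    (datetime(2020, 2, 1), 110.0),
    (datetime(2020, 3, 1), 99.0),
]
returns = simple_returns(prices)
for stamp, value in returns:
    print(stamp.date(), round(value, 4))
```

The result has one fewer entry than the input, because the first observation has no predecessor to compare against.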

Stocks Toy

@kaggle.ericamohadjei_trending_public_datasets.stocks_toy
  • 2.75 kB
  • 4 rows
  • 3 columns
CREATE TABLE stocks_toy (
  "company" VARCHAR,
  "q2" BIGINT,
  "q3" BIGINT
);
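
A wide table with one column per period is a common starting point for reshaping exercises. The sketch below converts such rows to a long layout with one row per company and period. It assumes `q2` and `q3` are values for two periods, which the page does not state. The function name, the output keys `period` and `value`, and the sample numbers are illustrative.

```python
from typing import Dict, List, Sequence, Union

Cell = Union[str, int]


def to_long(
    rows: List[Dict[str, Cell]],
    value_columns: Sequence[str] = ("q2", "q3"),
) -> List[Dict[str, Cell]]:
    """Turn one row per company into one row per (company, period)."""
    return [
        {"company": row["company"], "period": column, "value": row[column]}
        for row in rows
        for column in value_columns
    ]


# Made-up values for illustration only.
wide = [
    {"company": "A", "q2": 10, "q3": 12},
    {"company": "B", "q2": 7, "q3": 5},
]
long_rows = to_long(wide)
print(len(long_rows))  # 4
```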

Titanic

@kaggle.ericamohadjei_trending_public_datasets.titanic
  • 11.93 kB
  • 891 rows
  • 10 columns
CREATE TABLE titanic (
  "sex" VARCHAR,
  "age" DOUBLE,
  "sibsp" BIGINT,
  "parch" BIGINT,
  "fare" DOUBLE,
  "embarked" VARCHAR,
  "class" VARCHAR,
  "who" VARCHAR,
  "alone" BOOLEAN,
  "survived" BIGINT
);
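
A typical first query against this schema is the survival rate per group. The sketch below runs such a query on an in-memory SQLite table that copies three of the columns above. It assumes `survived` is coded 0/1, which the schema suggests but the page does not state. SQLite is only a local stand-in for whatever engine hosts the table, and the inserted rows are made up.

```python
import sqlite3

# In-memory SQLite stand-in; the hosted table may run on a different engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE titanic ("sex" VARCHAR, "class" VARCHAR, "survived" BIGINT)'
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [
        ("female", "First", 1),
        ("female", "First", 1),
        ("male", "First", 0),
        ("female", "Third", 0),
        ("male", "Third", 0),
        ("male", "Third", 1),
    ],
)

# AVG over a 0/1 column is the share of survivors in each group.
rates = conn.execute(
    'SELECT "class", "sex", AVG("survived") AS survival_rate, COUNT(*) AS n '
    'FROM titanic GROUP BY "class", "sex" ORDER BY "class", "sex"'
).fetchall()
for row in rates:
    print(row)
```

Reporting the group size `n` next to each rate makes it easier to spot rates that rest on very few passengers.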

Virtual Reality Experiences Data

@kaggle.ericamohadjei_trending_public_datasets.virtual_reality_experiences_data
  • 16.49 kB
  • 654 rows
  • 7 columns
CREATE TABLE virtual_reality_experiences_data (
  "userid" BIGINT,
  "age" BIGINT,
  "gender" VARCHAR,
  "vrheadset" VARCHAR,
  "duration" DOUBLE,
  "motionsickness" BIGINT,
  "immersionlevel" BIGINT
);

Weather Nominal Weka

@kaggle.ericamohadjei_trending_public_datasets.weather_nominal_weka
  • 3.7 kB
  • 14 rows
  • 5 columns
CREATE TABLE weather_nominal_weka (
  "outlook" VARCHAR,
  "temperature" VARCHAR,
  "humidity" VARCHAR,
  "windy" BOOLEAN,
  "play" VARCHAR
);

Weather Weka

@kaggle.ericamohadjei_trending_public_datasets.weather_weka
  • 3.95 kB
  • 14 rows
  • 5 columns
CREATE TABLE weather_weka (
  "outlook" VARCHAR,
  "temperature" BIGINT,
  "humidity" BIGINT,
  "windy" BOOLEAN,
  "play" VARCHAR
);
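
Both weather tables carry a `play` label, which makes them convenient for decision-tree exercises. A common first step there is the entropy of the label. The sketch below computes Shannon entropy for a list of labels. The 9 "yes" / 5 "no" split used in the example is the one found in the classic Weka weather data and is assumed, not verified, to match these 14-row tables.

```python
import math
from collections import Counter
from typing import Iterable


def label_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Assumed label distribution: 9 "yes" and 5 "no".
play = ["yes"] * 9 + ["no"] * 5
print(round(label_entropy(play), 3))  # 0.94
```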
