2017 Kaggle Machine Learning & Data Science Survey
A big picture view of the state of data science and machine learning.
@kaggle.kaggle_kaggle_survey_2017
For the first time, Kaggle conducted an industry-wide survey to establish a comprehensive view of the state of data science and machine learning. The survey received over 16,000 responses and we learned a ton about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field.
To share some of the initial insights from the survey, we’ve worked with the folks from The Pudding to put together this interactive report. They’ve shared all of the kernels used in the report here.
The data includes 5 files:
schema.csv: A CSV file with the survey schema. This schema includes the questions that correspond to each column name in both multipleChoiceResponses.csv and freeformResponses.csv.
multipleChoiceResponses.csv: Respondents' answers to multiple choice and ranking questions. These are non-randomized, so a single row does correspond to all of a single user's answers.
freeformResponses.csv: Respondents' freeform answers to Kaggle's survey questions. These responses are randomized within a column, so reading across a single row does not give a single user's answers.
conversionRates.csv: Currency conversion rates (to USD) as accessed from the R package "quantmod" on September 14, 2017.
RespondentTypeREADME.txt: A schema for decoding the responses in the "Asked" column of the schema.csv file.
In the month of November, we’re awarding $1000 a week for code and analyses shared on this dataset via Kaggle Kernels. Read more about this month’s Kaggle Kernels Awards and help us advance the state of machine learning and data science by exploring this one-of-a-kind dataset.
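As a rough illustration of how conversionRates.csv might be combined with amounts reported in local currencies, here is a minimal sketch using only the Python standard library. The column names originCountry and exchangeRate are assumptions, not something stated above, so check the file's header first. The sketch also assumes each rate is the number of US dollars per one unit of the local currency.

```python
import csv
import io


def load_rates(fileobj, code_col="originCountry", rate_col="exchangeRate"):
    """Build a {currency_code: usd_rate} dict from a conversion-rates CSV.

    The default column names are assumptions; pass the real header names
    if they differ. Rows with a missing or non-numeric rate are skipped.
    """
    rates = {}
    for row in csv.DictReader(fileobj):
        try:
            rates[row[code_col].strip()] = float(row[rate_col])
        except (KeyError, ValueError, AttributeError):
            continue
    return rates


def to_usd(amount_text, currency, rates):
    """Convert a free-text amount such as "1,000" to USD.

    Returns None when the amount cannot be parsed or the currency is unknown.
    Assumes the rate means "USD per one unit of the local currency".
    """
    if not amount_text or currency not in rates:
        return None
    try:
        amount = float(amount_text.replace(",", ""))
    except ValueError:
        return None
    return amount * rates[currency]


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not values from the dataset.
    sample = io.StringIO("originCountry,exchangeRate\nUSD,1.0\nEUR,1.2\n")
    rates = load_rates(sample)
    print(to_usd("1,000", "EUR", rates))
```

With the real file, the same functions could be called on `open("conversionRates.csv", newline="")`. Returning None for unparseable amounts keeps bad survey entries from silently turning into zeros.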
The schema.csv file includes a column called "Asked" that describes who saw each question. You can learn more about the different segments we used in the schema.csv file and RespondentTypeREADME.txt in the data tab.
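To see which questions were shown to which respondent segment, the "Asked" column can be used to group the schema rows. The following is a minimal sketch using the Python standard library. The "Column" header name is an assumption (only "Asked" is named above), and the segment labels in the demo are invented placeholders; the real labels are documented in RespondentTypeREADME.txt.

```python
import csv
import io
from collections import defaultdict


def columns_by_segment(fileobj, column_col="Column", asked_col="Asked"):
    """Group schema rows by the value of the "Asked" column.

    Returns {segment_label: [column_name, ...]}. The `column_col` default
    is an assumed header name; adjust it to match the actual schema.csv.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(fileobj):
        segment = (row.get(asked_col) or "").strip()
        groups[segment].append(row.get(column_col))
    return dict(groups)


if __name__ == "__main__":
    # Placeholder rows and segment labels, not taken from the real schema.
    sample = io.StringIO(
        "Column,Question,Asked\n"
        "Q1,First question?,SegmentA\n"
        "Q2,Second question?,SegmentB\n"
        "Q3,Third question?,SegmentA\n"
    )
    for segment, columns in columns_by_segment(sample).items():
        print(segment, columns)
```

Grouping first, rather than filtering on a hard-coded label, makes it easy to inspect which segment values actually occur before analyzing only the respondents who saw a given question.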