Salary Prediction
Tech job positions and salaries from glassdoor.com
@kaggle.thedevastator_jobs_dataset_from_glassdoor
This dataset contains job postings from Glassdoor.com from 2017. It can be used to analyze salaries and hiring trends by job position, company size, and other posting attributes.
- Identify which factors most affect data science salaries
- Determine which states and cities offer the highest paying data science jobs
- Predict what a data science job posting will pay based on the job description
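As a starting point for the second idea, the snippet below is a minimal sketch that ranks states by mean salary. It relies only on the `job_state` and `avg_salary` columns listed in the schema; the function name `mean_salary_by_state` and the sample rows are hypothetical and exist only to make the example runnable without the real files.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; these are NOT taken from the dataset.
SAMPLE_CSV = """job_state,avg_salary
CA,120.0
CA,140.0
NY,110.0
TX,90.0
"""


def mean_salary_by_state(csv_file):
    """Return (state, mean avg_salary) pairs sorted from highest to lowest."""
    by_state = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_state[row["job_state"].strip()].append(float(row["avg_salary"]))
    ranked = [(state, mean(values)) for state, values in by_state.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = mean_salary_by_state(io.StringIO(SAMPLE_CSV))
for state, salary in ranking:
    print(f"{state}: {salary:.1f}")
```

To try it on the real data, pass `open("salary_data_cleaned.csv", newline="")` instead of the in-memory sample, assuming the file uses the column names shown in the schema.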
This dataset was scraped from Glassdoor.com by Ramiro Gomez.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: eda_data.csv
| Column name | Description |
|---|---|
| job_id | The unique identifier for the job posting (Numeric) |
| job_state | The state where the job is located (String) |
| same_state | A binary indicator (0/1) of whether the job is located in the same place as the company headquarters (Numeric) |
| age | The age of the company in years, derived from the founding year (Numeric) |
| python_yn | A binary indicator (0/1) of whether the job description mentions Python (Numeric) |
| R_yn | A binary indicator (0/1) of whether the job description mentions R (Numeric) |
| spark | A binary indicator (0/1) of whether the job description mentions Spark (Numeric) |
| aws | A binary indicator (0/1) of whether the job description mentions AWS (Numeric) |
| excel | A binary indicator (0/1) of whether the job description mentions Excel (Numeric) |
| job_simp | A simplified job title (String) |
| seniority | The seniority of the job (String) |
| desc_len | The length of the job description (Numeric) |
| num_comp | The number of competitors listed for the company (Numeric) |
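The two count-style columns, `desc_len` and `num_comp`, can plausibly be recomputed from the raw `job_description` and `competitors` text columns. The sketch below shows one way to do that. It assumes that `competitors` is a comma-separated list of names and that `"-1"` marks a missing value; neither assumption is confirmed by this page, and the helper names are hypothetical.

```python
def description_length(job_description: str) -> int:
    """Length of the job description in characters."""
    return len(job_description)


def count_competitors(competitors: str, missing_marker: str = "-1") -> int:
    """Count comma-separated competitor names.

    Assumption: a bare "-1" (or an empty string) means no competitors are listed.
    """
    text = competitors.strip()
    if not text or text == missing_marker:
        return 0
    return sum(1 for name in text.split(",") if name.strip())


# Hypothetical inputs, not rows from the dataset.
print(description_length("Build data pipelines."))
print(count_competitors("Acme, Globex, Initech"))
print(count_competitors("-1"))
```

Comparing the recomputed values with the stored `desc_len` and `num_comp` columns is a quick way to check whether these assumptions hold for the actual files.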
File: glassdoor_jobs.csv
| Column name | Description |
|---|---|
| job_id | The unique identifier for the job posting (Numeric) |
File: salary_data_cleaned.csv
| Column name | Description |
|---|---|
| job_state | The state where the job is located (String) |
| same_state | A binary indicator (0/1) of whether the job is located in the same place as the company headquarters (Numeric) |
| age | The age of the company in years, derived from the founding year (Numeric) |
| python_yn | A binary indicator (0/1) of whether the job description mentions Python (Numeric) |
| R_yn | A binary indicator (0/1) of whether the job description mentions R (Numeric) |
| spark | A binary indicator (0/1) of whether the job description mentions Spark (Numeric) |
| aws | A binary indicator (0/1) of whether the job description mentions AWS (Numeric) |
| excel | A binary indicator (0/1) of whether the job description mentions Excel (Numeric) |
CREATE TABLE eda_data (
"unnamed_0" BIGINT, -- Unnamed: 0
"job_title" VARCHAR,
"salary_estimate" VARCHAR,
"job_description" VARCHAR,
"rating" DOUBLE,
"company_name" VARCHAR,
"location" VARCHAR,
"headquarters" VARCHAR,
"size" VARCHAR,
"founded" BIGINT,
"type_of_ownership" VARCHAR,
"industry" VARCHAR,
"sector" VARCHAR,
"revenue" VARCHAR,
"competitors" VARCHAR,
"hourly" BIGINT,
"employer_provided" BIGINT,
"min_salary" BIGINT,
"max_salary" BIGINT,
"avg_salary" DOUBLE,
"company_txt" VARCHAR,
"job_state" VARCHAR,
"same_state" BIGINT,
"age" BIGINT,
"python_yn" BIGINT,
"r_yn" BIGINT,
"spark" BIGINT,
"aws" BIGINT,
"excel" BIGINT,
"job_simp" VARCHAR,
"seniority" VARCHAR,
"desc_len" BIGINT,
"num_comp" BIGINT
);

CREATE TABLE glassdoor_jobs (
"unnamed_0" BIGINT, -- Unnamed: 0
"job_title" VARCHAR,
"salary_estimate" VARCHAR,
"job_description" VARCHAR,
"rating" DOUBLE,
"company_name" VARCHAR,
"location" VARCHAR,
"headquarters" VARCHAR,
"size" VARCHAR,
"founded" BIGINT,
"type_of_ownership" VARCHAR,
"industry" VARCHAR,
"sector" VARCHAR,
"revenue" VARCHAR,
"competitors" VARCHAR
);

CREATE TABLE salary_data_cleaned (
"job_title" VARCHAR,
"salary_estimate" VARCHAR,
"job_description" VARCHAR,
"rating" DOUBLE,
"company_name" VARCHAR,
"location" VARCHAR,
"headquarters" VARCHAR,
"size" VARCHAR,
"founded" BIGINT,
"type_of_ownership" VARCHAR,
"industry" VARCHAR,
"sector" VARCHAR,
"revenue" VARCHAR,
"competitors" VARCHAR,
"hourly" BIGINT,
"employer_provided" BIGINT,
"min_salary" BIGINT,
"max_salary" BIGINT,
"avg_salary" DOUBLE,
"company_txt" VARCHAR,
"job_state" VARCHAR,
"same_state" BIGINT,
"age" BIGINT,
"python_yn" BIGINT,
"r_yn" BIGINT,
"spark" BIGINT,
"aws" BIGINT,
"excel" BIGINT
);
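
The schemas keep the raw `salary_estimate` text alongside the numeric `min_salary`, `max_salary`, and `avg_salary` columns. The sketch below shows one way such numbers could be extracted from the text. It assumes estimates look like `$53K-$91K (Glassdoor est.)` and that the average is the midpoint of the range; both are assumptions rather than something this page documents, and `parse_salary_estimate` is a hypothetical helper. It does not handle hourly rates, which the `hourly` column suggests also occur.

```python
import re

# Assumed pattern: two integers separated by a dash, each optionally
# prefixed with "$" and suffixed with "K".
SALARY_RANGE = re.compile(r"\$?(\d+)\s*K?\s*-\s*\$?(\d+)\s*K?", flags=re.IGNORECASE)


def parse_salary_estimate(text):
    """Return (min_salary, max_salary, avg_salary), or None if no range is found."""
    match = SALARY_RANGE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return low, high, (low + high) / 2


# Hypothetical input string in the assumed format.
example = parse_salary_estimate("$53K-$91K (Glassdoor est.)")
print(example)  # (53, 91, 72.0)
```

Returning `None` for unparseable text keeps missing estimates visible, so they can be filtered out explicitly before any modelling.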