Baselight

Real / Fake Job Posting Prediction

Dataset of real and fake job postings

@kaggle.shivamb_real_or_fake_fake_jobposting_prediction

About this Dataset

[Real or Fake] : Fake Job Description Prediction

This dataset contains about 18K job descriptions (17,880 rows), of which about 800 are fake. Each posting includes both free-text fields and meta-information about the job. The dataset can be used to train classification models that learn to identify fraudulent job descriptions.
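
With roughly 800 fraudulent postings among 17,880 rows, the two classes are heavily imbalanced, which matters when evaluating a classifier. The following is a minimal sketch of that arithmetic. It assumes the approximate count of 800 from the description (the exact figure is not stated here), and the variable names are illustrative only.

```python
# Class-imbalance arithmetic for the fake job postings table.
# N_ROWS comes from the table listing; N_FAKE is the approximate
# figure from the description ("about 800"), not an exact count.
N_ROWS = 17880
N_FAKE = 800  # assumption: approximate

fraud_rate = N_FAKE / N_ROWS

# Accuracy of a trivial model that labels every posting as real.
majority_baseline_accuracy = 1 - fraud_rate

# "Balanced" class weights: n_samples / (n_classes * class_count).
weight_real = N_ROWS / (2 * (N_ROWS - N_FAKE))
weight_fake = N_ROWS / (2 * N_FAKE)

print(f"fraud rate: {fraud_rate:.3f}")
print(f"majority baseline accuracy: {majority_baseline_accuracy:.3f}")
print(f"class weights: real={weight_real:.2f}, fake={weight_fake:.2f}")
```

Under these assumptions, a model that always predicts "real" would already be right for the large majority of postings, so precision, recall, or F1 on the fraudulent class is likely more informative than plain accuracy.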

Acknowledgements

The University of the Aegean | Laboratory of Information & Communication Systems Security
http://emscad.samos.aegean.gr/

Inspiration

The dataset can be used for the following tasks:

  1. Create a classification model that uses text features and meta-features to predict whether a job description is fraudulent or real.
  2. Identify key traits/features (words, entities, phrases) of fraudulent job descriptions.
  3. Run a contextual embedding model to identify the most similar job descriptions.
  4. Perform exploratory data analysis to surface patterns in the data.
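
As one possible starting point for the second task, the sketch below ranks words by how much more often they appear in fraudulent text than in real text. It compares additively smoothed relative word frequencies on a log scale. This is only one simple technique among many, the helper names `tokenize` and `log_ratio` are hypothetical, and the toy rows are made up for illustration rather than taken from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def log_ratio(rows, alpha=1.0):
    """Score each word by log(P(word | fraudulent) / P(word | real)).

    rows: iterable of (text, fraudulent) pairs with fraudulent in {0, 1}.
    alpha: additive smoothing constant so unseen words do not give log(0).
    """
    counts = {0: Counter(), 1: Counter()}
    for text, label in rows:
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in (0, 1)}
    return {
        w: math.log((counts[1][w] + alpha) / totals[1])
        - math.log((counts[0][w] + alpha) / totals[0])
        for w in vocab
    }


# Made-up toy rows for illustration only; not taken from the dataset.
toy_rows = [
    ("Earn money fast from home, no experience needed", 1),
    ("Send a small fee to start earning money today", 1),
    ("Senior engineer to build backend services with our team", 0),
    ("Join our team as a marketing analyst, experience required", 0),
]

scores = log_ratio(toy_rows)
# Words with the highest scores lean most strongly toward fraudulent text.
top_fraud_words = sorted(scores, key=scores.get, reverse=True)[:5]
print(top_fraud_words)
```

On the real table, the same function could be fed pairs built from the `description` and `fraudulent` columns. Entities and multi-word phrases would need a richer tokenizer than this one.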

Tables

Fake Job Postings

@kaggle.shivamb_real_or_fake_fake_jobposting_prediction.fake_job_postings
  • 21.96 MB
  • 17880 rows
  • 18 columns

CREATE TABLE fake_job_postings (
  "job_id" BIGINT,
  "title" VARCHAR,
  "location" VARCHAR,
  "department" VARCHAR,
  "salary_range" VARCHAR,
  "company_profile" VARCHAR,
  "description" VARCHAR,
  "requirements" VARCHAR,
  "benefits" VARCHAR,
  "telecommuting" BIGINT,
  "has_company_logo" BIGINT,
  "has_questions" BIGINT,
  "employment_type" VARCHAR,
  "required_experience" VARCHAR,
  "required_education" VARCHAR,
  "industry" VARCHAR,
  "function" VARCHAR,
  "fraudulent" BIGINT
);
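
To explore the schema locally, the sketch below loads the same table definition into an in-memory SQLite database using Python's standard `sqlite3` module and runs two simple aggregate queries. It makes several assumptions: SQLite stands in for whatever engine the hosting platform uses (dialects may differ), the flag columns are encoded as 0/1 with `fraudulent = 1` meaning fake, and the inserted rows are made up for illustration.

```python
import sqlite3

# Table definition copied from the schema above.
SCHEMA = """
CREATE TABLE fake_job_postings (
  "job_id" BIGINT,
  "title" VARCHAR,
  "location" VARCHAR,
  "department" VARCHAR,
  "salary_range" VARCHAR,
  "company_profile" VARCHAR,
  "description" VARCHAR,
  "requirements" VARCHAR,
  "benefits" VARCHAR,
  "telecommuting" BIGINT,
  "has_company_logo" BIGINT,
  "has_questions" BIGINT,
  "employment_type" VARCHAR,
  "required_experience" VARCHAR,
  "required_education" VARCHAR,
  "industry" VARCHAR,
  "function" VARCHAR,
  "fraudulent" BIGINT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Made-up rows for illustration only; unspecified columns are left NULL.
conn.executemany(
    'INSERT INTO fake_job_postings '
    '("job_id", "title", "has_company_logo", "fraudulent") '
    "VALUES (?, ?, ?, ?)",
    [
        (1, "Data Analyst", 1, 0),
        (2, "Work From Home Assistant", 0, 1),
        (3, "Backend Engineer", 1, 0),
    ],
)

# Class balance: number of postings per label (assumes 1 = fraudulent).
class_counts = dict(conn.execute(
    'SELECT "fraudulent", COUNT(*) FROM fake_job_postings '
    'GROUP BY "fraudulent"'
).fetchall())

# Share of fraudulent postings with and without a company logo.
fraud_rate_by_logo = dict(conn.execute(
    'SELECT "has_company_logo", AVG("fraudulent") '
    'FROM fake_job_postings GROUP BY "has_company_logo"'
).fetchall())

print(class_counts)
print(fraud_rate_by_logo)
```

The same `GROUP BY` pattern can be applied to other meta-columns such as `telecommuting` or `has_questions` to see how the fraudulent share varies across them.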
