Dataset: Insurance Claim Dataset

About this Dataset

Description:

A simple yet challenging project: predict whether an insurance claim will be made or not.
The difficulty arises because the dataset has few samples & is slightly imbalanced.
Can you overcome these obstacles & build a good predictive model for this classification task?

This data frame contains the following columns:

age: age of policyholder
sex: gender of policyholder (female=0, male=1)
bmi: body mass index, an objective index of body weight (kg / m ^ 2) computed as weight divided by height squared; it indicates whether weight is relatively high or low for a given height, ideally 18.5 to 25
steps: average walking steps per day of policyholder
children: number of children / dependents of policyholder
smoker: smoking status of policyholder (non-smoker=0, smoker=1)
region: the residential area of policyholder in the US (northeast=0, northwest=1, southeast=2, southwest=3)
charges: individual medical costs billed by health insurance
insuranceclaim: yes=1, no=0
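
The integer codes in the column list above can be mapped back to readable labels with a small lookup. The following is a minimal Python sketch, assuming a row is available as a plain dictionary keyed by column name. The helper names `decode_row` and `bmi` are hypothetical and not part of the dataset, and the example row is made up for illustration.

```python
# Code-to-label mappings copied from the column descriptions.
SEX = {0: "female", 1: "male"}
SMOKER = {0: "non-smoker", 1: "smoker"}
REGION = {0: "northeast", 1: "northwest", 2: "southeast", 3: "southwest"}
CLAIM = {0: "no", 1: "yes"}


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def decode_row(row: dict) -> dict:
    """Return a copy of `row` with the coded columns replaced by their labels."""
    decoded = dict(row)
    decoded["sex"] = SEX[row["sex"]]
    decoded["smoker"] = SMOKER[row["smoker"]]
    decoded["region"] = REGION[row["region"]]
    decoded["insuranceclaim"] = CLAIM[row["insuranceclaim"]]
    return decoded


# Made-up row for illustration only (not taken from the dataset).
example = {
    "age": 40, "sex": 1, "bmi": 27.5, "children": 2,
    "smoker": 0, "region": 3, "charges": 6500.0, "insuranceclaim": 1,
}
print(decode_row(example))
print(round(bmi(70.0, 1.75), 1))
```

Keeping the mappings in one place makes it easier to check that the codes used in analysis match the ones documented here.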

This is the "Sample Insurance Claim Prediction Dataset", which is based on "[Medical Cost Personal Datasets][1]" with updated sample values.

Acknowledgements:

This dataset was sourced from Kaggle.

Objective:

Understand the dataset & clean it up (if required).
Build a classification model to predict whether the insurance will be claimed or not.
Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
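
Because the classes are slightly imbalanced, accuracy alone can be misleading when comparing classifiers. As a rough sketch of the comparison step, the plain-Python helper below computes accuracy, precision, recall and F1 from true and predicted labels. The function name `binary_metrics` and the toy label lists are hypothetical; in practice a library such as scikit-learn would normally supply these metrics.

```python
def binary_metrics(y_true, y_pred):
    """Compute basic metrics for binary labels, treating 1 (claim) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not taken from the dataset).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
always_claim = [1] * len(y_true)          # naive baseline: always predict a claim
model_pred = [1, 1, 0, 0, 0, 1, 1, 1]     # predictions from some hypothetical model

for name, pred in [("baseline", always_claim), ("model", model_pred)]:
    print(name, binary_metrics(y_true, pred))
```

Reporting a naive baseline next to each model shows how much of a score comes from the class distribution rather than from the model itself.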

Tables

Insurance

@kaggle.yasserh_insurance_claim_dataset.insurance

24.89 KB
1338 rows
8 columns


CREATE TABLE insurance (
  "age" BIGINT,
  "sex" BIGINT,
  "bmi" DOUBLE,
  "children" BIGINT,
  "smoker" BIGINT,
  "region" BIGINT,
  "charges" DOUBLE,
  "insuranceclaim" BIGINT
);
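
To try the schema locally, the table can be created in an in-memory database and queried for the class balance. This is only a sketch: SQLite (via Python's built-in `sqlite3` module) is an assumption here, since the listing does not name an engine, and the inserted rows are made-up placeholders rather than records from the dataset.

```python
import sqlite3

# Schema as listed above; SQLite accepts these type names via type affinity.
SCHEMA = """
CREATE TABLE insurance (
  "age" BIGINT,
  "sex" BIGINT,
  "bmi" DOUBLE,
  "children" BIGINT,
  "smoker" BIGINT,
  "region" BIGINT,
  "charges" DOUBLE,
  "insuranceclaim" BIGINT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Placeholder rows for illustration only (not real records).
rows = [
    (30, 0, 22.0, 0, 0, 0, 3000.0, 0),
    (45, 1, 31.5, 2, 1, 2, 25000.0, 1),
    (52, 0, 28.0, 1, 0, 1, 9000.0, 1),
]
conn.executemany("INSERT INTO insurance VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Count rows per target class to inspect the class balance.
class_counts = dict(
    conn.execute(
        "SELECT insuranceclaim, COUNT(*) FROM insurance GROUP BY insuranceclaim"
    ).fetchall()
)
print(class_counts)
conn.close()
```

With the real data loaded, the same query gives a quick view of how imbalanced the `insuranceclaim` target is.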