Baselight

Learning From Imbalanced Insurance Data

Cross-sell Prediction

@kaggle.arashnic_imbalanced_data_practice

About this Dataset

Context

Insurance companies that sell life, health, and property and casualty insurance are using machine learning (ML) to drive improvements in customer service, fraud detection, and operational efficiency. This dataset is provided by an insurance company that, like others, wants to take advantage of ML. The company provides Health Insurance to its customers. The task is to build a model that predicts whether policyholders (customers) from the past year will also be interested in the Vehicle Insurance the company offers.

An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

For example, you may pay a premium of Rs. 5000 each year for health insurance cover of Rs. 200,000 so that if, God forbid, you fall ill and need to be hospitalized in that year, the insurance provider will bear the cost of hospitalization etc. up to Rs. 200,000. If you are wondering how the company can bear such a high hospitalization cost when it charges a premium of only Rs. 5000, that is where the concept of probability comes into the picture. For example, there may be 100 customers like you, each paying a premium of Rs. 5000 every year, but only a few of them (say 2-3) would be hospitalized that year. This way everyone shares the risk of everyone else.
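The arithmetic behind this example can be sketched in a few lines. The figures are the ones quoted above; the function name `worst_case_payout` is hypothetical, and the sketch assumes every claim uses the full cover, which is an upper bound rather than a typical case.

```python
# Figures taken from the example in the text; everything else is illustrative.
customers = 100
premium = 5_000      # Rs. paid by each customer per year
cover = 200_000      # Rs. maximum payout per hospitalized customer

pool = customers * premium  # total premiums collected in one year


def worst_case_payout(claims: int) -> int:
    """Upper bound on payouts, assuming every claim uses the full cover."""
    return claims * cover


for claims in (2, 3):
    payout = worst_case_payout(claims)
    print(f"{claims} claims: pool={pool}, "
          f"worst-case payout={payout}, balance={pool - payout}")
```

With 2 full-cover claims the pool covers the payouts; with 3 it falls short. Because the cover is only a ceiling ("up to Rs. 200,000"), the worst case overstates payouts whenever a claim is smaller than the cover.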

Just like medical insurance, there is vehicle insurance, where every year the customer pays a premium of a certain amount to the insurance provider so that, in case of an unfortunate accident involving the vehicle, the provider pays compensation (called the ‘sum assured’) to the customer.

Content

Building a model to predict whether a customer would be interested in Vehicle Insurance is helpful for the company because it can then plan its communication strategy to reach out to those customers and optimize its business model and revenue.

We have information about:

  • Demographics (gender, age, region code type),
  • Vehicles (Vehicle Age, Damage),
  • Policy (Premium, sourcing channel) etc.

Update: Test data target values have been added. To evaluate your models more precisely, you can use:
https://www.kaggle.com/arashnic/answer

Moreover, the supplemental goal is to practice learning from imbalanced data and to verify how the results can help in a real operational process. The Response feature (target) is highly imbalanced:

0: 319594
1: 62531
Name: Response, dtype: int64
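To put a number on the imbalance, the counts above can be turned into a positive-class share and a majority-to-minority ratio. This is a minimal sketch that hard-codes the two counts shown; the variable names are illustrative.

```python
# Class counts copied from the Response value counts above.
counts = {0: 319594, 1: 62531}

total = sum(counts.values())
positive_share = counts[1] / total        # fraction of customers with Response == 1
imbalance_ratio = counts[0] / counts[1]   # majority rows per minority row

print(f"total={total}, positive share={positive_share:.1%}, "
      f"ratio={imbalance_ratio:.2f}:1")
# total=382125, positive share=16.4%, ratio=5.11:1
```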

Practicing techniques such as resampling is useful for checking their impact on validation results and the confusion matrix.
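As one possible illustration, the sketch below applies random undersampling (just one of several resampling techniques) to toy labels with roughly the same 5:1 imbalance, and computes a confusion matrix for a baseline that always predicts the majority class. The data is synthetic and the helper names `random_undersample` and `confusion_matrix` are hypothetical; it uses only the standard library and is not the dataset author's method.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so that every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def confusion_matrix(y_true, y_pred):
    """Count true/false negatives and positives for binary labels 0 and 1."""
    cm = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            cm["tp" if p == 1 else "fn"] += 1
        else:
            cm["fp" if p == 1 else "tn"] += 1
    return cm


# Toy data: 500 negatives and 100 positives (about 5:1, similar to Response).
labels = [0] * 500 + [1] * 100
rows = list(range(len(labels)))  # placeholder stand-ins for feature rows

bal_rows, bal_labels = random_undersample(rows, labels)
print(Counter(bal_labels))  # both classes now have 100 rows

# Baseline that always predicts the majority class.
always_zero = [0] * len(labels)
cm = confusion_matrix(labels, always_zero)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, f"accuracy={accuracy:.2f}", f"recall={recall:.2f}")
```

The baseline reaches about 83% accuracy while finding none of the interested customers, which is why the confusion matrix is a more informative check than accuracy alone on data like this.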

Inspiration

Predict whether a customer would be interested in Vehicle Insurance
