Baselight

Embeddings Actuarial Loss Competition

First 20 components of PCA on Sentence Embeddings

@kaggle.louise2001_embeddings_actuarial_loss_competition

About this Dataset

Context

In the actuarial loss competition, we are provided with text data describing accident context and injury type.

Content

This dataset contains the first 20 components of a PCA performed on sentence embeddings of the claim descriptions. It is the concatenation of train and test data.
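As a rough illustration of the reduction described above, the following sketch projects embedding vectors onto their first 20 principal components using NumPy. Everything in it is an assumption made for illustration: the random arrays stand in for real sentence embeddings, the row counts and the 768-dimensional width are placeholders, the helper name `pca_first_components` is hypothetical, and fitting the PCA once on train and test stacked together is one possible reading of "concatenation", not a confirmed description of how the dataset was built.

```python
import numpy as np


def pca_first_components(embeddings, n_components=20):
    """Project rows of `embeddings` onto their first principal components."""
    # PCA operates on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Placeholder data standing in for real sentence embeddings (illustrative sizes).
train_emb = rng.normal(size=(540, 768))
test_emb = rng.normal(size=(360, 768))

# Assumption: the PCA is fitted once on train and test stacked together.
stacked = np.vstack([train_emb, test_emb])
components = pca_first_components(stacked, n_components=20)

# Column names matching the table schemas: x_0 ... x_19.
column_names = [f"x_{i}" for i in range(20)]

# Split the projected rows back into the train and test parts.
train_20 = components[: len(train_emb)]
test_20 = components[len(train_emb):]
```

Fitting on the stacked data keeps the train and test projections in the same coordinate system, so a model trained on the train components can be applied directly to the test components.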

Acknowledgements

The embeddings were obtained with paraphrase distil roberta from sentence-transformers: https://github.com/UKPLab/sentence-transformers.

Inspiration

How can these embeddings explain the claim cost and help predict it?

Tables

Embeddings Test 20

@kaggle.louise2001_embeddings_actuarial_loss_competition.embeddings_test_20
  • 4.6 MB
  • 36000 rows
  • 20 columns

CREATE TABLE embeddings_test_20 (
  "x_0" DOUBLE,
  "x_1" DOUBLE,
  "x_2" DOUBLE,
  "x_3" DOUBLE,
  "x_4" DOUBLE,
  "x_5" DOUBLE,
  "x_6" DOUBLE,
  "x_7" DOUBLE,
  "x_8" DOUBLE,
  "x_9" DOUBLE,
  "x_10" DOUBLE,
  "x_11" DOUBLE,
  "x_12" DOUBLE,
  "x_13" DOUBLE,
  "x_14" DOUBLE,
  "x_15" DOUBLE,
  "x_16" DOUBLE,
  "x_17" DOUBLE,
  "x_18" DOUBLE,
  "x_19" DOUBLE
);

Embeddings Train 20

@kaggle.louise2001_embeddings_actuarial_loss_competition.embeddings_train_20
  • 6.53 MB
  • 54000 rows
  • 20 columns

CREATE TABLE embeddings_train_20 (
  "x_0" DOUBLE,
  "x_1" DOUBLE,
  "x_2" DOUBLE,
  "x_3" DOUBLE,
  "x_4" DOUBLE,
  "x_5" DOUBLE,
  "x_6" DOUBLE,
  "x_7" DOUBLE,
  "x_8" DOUBLE,
  "x_9" DOUBLE,
  "x_10" DOUBLE,
  "x_11" DOUBLE,
  "x_12" DOUBLE,
  "x_13" DOUBLE,
  "x_14" DOUBLE,
  "x_15" DOUBLE,
  "x_16" DOUBLE,
  "x_17" DOUBLE,
  "x_18" DOUBLE,
  "x_19" DOUBLE
);
