First 20 components of PCA on Sentence Embeddings
Dataset Description
Context
In the actuarial loss competition, we are provided with text data describing accident context and injury type.
Content
These are the first 20 principal components of a PCA performed on sentence embeddings of the claim descriptions. The dataset covers the concatenation of the train and test data.
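As an illustration of the kind of reduction described above, here is a minimal sketch of PCA via SVD that keeps the first 20 components. It is not the exact code used to build this dataset. The embeddings are random placeholders, and the row counts, the embedding width of 768, and the choice to fit the PCA on train and test together are all assumptions made for the example. The name pca_first_components is hypothetical.

```python
import numpy as np


def pca_first_components(embeddings, n_components=20):
    """Project the rows of `embeddings` onto their first principal components."""
    # PCA requires centered data: subtract the per-feature mean.
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = centered @ vt[:n_components].T
    # Share of total variance carried by each principal direction.
    explained = singular_values**2 / np.sum(singular_values**2)
    return components, explained[:n_components]


# Placeholder embeddings (assumption): in practice these would be the
# sentence embeddings of the train and test claim descriptions.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(300, 768))
test_emb = rng.normal(size=(200, 768))

# Concatenate train and test, reduce, then split back by row position.
all_emb = np.vstack([train_emb, test_emb])
components, explained_ratio = pca_first_components(all_emb, n_components=20)
train_pca = components[: len(train_emb)]
test_pca = components[len(train_emb):]
```

Because the rows keep their original order, the first block of rows corresponds to the train set and the remainder to the test set, which is one simple way to recover the split after concatenation.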
Acknowledgements
The embeddings were obtained with the paraphrase DistilRoBERTa model from sentence-transformers: https://github.com/UKPLab/sentence-transformers
Inspiration
How can these embeddings explain the claim cost and help predict it?