LexGLUE: Legal NLP Benchmark
Legal NLP Benchmark Dataset: LexGLUE
@kaggle.thedevastator_lexglue_legal_nlp_benchmark_dataset
By lex_glue (From Huggingface) [source]
The LexGLUE dataset is a comprehensive benchmark dataset specially created to evaluate the performance of natural language processing (NLP) models in various legal tasks. This dataset draws inspiration from the success of other multi-task NLP benchmarks like GLUE and SuperGLUE, as well as similar initiatives in different domains.
The primary objective of LexGLUE is to advance the development of versatile models that can effectively handle multiple legal NLP tasks without requiring extensive task-specific fine-tuning. By providing a standardized evaluation platform, this dataset aims to foster innovation and advancements in the field of legal language understanding.
The dataset consists of several columns that provide crucial information for each entry. The context column contains the specific text or document from which each legal language understanding task is derived, offering essential background information for proper comprehension. The endings column presents multiple potential options or choices that could complete the legal task at hand, enabling comprehensive evaluation.
Furthermore, there are various columns related to labels and target categories associated with each entry. The label column represents the correct or expected answer for a given task, ensuring accuracy in model predictions during evaluation. The labels column provides categorical information regarding target labels or categories relevant to the respective legal NLP task.
Another important element within this dataset is the text column, which contains the actual input text representing a particular legal scenario or context for analysis. Analyzing this text forms an integral part of conducting accurate and effective NLP tasks within a legal context.
To support efficient assessment of model performance across diverse aspects of legal language understanding, the benchmark includes additional files:
- case_hold_test.csv: case contexts paired with multiple candidate endings, labeled according to which one is the valid holding;
- ledgar_validation.csv: a validation split designed for evaluating NLP models' performance on legal tasks;
- ecthr_b_test.csv: samples related to the European Court of Human Rights (ECtHR), with corresponding labels, for testing legal language understanding models in this domain.
Taken together, these resources make LexGLUE a crucial benchmark for researchers and practitioners seeking to measure and advance the state of the art in legal NLP.
- Training and evaluating NLP models: The LexGLUE dataset can be used to train and evaluate natural language processing models specifically designed for legal language understanding tasks. By using this dataset, researchers and developers can test the performance of their models on various legal NLP tasks, such as legal case analysis or European Court of Human Rights (ECtHR) related tasks.
- Developing generic NLP models: The benchmark dataset is designed to push towards the development of generic models that can handle multiple legal NLP tasks with limited task-specific fine-tuning. Researchers can use this dataset to develop robust and versatile NLP models that can effectively understand and analyze legal texts.
- Comparing different algorithms and approaches: LexGLUE provides a standardized benchmark for comparing different algorithms and approaches in the field of legal language understanding. Researchers can use this dataset to compare the performance of different techniques, such as rule-based methods, deep learning models, or transformer architectures, on various legal NLP tasks. This allows for a fair comparison between different approaches and facilitates progress in the field by identifying effective methods for solving specific legal language understanding challenges
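As a concrete sketch of such an evaluation, CaseHOLD-style rows can be scored as multiple-choice accuracy: each row supplies a context, a list of candidate endings, and the integer index of the correct one. The rows and the trivial baseline below are synthetic stand-ins, not part of the dataset:

```python
# Multiple-choice accuracy for CaseHOLD-style rows.
# Each row: (context, endings, label) where label indexes the correct ending.

def accuracy(rows, predict):
    """Fraction of rows where predict(context, endings) equals the gold label."""
    correct = sum(1 for ctx, endings, label in rows if predict(ctx, endings) == label)
    return correct / len(rows)

# Synthetic stand-in rows (real CaseHOLD rows contain excerpted case text).
rows = [
    ("The court held that ...", ["holding A", "holding B", "holding C"], 1),
    ("On appeal, the panel ...", ["holding A", "holding B", "holding C"], 0),
]

# A trivial baseline "model" that always picks the first ending.
first_ending = lambda ctx, endings: 0

print(accuracy(rows, first_ending))  # 0.5 on the two synthetic rows
```

Any real model slots in as the `predict` callable, so the same harness compares rule-based methods, fine-tuned transformers, or anything else on equal footing.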
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: case_hold_test.csv
| Column name | Description |
|---|---|
| context | Text or document from which the legal language understanding task is derived. (Text) |
| endings | Possible options or choices for completing the legal language understanding task. (Text) |
| label | Correct or expected answer: the index of the valid holding among the listed endings. (Integer) |
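In the CSV export, the endings column typically holds a stringified Python list and label an integer index; this is an assumption about this particular export and is worth verifying against the file. A small parser under that assumption:

```python
import ast
import csv
import io

def parse_case_hold(csv_text):
    """Yield (context, endings_list, label_index) from CaseHOLD-style CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # endings is assumed to be a stringified list, e.g. "['holding A', 'holding B']"
        endings = ast.literal_eval(row["endings"])
        yield row["context"], endings, int(row["label"])

# Synthetic stand-in for a case_hold_test.csv fragment.
sample = '''context,endings,label
"The court held that ...","['holding A', 'holding B']",1
'''

for context, endings, label in parse_case_hold(sample):
    print(endings[label])  # prints the ending marked as the valid holding
```

`ast.literal_eval` is used rather than `eval` so that only literal Python structures are accepted from the file.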
File: ledgar_validation.csv
| Column name | Description |
|---|---|
| label | Correct or expected answer for the legal language understanding task. (Integer) |
| text | Text or document from which the legal language understanding task is derived. (Text) |
File: ecthr_b_test.csv
| Column name | Description |
|---|---|
| text | Text or document from which the legal language understanding task is derived. (Text) |
| labels | Target labels or categories for each specific legal NLP task; a single sample may carry several labels. (Text) |
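ECtHR Task B is multi-label: one document can be associated with several Convention articles at once, so the labels column holds a set of labels rather than a single answer. Assuming the CSV stores labels as a stringified list (worth verifying against the file), they can be turned into a multi-hot vector like this:

```python
import ast

def multi_hot(labels_str, label_space):
    """Convert a stringified label list into a 0/1 vector over label_space."""
    active = set(ast.literal_eval(labels_str))
    return [1 if lab in active else 0 for lab in label_space]

# Hypothetical label space and row; real ECtHR labels refer to Convention articles.
label_space = ["2", "3", "5", "6", "8"]
print(multi_hot("['3', '6']", label_space))  # [0, 1, 0, 1, 0]
```

A multi-hot encoding like this is the usual input format for multi-label classification losses such as binary cross-entropy.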
If you use this dataset in your research, please credit lex_glue (From Huggingface).
CREATE TABLE case_hold_test (
    "context" VARCHAR,
    "endings" VARCHAR,
    "label" BIGINT
);

CREATE TABLE case_hold_train (
    "context" VARCHAR,
    "endings" VARCHAR,
    "label" BIGINT
);

CREATE TABLE case_hold_validation (
    "context" VARCHAR,
    "endings" VARCHAR,
    "label" BIGINT
);

CREATE TABLE ecthr_a_test (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ecthr_a_train (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ecthr_a_validation (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ecthr_b_test (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ecthr_b_train (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ecthr_b_validation (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE eurlex_test (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE eurlex_train (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE eurlex_validation (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE ledgar_test (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE ledgar_train (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE ledgar_validation (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE scotus_test (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE scotus_train (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE scotus_validation (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE unfair_tos_test (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE unfair_tos_train (
    "text" VARCHAR,
    "labels" VARCHAR
);

CREATE TABLE unfair_tos_validation (
    "text" VARCHAR,
    "labels" VARCHAR
);
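The schema above can be exercised in-process with SQLite, which accepts the VARCHAR/BIGINT declarations as written; the inserted row is a synthetic stand-in, not dataset content:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One of the tables from the schema above, verbatim.
conn.execute("""CREATE TABLE case_hold_test (
    "context" VARCHAR,
    "endings" VARCHAR,
    "label" BIGINT
)""")

# Synthetic stand-in row: endings stored as a stringified list, label as an index.
conn.execute(
    "INSERT INTO case_hold_test VALUES (?, ?, ?)",
    ("The court held that ...", "['holding A', 'holding B']", 1),
)

# Count rows whose gold label points at the second ending (index 1).
count, = conn.execute(
    "SELECT COUNT(*) FROM case_hold_test WHERE label = 1"
).fetchone()
print(count)  # 1
```

The same statements work unchanged in an on-disk database; SQLite's flexible typing simply maps VARCHAR to TEXT affinity and BIGINT to INTEGER affinity.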