BSARD: French Belgian Law Dataset for IR
Retrieving Relevant Statutes for Legal Questions
By Huggingface Hub [source]
About this dataset
The Belgian Statutory Article Retrieval Dataset (BSARD) is a native French corpus for legal research, containing more than 22,600 Belgian law articles and over 1,100 legal questions posed by citizens. Experienced jurists labeled each question with the articles relevant to it, so that the pertinent statutes can be found quickly. The collection helps users find accurate information about Belgian law without searching through many webpages for the right article. It covers a wide array of categories, such as government regulations, the civil code, and the criminal code.
How to use the dataset
This guide will provide an overview of how to use the BSARD Dataset. The dataset is comprised of Belgian law articles, labeled legal questions, and additional information pertaining to each question.
To begin exploring the dataset, first download its files: train.csv, which contains labeled questions and their relevant articles; test.csv, which contains legal questions posed by citizens without article ids; and synthetic.csv, which contains synthetic legal questions along with their article ids, categories, subcategories, and extra descriptions that help clarify each question. Once downloaded, the files can be loaded into a notebook or IDE for analysis, or opened in spreadsheet software that handles .csv files, such as Microsoft Excel or Google Sheets.
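Loading the downloaded files could look like the sketch below. It uses only the Python standard library and assumes the three files sit together in one local directory; the helper names `load_split` and `load_all` and the directory name in the usage note are hypothetical, not part of the dataset.

```python
import csv
from pathlib import Path


def load_split(path):
    """Read one CSV file into a list of row dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_all(data_dir):
    """Load whichever of train.csv, test.csv and synthetic.csv exist in data_dir."""
    data_dir = Path(data_dir)
    splits = {}
    for name in ("train", "test", "synthetic"):
        file_path = data_dir / f"{name}.csv"
        if file_path.exists():  # skip files that were not downloaded
            splits[name] = load_split(file_path)
    return splits
```

For example, `splits = load_all("bsard")` would return a dictionary whose `"train"` entry is a list of rows, assuming the files were saved in a folder named `bsard`.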
Once the data is loaded, you can look more closely at each field. In the train and test files, the ‘Question’ column holds the question posed by a citizen, followed by Category and Subcategory fields where applicable (categorizations assigned by experienced jurists). Two other fields matter: Extra Description, which gives context accompanying the citizen's question where applicable, and Article Ids, which reference the articles in the corpus associated with the question. The synthetic file has the same columns but covers a smaller subset of the corpus and contains automatically generated questions together with their descriptors (Category, Subcategory, and Extra Description). Its values come from existing entries in the base set, slightly modified, since the generation process includes some randomization intended to reduce overfitting and labeling bias.
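Turning the article ids of a question into a usable list might be sketched as follows. The column name `article_ids` and the comma-separated string format are assumptions (the column tables on this page do not list this field), so check both against the downloaded files first.

```python
def parse_article_ids(value):
    """Convert a comma-separated id string such as '12, 345' into a list of ints.

    ASSUMPTION: ids are stored as comma-separated integers in one cell.
    """
    if value is None:
        return []
    return [int(part) for part in str(value).split(",") if part.strip()]


def relevant_articles(row, column="article_ids"):
    """Return the parsed article ids for one row (column name is an assumption)."""
    return parse_article_ids(row.get(column))
```

Rows without the column, or with an empty cell, yield an empty list rather than raising an error.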
The last step is to combine what you learned from exploring the dataset with the labels assigned by the expert jurists and apply both to a law retrieval project: get familiar with the relevant techniques and models, then use them to build a new retrieval system or improve an existing one. Track and evaluate accuracy, precision, and end-user experience over time to make sure the system keeps delivering good results.
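One way to evaluate a retrieval system against the jurists' labels is recall at a cutoff k: the share of a question's relevant articles that appear among the top k results. The sketch below is a generic illustration of that metric, not an official evaluation script for this dataset; the function names are hypothetical.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant articles found in the top-k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # no labeled articles, so nothing can be recalled
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(all_retrieved, all_relevant, k):
    """Average recall@k over several questions."""
    scores = [
        recall_at_k(retrieved, relevant, k)
        for retrieved, relevant in zip(all_retrieved, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Recomputing this score on the same held-out questions after each model change gives a simple way to track quality over time.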
Research Ideas
- Creating a Machine Learning algorithm that can identify the relevant articles for a given legal question.
- Using Natural Language Processing to analyze the categorical and sub-categorical data associated with each question and article, helping to better understand conflicts between law statutes or within different categories of law.
- Leveraging the dataset to build an AI chatbot that provides instant clarification on Belgian laws from built-in legal advice databases.
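As a starting point for the first idea, a lexical baseline can rank articles by TF-IDF cosine similarity to the question. The sketch below is a minimal pure-Python illustration under stated assumptions: articles are passed in as a dictionary mapping an article id to its text, and the names `tokenize`, `build_index`, and `retrieve` are hypothetical. It is not the method used by the dataset authors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (accents are kept)."""
    return re.findall(r"\w+", text.lower())


def build_index(articles):
    """Build smoothed IDF weights and L2-normalized TF-IDF vectors.

    articles: dict mapping article id -> article text.
    """
    doc_counts = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
    doc_freq = Counter()
    for counts in doc_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(articles)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for aid, counts in doc_counts.items():
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[aid] = {t: w / norm for t, w in vec.items()}
    return idf, vectors


def retrieve(question, idf, vectors, k=5):
    """Return the ids of the k articles most similar to the question."""
    counts = Counter(tokenize(question))
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scores = {
        aid: sum(w * vec.get(t, 0.0) for t, w in query.items())
        for aid, vec in vectors.items()
    }
    # Sort by descending score, breaking ties by article id for determinism.
    return sorted(scores, key=lambda aid: (-scores[aid], aid))[:k]
```

The query vector is left unnormalized because scaling it by a constant does not change the ranking. A baseline like this is mainly useful as a reference point when comparing against learned retrieval models.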
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: train.csv
Column name | Description |
question | The legal question posed by a Belgian citizen. (String) |
category | The broad category of the legal question. (String) |
subcategory | The more specific subcategory of the legal question. (String) |
extra_description | Additional information about the legal question. (String) |
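A quick first look at these columns is to count how many questions fall into each category. The sketch below assumes the rows were read as dictionaries keyed by the column names above (for example with `csv.DictReader`); the helper name `count_by_category` is hypothetical.

```python
from collections import Counter


def count_by_category(rows, column="category"):
    """Count rows per value of the given column, grouping empty cells together."""
    return Counter(row.get(column) or "(missing)" for row in rows)
```

Calling `count_by_category(rows).most_common(10)` lists the ten most frequent categories, and passing `column="subcategory"` gives the same breakdown one level down.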
File: test.csv
Column name | Description |
question | The legal question posed by a Belgian citizen. (String) |
category | The broad category of the legal question. (String) |
subcategory | The more specific subcategory of the legal question. (String) |
extra_description | Additional information about the legal question. (String) |
File: synthetic.csv
Column name | Description |
question | The legal question posed by a Belgian citizen. (String) |
category | The broad category of the legal question. (String) |
subcategory | The more specific subcategory of the legal question. (String) |
extra_description | Additional information about the legal question. (String) |
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.