LLM Feedback Collection
Induce fine-grained evaluation capabilities into language models
By Huggingface Hub [source]
About this dataset
This dataset contains 100,000 feedback responses generated by GPT-4, along with rubrics designed for both absolute and ranking-based scoring. Each response is collected through a comprehensive evaluation process that takes into account the model's feedback, the instruction, the scoring criteria, a reference answer, and the given input. This data gives researchers and developers valuable insight into the performance of their AI models on various tasks, as well as the ability to compare models against one another using precise, consistent measures. Each response is accompanied by five descriptive score levels that give a detailed overview of its quality: relevance to the given input, accuracy with respect to the provided reference answer, coherence between different parts of the output (grammar and organization), fluency of expression without errors or unnecessary repetition, and overall quality accounting for all of these factors combined. With this dataset at your disposal, you can evaluate each output qualitatively without having to manually inspect every single response.
How to use the dataset
This dataset contains feedback from GPT-4 models, along with associated rubrics for absolute and ranking scoring. It can be used to evaluate the performance of GPT-4 models on different challenging tasks.
In order to use this dataset effectively, it is important to understand the data provided in each column:
- orig_feedback – Feedback given by the original GPT-4 model
- orig_score2_description – Description of the second score given to the original GPT-4 model
- orig_reference_answer – Reference answer used to evaluate the original GPT-4 model
- output – Output from the fine-grained evaluation
- orig_response – Response from the original GPT-4 model
- orig_criteria – Criteria used to evaluate the original GPT-4 model
- orig_instruction – Instruction given to the original GPT-4 model
- orig_score1_description – Description of the first score given to the original GPT-4 model
- orig_score3_description – Description of the third score given to the original GPT-4 model
- orig_score4_description – Description of the fourth score given to the original GPT-4 model
- orig_score5_description – Description of the fifth score given to the original GPT-4 model
- input – Input given to the evaluation
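Once the columns are clear, working with the data is straightforward. The sketch below loads the feedback CSV with pandas and inspects a few of the columns described above; since the actual `train.csv` is not bundled here, a tiny in-memory sample mimicking the schema stands in for it, and the example rows are purely hypothetical.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample mimicking the train.csv schema described above.
# In practice, replace StringIO(...) with the path to train.csv.
sample = StringIO(
    "orig_instruction,orig_response,orig_feedback,orig_criteria,output\n"
    "Summarize the text.,A short summary.,Clear and concise.,Relevance,Score 5\n"
    "Translate to French.,Bonjour.,Accurate translation.,Accuracy,Score 4\n"
)
df = pd.read_csv(sample)

# Inspect the schema and a few instruction/response/feedback triples
print(df.columns.tolist())
print(df[["orig_instruction", "orig_response", "orig_feedback"]].head())
```

Selecting the instruction, response, and feedback columns together is a convenient way to eyeball how the rubric-based feedback relates to each model output before building anything on top of the data.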
Research Ideas
- Data-driven evaluation of GPT-4 models using the absolute and ranking scores collected from this dataset.
- Training a deep learning model to automate the assessment of GPT-4 responses based on the rubrics provided in this dataset.
- Building a semantic search engine using GPT-4 that can identify relevant responses more accurately with the help of this dataset's collection metrics and scoring rubrics.
Acknowledgements
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
Columns
File: train.csv
| Column name | Description |
|---|---|
| orig_feedback | Feedback from the evaluator. (Text) |
| orig_score1_description | Description of the first score given by the evaluator. (Text) |
| orig_score2_description | Description of the second score given by the evaluator. (Text) |
| orig_score3_description | Description of the third score given by the evaluator. (Text) |
| orig_score4_description | Description of the fourth score given by the evaluator. (Text) |
| orig_score5_description | Description of the fifth score given by the evaluator. (Text) |
| orig_reference_answer | Reference answer used to evaluate the model response. (Text) |
| output | Output from the GPT-4 model. (Text) |
| orig_response | Original response from the GPT-4 model. (Text) |
| orig_criteria | Criteria used by the evaluator to rate the response. (Text) |
| orig_instruction | Instructions provided by the evaluator. (Text) |
| input | Input given to the evaluation. (Text) |
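Because every column is free text, simple length statistics are a useful first sanity check before deeper analysis. The sketch below compares the character lengths of responses and reference answers; it uses a small in-memory sample with made-up rows in place of `train.csv`, reusing column names from the table above.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample standing in for train.csv (column names from the table above).
sample = StringIO(
    "orig_response,orig_reference_answer,orig_feedback\n"
    "The capital of France is Paris.,Paris is the capital of France.,Correct and complete.\n"
    "It is Lyon.,Paris is the capital of France.,Incorrect answer.\n"
)
df = pd.read_csv(sample)

# Character-length comparison of model responses vs. reference answers
df["response_len"] = df["orig_response"].str.len()
df["reference_len"] = df["orig_reference_answer"].str.len()
print(df[["response_len", "reference_len"]].describe())
```

On the full dataset, a large gap between response and reference lengths can flag truncated outputs or off-topic answers worth inspecting alongside their `orig_feedback`.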