
LLM Feedback Collection

Induce fine-grained evaluation capabilities into language models

@kaggle.thedevastator_fine_grained_gpt_4_evaluation


By Huggingface Hub [source]

About this Dataset

This dataset contains 100,000 feedback responses from GPT-4, together with rubrics for both absolute and ranking scoring. Each response was collected through an evaluation process that takes into account the model's feedback, the instruction, the scoring criteria, the reference answer, and the input. The data gives researchers and developers insight into how their AI models perform on a range of tasks, and lets them compare models against one another using precise, consistent measures. Each response is accompanied by five score descriptions that characterize output quality in terms of relevance to the given input, accuracy with respect to the reference answer, coherence of grammar and organization, fluency of expression without errors or unnecessary repetition, and overall quality accounting for all of these factors combined. With this dataset, you can evaluate each output qualitatively without having to inspect every single response by hand.

How to use the dataset

This dataset contains feedback from GPT-4 models, along with associated rubrics for absolute and ranking scoring. It can be used to evaluate the performance of GPT-4 models on different challenging tasks.

In order to use this dataset effectively, it is important to understand the data provided in each column (a short loading sketch follows the list):

  • orig_feedback – Feedback given by the original GPT-4 model
  • orig_score2_description – Description of the second score given to the original GPT-4 model
  • orig_reference_answer – Reference answer used to evaluate the original GPT-4 model
  • output – Output from the fine-grained evaluation
  • orig_response – Response from the original GPT-4 model
  • orig_criteria – Criteria used to evaluate the original GPT-4 model
  • orig_instruction – Instruction given to the original GPT-4 model
  • orig_score3_description – Description of the third score given to the original GPT-4 model
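
As a quick start, the sketch below loads train.csv with pandas and prints a handful of the fields described above. The local file path is an assumption; point it at wherever you have downloaded the dataset.

import pandas as pd

# Path is an assumption -- adjust it to your local copy of train.csv.
df = pd.read_csv("train.csv")

# Overall shape and available columns.
print(df.shape)
print(df.columns.tolist())

# Inspect a single evaluation record: instruction, response, criteria,
# feedback, and the absolute score.
row = df.iloc[0]
for col in ["orig_instruction", "orig_response", "orig_criteria",
            "orig_feedback", "orig_score"]:
    print(f"--- {col} ---")
    print(str(row[col])[:300])  # long text fields, truncated for readability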

Research Ideas

  • Data-driven evaluation of GPT-4 models using the absolute and ranking scores collected from this dataset (a minimal aggregation sketch follows this list).
  • Training a deep learning model to automate the assessment of GPT-4 responses based on the rubrics provided in this dataset.
  • Building a semantic search engine using GPT-4 that can identify relevant responses more accurately with the help of this dataset's data collection metrics and scoring rubrics.
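
For the first idea, a minimal sketch of a data-driven evaluation is to aggregate the absolute scores by rubric criteria. It assumes train.csv has been downloaded locally and that orig_score holds the integer rubric score (the 1–5 range is inferred from the five score-description columns, not stated explicitly).

import pandas as pd

df = pd.read_csv("train.csv")  # assumed local path, as in the loading sketch above

# Distribution of absolute scores across the collection.
print(df["orig_score"].value_counts().sort_index())

# Mean score and count per evaluation criterion; criteria strings are long,
# so they are truncated for grouping and display.
per_criteria = (
    df.groupby(df["orig_criteria"].str.slice(0, 60))["orig_score"]
      .agg(["mean", "count"])
      .sort_values("mean")
)
print(per_criteria.head(10))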

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

  • orig_feedback – Feedback from the evaluator. (Text)
  • orig_score2_description – Description of the second score given by the evaluator. (Text)
  • orig_reference_answer – Reference answer used to evaluate the model response. (Text)
  • output – Output from the GPT-4 model. (Text)
  • orig_response – Original response from the GPT-4 model. (Text)
  • orig_criteria – Criteria used by the evaluator to rate the response. (Text)
  • orig_instruction – Instructions provided by the evaluator. (Text)
  • orig_score3_description – Description of the third score given by the evaluator. (Text)
  • orig_score5_description – Description of the fifth score given by the evaluator. (Text)
  • orig_score1_description – Description of the first score given by the evaluator. (Text)
  • input – Input given to the evaluation. (Text)
  • orig_score4_description – Description of the fourth score given by the evaluator. (Text)
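
These columns are designed to be combined: the instruction, response, reference answer, criteria, and the five score descriptions together form an evaluation context, with orig_feedback and orig_score as the judgement. The template below is only an illustrative assumption about how such a prompt could be assembled, not the exact format used to create the dataset.

import pandas as pd

df = pd.read_csv("train.csv")  # assumed local path
row = df.iloc[0]

# Illustrative prompt template (an assumption, not necessarily the original format).
prompt = (
    "###Task Description:\n"
    "Evaluate the response against the score rubric and the reference answer.\n\n"
    f"###Instruction:\n{row['orig_instruction']}\n\n"
    f"###Response to evaluate:\n{row['orig_response']}\n\n"
    f"###Reference answer:\n{row['orig_reference_answer']}\n\n"
    f"###Score rubric ({row['orig_criteria']}):\n"
    f"1: {row['orig_score1_description']}\n"
    f"2: {row['orig_score2_description']}\n"
    f"3: {row['orig_score3_description']}\n"
    f"4: {row['orig_score4_description']}\n"
    f"5: {row['orig_score5_description']}\n"
)
target = row["orig_feedback"]  # what an evaluator model would learn to produce
print(prompt[:500])
print(str(target)[:300])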

Tables

Train

@kaggle.thedevastator_fine_grained_gpt_4_evaluation.train
  • 438.23 MB
  • 99952 rows
  • 14 columns

CREATE TABLE train (
  "orig_feedback" VARCHAR,
  "orig_score2_description" VARCHAR,
  "orig_reference_answer" VARCHAR,
  "output" VARCHAR,
  "orig_response" VARCHAR,
  "orig_criteria" VARCHAR,
  "orig_instruction" VARCHAR,
  "orig_score" BIGINT,
  "orig_score3_description" VARCHAR,
  "orig_score5_description" VARCHAR,
  "orig_score1_description" VARCHAR,
  "instruction" VARCHAR,
  "input" VARCHAR,
  "orig_score4_description" VARCHAR
);
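
If you prefer SQL, the same data can be queried locally with DuckDB from Python; the snippet below reads train.csv directly and aggregates over orig_score. The file path is an assumption, and DuckDB infers column types from the CSV rather than applying the CREATE TABLE statement above.

import duckdb

con = duckdb.connect()

# Path is an assumption -- point read_csv_auto at your local copy of train.csv.
result = con.execute(
    """
    SELECT orig_score, COUNT(*) AS n
    FROM read_csv_auto('train.csv')
    GROUP BY orig_score
    ORDER BY orig_score
    """
).fetchdf()
print(result)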
