
LLM Feedback Collection

Induce fine-grained evaluation capabilities into language models

@kaggle.thedevastator_fine_grained_gpt_4_evaluation


By Huggingface Hub [source]

About this Dataset

This dataset contains 100,000 feedback responses from GPT-4, together with rubrics for both absolute and ranking scoring. Each response was collected through an evaluation process that takes into account the model's feedback, the instruction, the scoring criteria, the reference answer, and the input. The data gives researchers and developers insight into how their AI models perform on a range of tasks, and lets them compare models against one another using precise, consistent measures. Each response is accompanied by five score descriptions that characterize output quality in terms of relevance to the given input, accuracy with respect to the reference answer, coherence of grammar and organization, fluency of expression without errors or unnecessary repetition, and overall quality accounting for all of these factors combined. With this dataset, you can evaluate each output qualitatively without having to inspect every single response by hand.

How to use the dataset

This dataset contains feedback from GPT-4 models, along with associated rubrics for absolute and ranking scoring. It can be used to evaluate the performance of GPT-4 models on different challenging tasks.

In order to use this dataset effectively, it is important to understand the data provided in each column (a short loading sketch follows the list):

  • orig_feedback – Feedback given by the original GPT-4 model
  • orig_score2_description – Description of the second score given to the original GPT-4 model
  • orig_reference_answer – Reference answer used to evaluate the original GPT-4 model
  • output – Output from the fine-grained evaluation
  • orig_response – Response from the original GPT-4 model
  • orig_criteria – Criteria used to evaluate the original GPT-4 model
  • orig_instruction – Instruction given to the original GPT-4 model
  • orig_score3_description – Description of the third score given to the original GPT-4 model
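
As a quick start, the sketch below loads train.csv with pandas and prints a handful of the fields described above. The local file path is an assumption; point it at wherever you have downloaded the dataset.

import pandas as pd

# Path is an assumption -- adjust it to your local copy of train.csv.
df = pd.read_csv("train.csv")

# Overall shape and available columns.
print(df.shape)
print(df.columns.tolist())

# Inspect a single evaluation record: instruction, response, criteria,
# feedback, and the absolute score.
row = df.iloc[0]
for col in ["orig_instruction", "orig_response", "orig_criteria",
            "orig_feedback", "orig_score"]:
    print(f"--- {col} ---")
    print(str(row[col])[:300])  # long text fields, truncated for readability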

Research Ideas

  • Data-driven evaluation of GPT-4 models using the absolute and ranking scores collected from this dataset (a minimal aggregation sketch follows this list).
  • Training a deep learning model to automate the assessment of GPT-4 responses based on the rubrics provided in this dataset.
  • Building a semantic search engine using GPT-4 that can identify relevant responses more accurately with the help of this dataset's data collection metrics and scoring rubrics.
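
For the first idea, a minimal sketch of a data-driven evaluation is to aggregate the absolute scores by rubric criteria. It assumes train.csv has been downloaded locally and that orig_score holds the integer rubric score (the 1–5 range is inferred from the five score-description columns, not stated explicitly).

import pandas as pd

df = pd.read_csv("train.csv")  # assumed local path, as in the loading sketch above

# Distribution of absolute scores across the collection.
print(df["orig_score"].value_counts().sort_index())

# Mean score and count per evaluation criterion; criteria strings are long,
# so they are truncated for grouping and display.
per_criteria = (
    df.groupby(df["orig_criteria"].str.slice(0, 60))["orig_score"]
      .agg(["mean", "count"])
      .sort_values("mean")
)
print(per_criteria.head(10))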

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

  • orig_feedback – Feedback from the evaluator. (Text)
  • orig_score2_description – Description of the second score given by the evaluator. (Text)
  • orig_reference_answer – Reference answer used to evaluate the model response. (Text)
  • output – Output from the GPT-4 model. (Text)
  • orig_response – Original response from the GPT-4 model. (Text)
  • orig_criteria – Criteria used by the evaluator to rate the response. (Text)
  • orig_instruction – Instructions provided by the evaluator. (Text)
  • orig_score3_description – Description of the third score given by the evaluator. (Text)
  • orig_score5_description – Description of the fifth score given by the evaluator. (Text)
  • orig_score1_description – Description of the first score given by the evaluator. (Text)
  • input – Input given to the evaluation. (Text)
  • orig_score4_description – Description of the fourth score given by the evaluator. (Text)
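
These columns are designed to be combined: the instruction, response, reference answer, criteria, and the five score descriptions together form an evaluation context, with orig_feedback and orig_score as the judgement. The template below is only an illustrative assumption about how such a prompt could be assembled, not the exact format used to create the dataset.

import pandas as pd

df = pd.read_csv("train.csv")  # assumed local path
row = df.iloc[0]

# Illustrative prompt template (an assumption, not necessarily the original format).
prompt = (
    "###Task Description:\n"
    "Evaluate the response against the score rubric and the reference answer.\n\n"
    f"###Instruction:\n{row['orig_instruction']}\n\n"
    f"###Response to evaluate:\n{row['orig_response']}\n\n"
    f"###Reference answer:\n{row['orig_reference_answer']}\n\n"
    f"###Score rubric ({row['orig_criteria']}):\n"
    f"1: {row['orig_score1_description']}\n"
    f"2: {row['orig_score2_description']}\n"
    f"3: {row['orig_score3_description']}\n"
    f"4: {row['orig_score4_description']}\n"
    f"5: {row['orig_score5_description']}\n"
)
target = row["orig_feedback"]  # what an evaluator model would learn to produce
print(prompt[:500])
print(str(target)[:300])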

Tables

Train

@kaggle.thedevastator_fine_grained_gpt_4_evaluation.train
  • 438.23 MB
  • 99952 rows
  • 14 columns

CREATE TABLE train (
  "orig_feedback" VARCHAR,
  "orig_score2_description" VARCHAR,
  "orig_reference_answer" VARCHAR,
  "output" VARCHAR,
  "orig_response" VARCHAR,
  "orig_criteria" VARCHAR,
  "orig_instruction" VARCHAR,
  "orig_score" BIGINT,
  "orig_score3_description" VARCHAR,
  "orig_score5_description" VARCHAR,
  "orig_score1_description" VARCHAR,
  "instruction" VARCHAR,
  "input" VARCHAR,
  "orig_score4_description" VARCHAR
);
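
If you prefer SQL, the same data can be queried locally with DuckDB from Python; the snippet below reads train.csv directly and aggregates over orig_score. The file path is an assumption, and DuckDB infers column types from the CSV rather than applying the CREATE TABLE statement above.

import duckdb

con = duckdb.connect()

# Path is an assumption -- point read_csv_auto at your local copy of train.csv.
result = con.execute(
    """
    SELECT orig_score, COUNT(*) AS n
    FROM read_csv_auto('train.csv')
    GROUP BY orig_score
    ORDER BY orig_score
    """
).fetchdf()
print(result)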
