Baselight

LLM Human Preference Data - Ultrafeedback

UltraFeedback - Multi-Binarized using the Average of Preference Ratings (Cleaned)

@kaggle.thedrcat_llm_human_preference_data_ultrafeedback


About this Dataset


External data for the LMSYS - Chatbot Arena Human Preference Predictions competition.

Downloaded from HuggingFace dataset: argilla/ultrafeedback-multi-binarized-preferences-cleaned

Additionally, I converted the data into the LMSYS train-data format (you may still need to shuffle the responses, since the chosen response always appears in the same position).
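A minimal sketch of that shuffling step. The input column names (prompt, chosen, chosen_model, rejected, rejected_model) come from the table schema below; the output field names (response_a, winner_model_a, etc.) are assumed to match the LMSYS competition's train format:

```python
import random

def to_lmsys_row(row, rng):
    """Convert one UltraFeedback preference pair into an LMSYS-style row,
    randomly deciding which response appears as response_a.

    row: dict keyed by the ultrafeedback table's column names.
    rng: a random.Random instance (pass a seeded one for reproducibility).
    """
    a_is_chosen = rng.random() < 0.5
    if a_is_chosen:
        resp_a, model_a = row["chosen"], row["chosen_model"]
        resp_b, model_b = row["rejected"], row["rejected_model"]
    else:
        resp_a, model_a = row["rejected"], row["rejected_model"]
        resp_b, model_b = row["chosen"], row["chosen_model"]
    return {
        "prompt": row["prompt"],
        "model_a": model_a,
        "model_b": model_b,
        "response_a": resp_a,
        "response_b": resp_b,
        "winner_model_a": int(a_is_chosen),
        "winner_model_b": int(not a_is_chosen),
        "winner_tie": 0,  # ties live in the separate ultrafeedback_ties table
    }
```

Exactly one of winner_model_a / winner_model_b is 1 per row, and the winning side always corresponds to the chosen response.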

Version 2 contains additional examples with ties between model responses that were previously filtered out.

NOTE: This dataset uses GPT-4 as a judge, as a proxy for human preference ratings.

The UltraFeedback - Multi-Binarized using the Average of Preference Ratings (Cleaned) dataset is a new iteration on top of argilla/ultrafeedback-binarized-preferences-cleaned, created to explore whether DPO fine-tuning with more than one rejection per chosen response helps the model perform better on the AlpacaEval, MT-Bench, and LM Eval Harness benchmarks.

Paper: https://arxiv.org/pdf/2310.01377

Tables

Ultrafeedback

@kaggle.thedrcat_llm_human_preference_data_ultrafeedback.ultrafeedback
  • 213.65 MB
  • 157,675 rows
  • 8 columns

CREATE TABLE ultrafeedback (
  "source" VARCHAR,
  "prompt" VARCHAR,
  "chosen" VARCHAR,
  "chosen_rating" DOUBLE,
  "chosen_model" VARCHAR,
  "rejected" VARCHAR,
  "rejected_rating" DOUBLE,
  "rejected_model" VARCHAR
);
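As a sketch of working with this schema locally, the following loads it into an in-memory SQLite database (mapping VARCHAR to TEXT and DOUBLE to REAL) and computes the rating margin between chosen and rejected responses. The sample row values are illustrative, not taken from the dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns as the ultrafeedback table above, in SQLite-native types.
conn.execute("""CREATE TABLE ultrafeedback (
    source TEXT,
    prompt TEXT,
    chosen TEXT,
    chosen_rating REAL,
    chosen_model TEXT,
    rejected TEXT,
    rejected_rating REAL,
    rejected_model TEXT
)""")

# One illustrative row (values are made up for the example).
conn.execute(
    "INSERT INTO ultrafeedback VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("ultrachat", "What is DPO?",
     "Direct Preference Optimization is ...", 4.5, "gpt-4",
     "I am not sure.", 2.0, "alpaca-7b"),
)

# Average margin between the chosen and rejected ratings.
(gap,) = conn.execute(
    "SELECT AVG(chosen_rating - rejected_rating) FROM ultrafeedback"
).fetchone()
```

With the single sample row, the margin is 4.5 - 2.0 = 2.5; over the real table it summarizes how decisively the judge preferred the chosen responses.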

Ultrafeedback Ties

@kaggle.thedrcat_llm_human_preference_data_ultrafeedback.ultrafeedback_ties
  • 40.23 MB
  • 16,142 rows
  • 8 columns

CREATE TABLE ultrafeedback_ties (
  "source" VARCHAR,
  "prompt" VARCHAR,
  "chosen" VARCHAR,
  "chosen_rating" DOUBLE,
  "chosen_model" VARCHAR,
  "rejected" VARCHAR,
  "rejected_rating" DOUBLE,
  "rejected_model" VARCHAR
);
