LLM Human Preference Data - Ultrafeedback
UltraFeedback - Multi-Binarized using the Average of Preference Ratings (Cleaned)
@kaggle.thedrcat_llm_human_preference_data_ultrafeedback
External data for the LMSYS - Chatbot Arena Human Preference Predictions competition.
Downloaded from the HuggingFace dataset argilla/ultrafeedback-multi-binarized-preferences-cleaned.
Additionally, I converted the data into the LMSYS train data format (you may still need to shuffle the responses; a rough SQL sketch follows below).
Version 2 contains additional examples with ties between model responses that were previously filtered out (these live in the ultrafeedback_ties table below).
NOTE: This dataset uses GPT-4 as a judge as a proxy for human preference ratings.
The UltraFeedback - Multi-Binarized using the Average of Preference Ratings (Cleaned) dataset is a new iteration on top of argilla/ultrafeedback-binarized-preferences-cleaned, created to explore whether DPO fine-tuning with more than one rejection per chosen response helps the model perform better on the AlpacaEval, MT-Bench, and LM Eval Harness benchmarks.
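As a minimal sketch of the conversion mentioned above, the query below maps the columns from the schema that follows onto an LMSYS-style layout, randomly swapping sides so that response_a is not always the preferred answer. It assumes DuckDB-style SQL; the output column names (response_a, response_b, winner_model_a, winner_model_b, winner_tie) are assumptions about the target format, not part of this dataset.

-- Sketch, assuming DuckDB-style SQL and LMSYS-style output columns.
WITH flagged AS (
    SELECT
        prompt,
        chosen,
        rejected,
        random() < 0.5 AS swap  -- per row, decide which side gets the chosen response
    FROM ultrafeedback
)
SELECT
    prompt,
    CASE WHEN swap THEN rejected ELSE chosen END AS response_a,
    CASE WHEN swap THEN chosen ELSE rejected END AS response_b,
    CASE WHEN swap THEN 0 ELSE 1 END AS winner_model_a,
    CASE WHEN swap THEN 1 ELSE 0 END AS winner_model_b,
    0 AS winner_tie
FROM flagged;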
CREATE TABLE ultrafeedback (
"source" VARCHAR,
"prompt" VARCHAR,
"chosen" VARCHAR,
"chosen_rating" DOUBLE,
"chosen_model" VARCHAR,
"rejected" VARCHAR,
"rejected_rating" DOUBLE,
"rejected_model" VARCHAR
);

CREATE TABLE ultrafeedback_ties (
"source" VARCHAR,
"prompt" VARCHAR,
"chosen" VARCHAR,
"chosen_rating" DOUBLE,
"chosen_model" VARCHAR,
"rejected" VARCHAR,
"rejected_rating" DOUBLE,
"rejected_model" VARCHAR
);
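Since both tables share the same schema, one way to use them together is to stack the regular pairs with the Version 2 tie pairs and tag the ties so they can be filtered or weighted separately. A sketch, again assuming DuckDB-style SQL; the is_tie column name is illustrative:

-- Sketch: combine both tables, flagging which rows are ties.
SELECT *, FALSE AS is_tie FROM ultrafeedback
UNION ALL
SELECT *, TRUE AS is_tie FROM ultrafeedback_ties;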