Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

@kaggle.thedevastator_ultra_fine_binary_preference_learning



By Huggingface Hub [source]


About this dataset

This dataset contains data for binary preference learning over language-model responses. It is organized into three kinds of splits - sft, gen, and prefs - intended for supervised fine-tuning (SFT) and for preference-based training methods such as PPO and DPO. Each record pairs a prompt with a chosen and a rejected message, along with the full message history and a score for each option. Analyzing this data offers insight into user sentiment toward different prompts and into which responses people find more desirable or satisfying, which can improve applications that rely on preference signals, such as recommendation systems and automated customer service. Studying these choices also helps build more interpretable machine learning models whose decisions track human judgment more closely.


How to use the dataset

This dataset can be used to train and evaluate models for binary preference learning tasks. The data is organized into three pairs of splits: train_sft/test_sft, train_gen/test_gen, and train_prefs/test_prefs. Each split contains prompts, chosen and rejected messages, and a score for each option. With this data, you can train a model that predicts user preferences consistently and accurately across multiple settings.

Here are the steps to work with this dataset:

  • Read through the prompts in each file to understand what each task asks of the user.
  • Review the chosen and rejected messages alongside their scores to understand what drives each preference, including factors such as emotion or sentiment.
  • Using the understanding gained in steps 1 and 2, build a model that predicts which of any pair of candidate responses a user will prefer.
  • Validate your model on held-out data from all three pairs of splits (sft, gen, and prefs) to check that it predicts user preferences reliably across different contexts.

With these steps you should have a good sense of how to use this dataset to build models that reliably predict which of two options users will prefer in a binary preference learning scenario. A minimal loading-and-inspection sketch follows.
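The sketch below loads one split with pandas and checks how the scores relate to the chosen/rejected labels. The ./data/ path is a hypothetical local download location, not part of the dataset.

import pandas as pd

# Load one preference split; assumes the CSVs were downloaded from Kaggle
# into ./data/ (adjust the path to your setup).
train_prefs = pd.read_csv("data/train_prefs.csv")

# Columns per the Columns section below: prompt, prompt_id, chosen,
# rejected, messages, score_chosen, score_rejected.
print(train_prefs.columns.tolist())

# How often does the chosen response actually score higher?
margin = train_prefs["score_chosen"] - train_prefs["score_rejected"]
print(f"mean score margin: {margin.mean():.2f}")
print(f"chosen scored strictly higher in {(margin > 0).mean():.1%} of rows")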

Research Ideas

  • Training a model based on machine learning and natural language processing methods to predict user preferences between paired responses.
  • Developing a supervised learning algorithm that uses the prompt, chosen, rejected, messages, and score columns to identify factors that influence preference selection.
  • Applying preference-optimization methods such as PPO (Proximal Policy Optimization) or DPO (Direct Preference Optimization) to train policies that select the preferred response, using the preference pairs collected in this dataset (see the sketch after this list).
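
As an illustration of the last idea, here is a minimal, self-contained sketch of the DPO objective; it assumes you have already computed summed per-sequence log-probabilities under the policy and a frozen reference model, and all names are placeholders rather than this dataset's actual loaders.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO raises the policy's log-ratio (vs. the reference) for the chosen
    # response relative to the rejected one, via a logistic loss.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())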

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

All six tables (train_sft, test_sft, train_gen, test_gen, train_prefs, test_prefs) share the same schema:

Column name     Description
prompt          The prompt that was given to the user. (String)
prompt_id       A unique identifier for the prompt. (String)
chosen          The message that the user chose. (String)
rejected        The message that the user rejected. (String)
messages        The messages that were presented to the user. (List, serialized as a string)
score_chosen    The score assigned to the chosen message. (Float)
score_rejected  The score assigned to the rejected message. (Float)
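
Note that in the CSV exports the messages column is a list serialized as a string. Below is a small sketch of one way to parse it back into structured turns, assuming Python-literal formatting with role/content dicts (verify against your actual export first):

import ast
import pandas as pd

df = pd.read_csv("data/test_sft.csv")  # hypothetical local path
# Assumes each entry is a Python-literal list of {"role", "content"} dicts.
df["messages"] = df["messages"].apply(ast.literal_eval)

for turn in df.loc[0, "messages"]:
    print(turn["role"], "->", turn["content"][:60])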


Tables

Test Gen

@kaggle.thedevastator_ultra_fine_binary_preference_learning.test_gen
  • 2.77 MB
  • 1000 rows
  • 7 columns

CREATE TABLE test_gen (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);

Test Prefs

@kaggle.thedevastator_ultra_fine_binary_preference_learning.test_prefs
  • 6.91 MB
  • 2000 rows
  • 7 columns

CREATE TABLE test_prefs (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);

Test Sft

@kaggle.thedevastator_ultra_fine_binary_preference_learning.test_sft
  • 3.41 MB
  • 1000 rows
  • 7 columns

CREATE TABLE test_sft (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);

Train Gen

@kaggle.thedevastator_ultra_fine_binary_preference_learning.train_gen
  • 173.27 MB
  • 61966 rows
  • 7 columns

CREATE TABLE train_gen (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);

Train Prefs

@kaggle.thedevastator_ultra_fine_binary_preference_learning.train_prefs
  • 213.96 MB
  • 61966 rows
  • 7 columns

CREATE TABLE train_prefs (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);

Train Sft

@kaggle.thedevastator_ultra_fine_binary_preference_learning.train_sft
  • 213.96 MB
  • 61966 rows
  • 7 columns

CREATE TABLE train_sft (
  "prompt" VARCHAR,
  "prompt_id" VARCHAR,
  "chosen" VARCHAR,
  "rejected" VARCHAR,
  "messages" VARCHAR,
  "score_chosen" DOUBLE,
  "score_rejected" DOUBLE
);
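
The CSV exports can also be explored locally with SQL. Here is a sketch using DuckDB against the same schema as the tables above (the ./data/ path is an assumption):

import duckdb

# Query the exported CSV directly; read_csv_auto infers the schema.
top = duckdb.sql("""
    SELECT prompt_id,
           score_chosen,
           score_rejected,
           score_chosen - score_rejected AS margin
    FROM read_csv_auto('data/train_prefs.csv')
    ORDER BY margin DESC
    LIMIT 5
""").df()
print(top)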
