Ultrafeedback Binarized
Predicting Binary Preferences with SFT, PPO and DPO
By Huggingface Hub [source]
About this dataset
This dataset contains data for ultra-fine-grained binary preference learning tasks. It is organized into three subsets, SFT, PPO, and DPO, each pairing prompts with chosen and rejected messages and a score assigned to each option. This makes it well suited for analyzing user sentiment toward different prompts and which responses people find more desirable or satisfying. Such analysis can improve applications that depend on modeling human preferences, such as recommendation systems or automated customer-service programs, and can support the development of more interpretable machine learning models whose decisions align more closely with human reasoning.
More Datasets
For more datasets, click here.
Featured Notebooks
- 🚨 Your notebook can be here! 🚨
How to use the dataset
This dataset can be used to train and evaluate models for ultra-fine-grained binary preference learning tasks. The data is organized into three files: SFT, PPO, and DPO. Each file contains a series of prompts, chosen and rejected messages, and scores for each option. With this data, you can train a model that predicts user preferences consistently and accurately across multiple settings.
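To get oriented, the sketch below reads one of the CSV files listed under Columns with pandas and prints the fields described above. The path is an assumption; point it at your local copy of the data.

```python
import pandas as pd

# Read the SFT training split (file name taken from the column listing below;
# adjust the path to wherever you have downloaded the data).
df = pd.read_csv("train_sft.csv")

# Each row pairs a prompt with a chosen and a rejected response plus their scores.
print(df.columns.tolist())
print(df[["prompt", "score_chosen", "score_rejected"]].head())
```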
Here are the steps to work with this dataset:
- Read through the prompts in each file and understand what the task is asking of the user.
- Review both the chosen and rejected messages, together with their accompanying scores, to understand what drives the preference, including factors such as emotion or sentiment.
- Using your understanding of the task from steps 1 and 2, create a model that predicts user preference for any pair of options in an ultra-fine-grained binary preference learning task (SFT, PPO, or DPO).
- Validate your model on held-out data from all three files (SFT, PPO, and DPO) to determine whether it predicts user preferences accurately across different contexts.
With these steps, you should understand how best to use this dataset to build models that reliably predict which of two options a user will prefer in an ultra-fine-grained binary preference learning scenario; a minimal baseline sketch follows below.
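As one concrete baseline for steps 3 and 4, the sketch below trains a simple TF-IDF plus logistic regression classifier to distinguish chosen from rejected responses on the SFT files, then reports held-out accuracy. This is a minimal illustration only, assuming local copies of train_sft.csv and test_sft.csv; it is not the intended modeling recipe for this dataset.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Load the SFT splits (paths are assumptions; adjust to your local copies).
train = pd.read_csv("train_sft.csv")
test = pd.read_csv("test_sft.csv")

def to_examples(df):
    # Turn each preference pair into two labelled examples:
    # the chosen response gets label 1, the rejected response label 0.
    texts = pd.concat([df["prompt"] + " " + df["chosen"],
                       df["prompt"] + " " + df["rejected"]])
    labels = [1] * len(df) + [0] * len(df)
    return texts, labels

train_texts, train_labels = to_examples(train)
test_texts, test_labels = to_examples(test)

vectorizer = TfidfVectorizer(max_features=50_000)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

# Accuracy of telling chosen from rejected responses on the held-out split.
print("per-response accuracy:", clf.score(X_test, test_labels))
```

A stronger approach would score both responses in a pair and predict the one with the higher probability of being chosen, but even this simple baseline gives a useful reference point.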
Research Ideas
- Training a model or algorithm based on machine learning and natural language processing methods to determine user preferences between ultra-fine-grained options.
- Developing a supervised learning algorithm that uses the prompt, chosen, rejected, messages, and score columns to identify the factors that influence user preference selection for ultra-fine-grained tasks.
- Applying preference-optimization methods such as PPO (Proximal Policy Optimization) or DPO (Direct Preference Optimization) to learn policies that select effectively between ultra-fine-grained options in different domains, using the preference pairs collected in this dataset (a sketch of the DPO loss follows this list).
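For the DPO direction in particular, the heart of the method is a pairwise loss over each chosen/rejected pair, computed from the responses' log-probabilities under the policy being trained and a frozen reference model. The PyTorch sketch below is illustrative only; the function name and the beta hyperparameter are assumptions, not part of this dataset.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities for the
    chosen / rejected responses under the policy and the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to assign relatively more probability to the chosen response.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
logps = torch.randn(4)
loss = dpo_loss(logps, logps - 1.0, torch.zeros(4), torch.zeros(4))
print(loss.item())
```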
Acknowledgements
If you use this dataset in your research, please credit the original authors.
Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: test_sft.csv
| Column name | Description |
|:---|:---|
| prompt | The prompt that was given to the user. (String) |
| chosen | The message that the user chose. (String) |
| rejected | The message that the user rejected. (String) |
| messages | The messages that were presented to the user. (List) |
| score_chosen | The score assigned to the chosen message. (Integer) |
| score_rejected | The score assigned to the rejected message. (Integer) |
File: train_sft.csv
| Column name | Description |
|:---|:---|
| prompt | The prompt that was given to the user. (String) |
| chosen | The message that the user chose. (String) |
| rejected | The message that the user rejected. (String) |
| messages | The messages that were presented to the user. (List) |
| score_chosen | The score assigned to the chosen message. (Integer) |
| score_rejected | The score assigned to the rejected message. (Integer) |
File: train_gen.csv
| Column name | Description |
|:---|:---|
| prompt | The prompt that was given to the user. (String) |
| chosen | The message that the user chose. (String) |
| rejected | The message that the user rejected. (String) |
| messages | The messages that were presented to the user. (List) |
| score_chosen | The score assigned to the chosen message. (Integer) |
| score_rejected | The score assigned to the rejected message. (Integer) |
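Since all three files share the same schema, one quick sanity check is to look at the margin between the two ratings: in a well-formed pair, score_chosen should be at least as high as score_rejected. A small sketch (file name taken from the listing above; adjust the path as needed):

```python
import pandas as pd

df = pd.read_csv("train_sft.csv")

# Margin between the two ratings; larger margins indicate clearer preferences.
df["score_margin"] = df["score_chosen"] - df["score_rejected"]
print(df["score_margin"].describe())

# Flag any rows where the rejected response was scored higher than the chosen one.
print((df["score_margin"] < 0).sum(), "pairs with inverted scores")
```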
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.