This dataset consists of the titles, abstracts, and introductions of scientific articles, paraphrased using the GPT-3 language model. The original articles were selected from Assoc. Prof. Mehmet Erkut Erdem's Google Scholar page, and their text was rewritten through the GPT-3 API so that the meaning is preserved while the wording and sentence structure change. The resulting dataset can be used for natural language processing tasks such as text summarization, machine translation, and data augmentation.
We manually copy the title, abstract, and introduction of each article and paste them into Google Sheets. We then run a script that uses the OpenAI GPT-3 API to send an API request for each input automatically. Finally, we use the pandas library to remove newline characters from the text and save the results to a .csv file.
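The newline-removal and export step can be sketched in pandas as follows. This is a minimal sketch, assuming the sheet has already been loaded into a DataFrame. The column names `title`, `paraphrased_title`, and `year` in the test data, and the regular expression (collapse each line break plus its surrounding whitespace into one space), are illustrative choices, not the authors' actual script.

```python
import pandas as pd


def clean_and_save(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Replace line breaks in text columns with single spaces and write a CSV."""
    cleaned = df.copy()  # leave the caller's DataFrame untouched
    # "object" covers classic string columns; "string" covers StringDtype columns.
    text_cols = cleaned.select_dtypes(include=["object", "string"]).columns
    # Collapse each line break, together with the whitespace around it,
    # into a single space so every record fits on one CSV row.
    cleaned[text_cols] = cleaned[text_cols].replace(r"\s*[\r\n]+\s*", " ", regex=True)
    # Trim spaces left at the start or end of a cell.
    cleaned[text_cols] = cleaned[text_cols].apply(lambda col: col.str.strip())
    cleaned.to_csv(path, index=False)
    return cleaned
```

Copying before cleaning keeps the raw text available for comparison, and only text columns are touched, so numeric columns are written unchanged.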
We only use Open Access articles to create this dataset.
GPT-3 Parameters for Title Paraphrasing Process

| Parameter | Value |
|---|---|
| prompt | Paraphrase the given title, using as few words from the original title as possible while keeping the key points: |
| model | text-davinci-003 |
| temperature | 0.85 |
| max_tokens | dynamically calculated using input text lengths |
| top_p | 0.7 |
| frequency_penalty | 0 |
| presence_penalty | 0.4 |
| best_of | 4 |
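The title parameters listed above can be gathered into the keyword arguments of a single Completions request, as in the sketch below. Only the prompt text and the parameter values come from the table. The way the prompt is joined to the title (a blank line), the ~4 characters per token estimate (a rough rule of thumb for English text), and the `slack` multiplier used for `max_tokens` are hypothetical, because the exact formula behind "dynamically calculated" is not documented.

```python
import math

# Values copied from the title-paraphrasing table; the prompt/title joining
# and the max_tokens heuristic below are hypothetical.
TITLE_PROMPT = (
    "Paraphrase the given title, using as few words from the original "
    "title as possible while keeping the key points:"
)

TITLE_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.85,
    "top_p": 0.7,
    "frequency_penalty": 0,
    "presence_penalty": 0.4,
    "best_of": 4,
}


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, -(-len(text) // 4))  # ceiling division, at least 1


def build_title_request(title: str, slack: float = 1.5) -> dict:
    """Return keyword arguments for one title-paraphrasing completion.

    HYPOTHETICAL: the prompt is joined to the title with a blank line, and
    max_tokens is the estimated title length times `slack`, rounded up.
    """
    prompt = f"{TITLE_PROMPT}\n\n{title}"
    max_tokens = math.ceil(estimate_tokens(title) * slack)
    return {**TITLE_PARAMS, "prompt": prompt, "max_tokens": max_tokens}
```

With the pre-1.0 `openai` Python package, such a dictionary could be passed as `openai.Completion.create(**request)`. OpenAI has since retired `text-davinci-003`, so the sketch is illustrative only.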
GPT-3 Parameters for Abstract and Introduction Paraphrasing Process

| Parameter | Value |
|---|---|
| prompt | Paraphase the following paragraph while keeping scientific details and using as few words from original paragraph. Output must be the longer sizes as input: |
| model | text-davinci-003 |
| temperature | 0.8 |
| max_tokens | dynamically calculated using input text lengths |
| top_p | 0.75 |
| frequency_penalty | 0 |
| presence_penalty | 0.3 |
| best_of | 3 |
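The abstract and introduction prompt appears to ask for output at least as long as the input. This means `max_tokens` has to grow with the paragraph while the prompt plus the completion still fit in the model's context window. The sketch below shows one hypothetical way to budget it. The 1.2 ratio, the ~4 characters per token estimate, the 4,097-token context limit assumed for `text-davinci-003`, and the blank-line join between prompt and paragraph are all assumptions, not the authors' documented method. The prompt string is reproduced verbatim from the table.

```python
import math

# Values copied from the abstract/introduction table (prompt kept verbatim).
PARAGRAPH_PROMPT = (
    "Paraphase the following paragraph while keeping scientific details and "
    "using as few words from original paragraph. Output must be the longer "
    "sizes as input:"
)

PARAGRAPH_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.8,
    "top_p": 0.75,
    "frequency_penalty": 0,
    "presence_penalty": 0.3,
    "best_of": 3,
}

# ASSUMPTION: combined prompt + completion token limit for text-davinci-003.
CONTEXT_LIMIT = 4097


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, -(-len(text) // 4))  # ceiling division, at least 1


def paragraph_max_tokens(paragraph: str, ratio: float = 1.2,
                         context_limit: int = CONTEXT_LIMIT) -> int:
    """HYPOTHETICAL budget: request about `ratio` times the input length,
    capped so that prompt + completion fit in the context window."""
    prompt_tokens = estimate_tokens(f"{PARAGRAPH_PROMPT}\n\n{paragraph}")
    available = context_limit - prompt_tokens
    if available <= 0:
        raise ValueError("paragraph does not fit in the context window")
    wanted = math.ceil(estimate_tokens(paragraph) * ratio)
    return min(wanted, available)


def build_paragraph_request(paragraph: str) -> dict:
    """Return keyword arguments for one abstract/introduction completion."""
    return {
        **PARAGRAPH_PARAMS,
        "prompt": f"{PARAGRAPH_PROMPT}\n\n{paragraph}",
        "max_tokens": paragraph_max_tokens(paragraph),
    }
```

Short paragraphs get the full `ratio` budget. Long introductions are capped by the remaining context, and paragraphs that cannot fit at all raise an error instead of silently producing a truncated paraphrase.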
Overall, this dataset is a resource for researchers and practitioners in natural language processing: it provides paraphrased versions of the titles, abstracts, and introductions of scientific articles that can be used for a range of NLP tasks.