Dataset: Paraphrased Articles Using GPT-3

About this Dataset


This dataset consists of article titles, abstracts, and introductions that have been paraphrased using the GPT-3 language model. The original articles were selected from Assoc. Prof. Mehmet Erkut Erdem's Google Scholar page and rewritten through the GPT-3 API so that their meaning is preserved while the wording and structure change. The resulting dataset is useful for natural language processing tasks such as text summarization, machine translation, and data augmentation.

We manually copy each article's title, abstract, and introduction and paste them into Google Sheets. A script that we wrote against the OpenAI GPT-3 API then automatically sends an API request for each input. Finally, using the pandas library, we remove the newline characters from the outputs and save the results to a .csv file.
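The newline-removal step can be sketched as follows. This is a minimal illustration, not the original script: the helper name `strip_newlines`, the sample rows, and the choice to replace each run of line breaks with a single space are all assumptions.

```python
import pandas as pd

# Columns to clean; the names follow the dataset schema.
TEXT_COLUMNS = ["paraphrasedtitle", "paraphrasedabstract"]


def strip_newlines(df, columns):
    """Return a copy of df with line breaks in the given columns collapsed to spaces.

    Hypothetical helper: replacing each run of line breaks (plus surrounding
    whitespace) with one space is an assumed cleaning rule.
    """
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = (
            cleaned[col]
            .str.replace(r"\s*[\r\n]+\s*", " ", regex=True)
            .str.strip()
        )
    return cleaned


# Made-up sample resembling raw API output, which often starts with blank lines.
raw = pd.DataFrame(
    {
        "paraphrasedtitle": ["\n\nA New Title\n"],
        "paraphrasedabstract": ["First sentence.\nSecond sentence."],
    }
)

clean = strip_newlines(raw, TEXT_COLUMNS)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = clean.to_csv(index=False)
```

To write a file instead, pass a path, for example `clean.to_csv("paraphrased_articles.csv", index=False)`; that file name is only an example.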

We only use Open Access articles to create this dataset.

GPT-3 Parameters for Title Paraphrasing Process
prompt	Paraphrase the given title, using as few words from the original title as possible while keeping the key points:
model	text-davinci-003
temperature	0.85
max_tokens	dynamically calculated using input text lengths
top_p	0.7
frequency_penalty	0
presence_penalty	0.4
best_of	4

GPT-3 Parameters for Abstract and Introduction Paraphrasing Process
prompt	Paraphase the following paragraph while keeping scientific details and using as few words from original paragraph. Output must be the longer sizes as input:
model	text-davinci-003
temperature	0.8
max_tokens	dynamically calculated using input text lengths
top_p	0.75
frequency_penalty	0
presence_penalty	0.3
best_of	3
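
The two parameter sets above could be turned into API requests roughly as sketched below. Several details are assumptions rather than the authors' code: the `estimate_max_tokens` heuristic (the dataset only says max_tokens is computed from the input length), the way the prompt and input text are joined, and the `paraphrase` helper, which expects a client object from the `openai` Python package (version 1.0 or later). The text-davinci-003 model has since been retired by OpenAI, so the call itself is shown for illustration only. The prompts are copied verbatim from the tables, including their original spelling.

```python
TITLE_PROMPT = (
    "Paraphrase the given title, using as few words from the original title "
    "as possible while keeping the key points:"
)
PARAGRAPH_PROMPT = (
    "Paraphase the following paragraph while keeping scientific details and "
    "using as few words from original paragraph. Output must be the longer "
    "sizes as input:"
)

# Sampling parameters exactly as listed in the two tables.
TITLE_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.85,
    "top_p": 0.7,
    "frequency_penalty": 0,
    "presence_penalty": 0.4,
    "best_of": 4,
}
PARAGRAPH_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.8,
    "top_p": 0.75,
    "frequency_penalty": 0,
    "presence_penalty": 0.3,
    "best_of": 3,
}


def estimate_max_tokens(text, factor=2.0, floor=16):
    """Hypothetical heuristic: scale the word count of the input.

    The actual formula used for the dataset is not documented.
    """
    return max(floor, int(len(text.split()) * factor))


def build_request(text, prompt, params):
    """Combine a prompt, an input text, and a parameter set into request kwargs."""
    return {
        **params,
        # Assumed layout: prompt, blank line, then the text to paraphrase.
        "prompt": f"{prompt}\n\n{text}",
        "max_tokens": estimate_max_tokens(text),
    }


def paraphrase(client, request):
    """Send the request; client is assumed to be an openai.OpenAI instance."""
    response = client.completions.create(**request)
    return response.choices[0].text.strip()


# Building a request needs no network access or API key.
request = build_request("An example article title", TITLE_PROMPT, TITLE_PARAMS)
```

Keeping request construction separate from the network call makes the parameters easy to inspect and log before any tokens are spent.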

Overall, this dataset gives researchers and practitioners in natural language processing a compact set of original scientific texts (titles, abstracts, and introductions) paired with their GPT-3 paraphrases, which can be used for a range of NLP tasks.

Tables

Paraphrased Articles

@kaggle.aemreusta_paraphrased_articles_using_gpt3.paraphrased_articles

338.3 KB
69 rows
7 columns


CREATE TABLE paraphrased_articles (
  "title" VARCHAR,
  "abstract" VARCHAR,
  "introduction" VARCHAR,
  "paraphrasedtitle" VARCHAR,
  "paraphrasedabstract" VARCHAR,
  "paraphraseintroduction" VARCHAR,
  "url" VARCHAR
);
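
As a usage sketch, the table can be loaded with pandas and reshaped into (original, paraphrase) pairs, for example for data augmentation. The column names below are taken from the schema above; the helper names `load_articles` and `to_pairs` and the in-memory sample row are placeholders, not part of the dataset.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "title",
    "abstract",
    "introduction",
    "paraphrasedtitle",
    "paraphrasedabstract",
    "paraphraseintroduction",
    "url",
]

# Each original column and the column holding its paraphrase.
COLUMN_PAIRS = [
    ("title", "paraphrasedtitle"),
    ("abstract", "paraphrasedabstract"),
    ("introduction", "paraphraseintroduction"),
]


def load_articles(source):
    """Read the CSV (path or file-like object) and check that all columns exist."""
    df = pd.read_csv(source, dtype=str)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


def to_pairs(df):
    """Flatten the table into a list of (original, paraphrase) tuples."""
    pairs = []
    for original, paraphrased in COLUMN_PAIRS:
        part = df[[original, paraphrased]].dropna()
        pairs.extend(zip(part[original], part[paraphrased]))
    return pairs


# Placeholder row standing in for the real file.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "Original title,Original abstract,Original introduction,"
    "Paraphrased title,Paraphrased abstract,Paraphrased introduction,"
    "https://example.org/paper\n"
)

articles = load_articles(sample)
pairs = to_pairs(articles)
```

With the real file, pass its path to `load_articles` instead of the in-memory sample.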