Baselight

Paraphrased Articles Using GPT-3

Paraphrased Academic Article Dataset Generated using GPT-3

@kaggle.aemreusta_paraphrased_articles_using_gpt3

About this Dataset

This dataset consists of article titles, abstracts, and introductions that have been paraphrased using the GPT-3 language model. The original articles were selected from Assoc. Prof. Mehmet Erkut Erdem's Google Scholar page and rewritten through the GPT-3 API so that the wording and structure change while the meaning is preserved. The resulting dataset is useful for natural language processing tasks such as text summarization, machine translation, and data augmentation.

We manually copied each article's title, abstract, and introduction and pasted them into Google Sheets. We then wrote a script that automatically sends these inputs to the OpenAI GPT-3 API. Finally, using the pandas library, we removed the line breaks from the output and saved it to a .csv file.
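The cleanup step can be sketched as follows. This is a minimal illustration, not the authors' script: the helper name `strip_newlines`, the sample rows, and the choice to replace each line break with a single space are all assumptions. The output is written to an in-memory buffer here; a file path could be passed to `to_csv` instead.

```python
import io

import pandas as pd


def strip_newlines(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with line breaks in the given columns collapsed to one space."""
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = (
            cleaned[col]
            .str.replace(r"\s*[\r\n]+\s*", " ", regex=True)  # any run of line breaks -> one space
            .str.strip()  # drop leading/trailing whitespace left over from the API output
        )
    return cleaned


# Made-up placeholder rows standing in for raw API output (not taken from the dataset).
raw = pd.DataFrame(
    {
        "paraphrasedtitle": ["A short\ntitle"],
        "paraphrasedabstract": ["First line.\r\n\r\nSecond line. "],
    }
)

cleaned = strip_newlines(raw, ["paraphrasedtitle", "paraphrasedabstract"])

# Write the cleaned table as CSV without the index column.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Collapsing line breaks before export keeps each record on a single physical line of the CSV, which makes the file easier to inspect with line-oriented tools.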

We only use Open Access articles to create this dataset.

GPT-3 Parameters for Title Paraphrasing Process

  • prompt: "Paraphrase the given title, using as few words from the original title as possible while keeping the key points:"
  • model: text-davinci-003
  • temperature: 0.85
  • max_tokens: dynamically calculated using input text lengths
  • top_p: 0.7
  • frequency_penalty: 0
  • presence_penalty: 0.4
  • best_of: 4

GPT-3 Parameters for Abstract and Introduction Paraphrasing Process

  • prompt: "Paraphase the following paragraph while keeping scientific details and using as few words from original paragraph. Output must be the longer sizes as input:"
  • model: text-davinci-003
  • temperature: 0.8
  • max_tokens: dynamically calculated using input text lengths
  • top_p: 0.75
  • frequency_penalty: 0
  • presence_penalty: 0.3
  • best_of: 3
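As an illustration of how the two parameter sets above might be assembled into request payloads, here is a hedged sketch. The prompts and sampling values are copied verbatim from the lists above. Everything else is an assumption: the helper names `build_request` and `estimate_max_tokens`, the way the instruction and input text are joined, and above all the `max_tokens` formula, which the dataset description does not specify. No API call is made; the code only builds dictionaries.

```python
# Sampling settings copied from the two parameter lists above.
TITLE_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.85,
    "top_p": 0.7,
    "frequency_penalty": 0,
    "presence_penalty": 0.4,
    "best_of": 4,
}
BODY_PARAMS = {
    "model": "text-davinci-003",
    "temperature": 0.8,
    "top_p": 0.75,
    "frequency_penalty": 0,
    "presence_penalty": 0.3,
    "best_of": 3,
}

# Prompts reproduced exactly as listed, including their original spelling.
TITLE_PROMPT = (
    "Paraphrase the given title, using as few words from the original title "
    "as possible while keeping the key points:"
)
BODY_PROMPT = (
    "Paraphase the following paragraph while keeping scientific details and "
    "using as few words from original paragraph. Output must be the longer "
    "sizes as input:"
)


def estimate_max_tokens(text: str) -> int:
    """Hypothetical placeholder: the source only says max_tokens depends on input length."""
    return max(16, len(text) // 3)


def build_request(kind: str, text: str) -> dict:
    """Build a completion payload for a 'title', 'abstract', or 'introduction' input."""
    if kind == "title":
        instruction, params = TITLE_PROMPT, TITLE_PARAMS
    elif kind in ("abstract", "introduction"):
        instruction, params = BODY_PROMPT, BODY_PARAMS
    else:
        raise ValueError(f"unknown input kind: {kind!r}")
    return {
        **params,
        # Assumption: the instruction is followed by a blank line and the input text.
        "prompt": f"{instruction}\n\n{text}",
        "max_tokens": estimate_max_tokens(text),
    }


title_request = build_request("title", "Some title")
abstract_request = build_request("abstract", "x" * 300)
```

Keeping the two configurations in separate dictionaries makes the difference between the title and body settings (temperature, top_p, presence_penalty, best_of) easy to see at a glance.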

Overall, this dataset gives researchers and practitioners in natural language processing a small collection of academic texts paired with their GPT-3 paraphrases, which can be used for a range of NLP tasks.

Tables

Paraphrased Articles

@kaggle.aemreusta_paraphrased_articles_using_gpt3.paraphrased_articles
  • 338.3 KB
  • 69 rows
  • 7 columns

CREATE TABLE paraphrased_articles (
  "title" VARCHAR,
  "abstract" VARCHAR,
  "introduction" VARCHAR,
  "paraphrasedtitle" VARCHAR,
  "paraphrasedabstract" VARCHAR,
  "paraphraseintroduction" VARCHAR,
  "url" VARCHAR
);
