Context
The recipes dataset contains 522,517 recipes from 312 different categories. It provides information about each recipe, such as cooking times, servings, ingredients, nutrition, instructions, and more.
The reviews dataset contains 1,401,982 reviews from 271,907 different users. It provides information about each review, such as the author, rating, review text, and more.
Content
The recipes and reviews datasets are each provided in two formats, Parquet and CSV:
recipes.parquet and reviews.parquet are recommended, as they preserve the schema of the original data.
recipes.csv is designed to be parsed in R (its list-column values are stored as R c(...) expressions), while reviews.csv does not contain any list-columns, so it can be parsed easily.
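Because the Parquet files keep list-columns as real lists, they can be read in R without any string parsing. The following is a minimal sketch, assuming the arrow package is installed and that the list-columns are stored as Parquet list types. The two-row table is a hypothetical stand-in, written to a temporary file so the example runs without the real data.

```r
library(arrow)

# Hypothetical stand-in for recipes.parquet: two rows with a list-column,
# written to a temporary file so this sketch runs without the real dataset.
demo <- data.frame(Name = c("Recipe A", "Recipe B"))
demo$Keywords <- list(c("Dessert", "Easy"), character(0))
path <- tempfile(fileext = ".parquet")
write_parquet(demo, path)

# With the real file this would be: recipes <- read_parquet("recipes.parquet")
recipes <- read_parquet(path)

# The list-column comes back as a list of character vectors,
# so no eval(parse()) step is needed.
print(recipes$Keywords[[1]])
```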
Parsing
To read recipes.csv and parse the list-column values (Images, Keywords, RecipeIngredientQuantities, RecipeIngredientParts, RecipeInstructions) in R:
library(readr)
recipes <- read_csv("recipes.csv")
# Each list-column value is read as a single string holding an R c(...) expression
print(recipes$Images[3])
## "c(\"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/40/picJ4Sz3N.jpg\", \"https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/40/pic23FWio.jpg\")"
# Parsing and evaluating that string recovers the character vector
print(eval(parse(text = recipes$Images[3])))
## "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/40/picJ4Sz3N.jpg"
## "https://img.sndimg.com/food/image/upload/w_555,h_416,c_fit,fl_progressive,q_95/v1/img/recipes/40/pic23FWio.jpg"
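The same step can be applied to a whole column with lapply, which keeps one element per recipe and so produces a list-column. This is a sketch under assumptions: parse_r_vector is a hypothetical helper, the sample strings are made up in the format shown above, and missing values are assumed to arrive as NA and are mapped to character(0).

```r
# Hypothetical helper: turn one string such as 'c("a", "b")' into a
# character vector; NA (a missing value) becomes an empty vector.
parse_r_vector <- function(x) {
  if (is.na(x)) return(character(0))
  eval(parse(text = x))
}

# Made-up stand-in for recipes$Images as read by read_csv.
images <- c('c("https://example.com/a.jpg", "https://example.com/b.jpg")',
            'c("https://example.com/c.jpg")',
            NA)

# lapply returns a list with one character vector per recipe.
parsed <- lapply(images, parse_r_vector)
print(lengths(parsed))
```

With the real data this would be recipes$Images <- lapply(recipes$Images, parse_r_vector). Note that eval() runs whatever code the string contains, so this approach is only appropriate for files you trust.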
To parse ISO 8601 duration format values (CookTime, PrepTime, and TotalTime) in R:
library(lubridate)
# "PT24H45M" means 24 hours and 45 minutes
duration("PT24H45M")
## "89100s (~1.03 days)"
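To work with these columns numerically, for example to sort or filter recipes by time, each duration can be converted to a plain number of minutes. A minimal sketch, assuming lubridate's duration() accepts a character vector and that the column holds ISO 8601 strings like the one above; the sample values are made up.

```r
library(lubridate)

# Made-up stand-in for recipes$CookTime (ISO 8601 duration strings).
cook_time <- c("PT24H45M", "PT30M", "PT1H")

# duration() parses the strings; time_length() expresses each one in minutes.
cook_minutes <- time_length(duration(cook_time), unit = "minute")
print(cook_minutes)
```

With the real data this would be, for example, recipes$CookMinutes <- time_length(duration(recipes$CookTime), unit = "minute"), where CookMinutes is a hypothetical new column name.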