Context
This dataset contains information about an online electronics store. The store has three warehouses from which goods are delivered to customers.
Columns Description
- order_id: A unique id for each order
- customer_id: A unique id for each customer
- date: The date the order was made, given in YYYY-MM-DD format
- nearest_warehouse: A string denoting the name of the nearest warehouse to the
customer
- shopping_cart: A list of tuples representing the order items: the first element of
each tuple is the item ordered, and the second element is the
quantity of that item ordered.
- order_price: A float denoting the order price in USD. The order price is the
price of items before any discounts and/or delivery charges
are applied.
- delivery_charges: A float representing the delivery charges of the order
- customer_lat: Latitude of the customer’s location
- customer_long: Longitude of the customer’s location
- coupon_discount: An integer denoting the percentage discount to be applied to
the order_price.
- order_total: A float denoting the total of the order in USD after all
discounts and/or delivery charges are applied.
- season: A string denoting the season in which the order was placed.
- is_expedited_delivery: A boolean denoting whether the customer has requested an
expedited delivery
- distance_to_nearest_warehouse: A float representing the arc distance, in kilometres, between
the customer and the warehouse nearest to them.
- latest_customer_review: A string containing the customer’s latest review of their
most recent order
- is_happy_customer: A boolean denoting whether the customer is a happy
customer or had an issue with their last order.
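The relationship between order_price, coupon_discount, delivery_charges and order_total can be sketched in code. This is a minimal sketch, not the dataset author's method: it assumes that shopping_cart is stored as a Python-style list literal in the CSV, and that the discount is applied to order_price only, with delivery charges added afterwards. The helper names parse_shopping_cart and expected_order_total, the item names and the amounts are all hypothetical.

```python
import ast


def parse_shopping_cart(raw):
    """Parse a shopping_cart string into a list of (item, quantity) tuples.

    Assumes the column holds a Python-style list literal,
    e.g. "[('Item A', 2), ('Item B', 1)]".
    """
    cart = ast.literal_eval(raw)
    return [(str(item), int(quantity)) for item, quantity in cart]


def expected_order_total(order_price, coupon_discount, delivery_charges):
    """Recompute order_total from the other columns.

    Assumption: the percentage discount applies to order_price only,
    and delivery charges are added after the discount.
    """
    discounted = order_price * (100 - coupon_discount) / 100
    return round(discounted + delivery_charges, 2)


# Hypothetical values, not taken from the dataset.
cart = parse_shopping_cart("[('Item A', 2), ('Item B', 1)]")
total = expected_order_total(order_price=1000.0, coupon_discount=10, delivery_charges=50.0)
print(cart, total)
```

Comparing the recomputed value with the stored order_total is one way to spot rows where one of the four columns may be wrong.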
Inspiration
Use graphical and/or non-graphical EDA methods to understand the data first,
then find and fix the data problems:
- Detect and fix errors in dirty_data.csv
- Impute the missing values in missing_data.csv
- Detect and remove anomalies
- Check whether a customer is happy with their last order
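One possible consistency check for the tasks above is to recompute the arc distance from the customer coordinates and compare it with nearest_warehouse and distance_to_nearest_warehouse. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. Both the radius and the warehouse names and coordinates are assumptions for illustration, since this description does not list the warehouse locations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

# Hypothetical warehouse names and coordinates (latitude, longitude).
WAREHOUSES = {"W1": (0.0, 0.0), "W2": (0.0, 1.0)}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (arc) distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_warehouse(customer_lat, customer_long, warehouses=WAREHOUSES):
    """Return (name, distance_km) of the warehouse closest to the customer."""
    distances = {
        name: haversine_km(customer_lat, customer_long, lat, lon)
        for name, (lat, lon) in warehouses.items()
    }
    name = min(distances, key=distances.get)
    return name, distances[name]


def distance_is_consistent(row, tolerance_km=0.1):
    """Check a row (dict) against the recomputed warehouse and distance."""
    name, distance = nearest_warehouse(row["customer_lat"], row["customer_long"])
    same_name = name == row["nearest_warehouse"]
    same_distance = abs(distance - row["distance_to_nearest_warehouse"]) <= tolerance_km
    return same_name and same_distance


# Hypothetical row, not taken from the dataset.
name, distance = nearest_warehouse(0.0, 0.9)
```

Rows that fail the check are candidates for closer inspection; the tolerance is arbitrary and would need tuning against the real data.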
All the Best