Overview
This dataset contains information from 10,052 bicycle advertisements, collected in June 2020 from two websites:
- Bike Exchange: 1,982 ads
- Ebay: 8,070 ads, split between two countries:
  - United States: 4,136 ads
  - United Kingdom: 3,934 ads
On both sites, we filtered listings to the "Road Bikes" and "Mountain Bikes" categories only. For each ad we collected the title, main image, asking price, condition, and all listed specifications (such as brand, color, and frame size). Ebay ads were collected from both the US and UK domains; UK listings can be identified by "ebay.co.uk" in the "Product URL" field of data_ebay.json.
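A minimal sketch of separating US and UK listings by the host in "Product URL". It assumes data_ebay.json is a JSON array of ad objects, each a dict that may contain a "Product URL" key; the function names `load_ebay` and `split_by_domain` are hypothetical. Records whose URL is missing or points at neither domain go into an "other" bucket rather than being silently counted as US.

```python
import json
from urllib.parse import urlparse


def load_ebay(path="data_ebay.json"):
    # Assumption: the file's top level is a JSON array of ad objects.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_by_domain(records):
    """Group Ebay ad records into "us", "uk", and "other" by URL host."""
    groups = {"us": [], "uk": [], "other": []}
    for record in records:
        host = urlparse(record.get("Product URL", "")).netloc.lower()
        # UK ads live on ebay.co.uk (e.g. www.ebay.co.uk), US ads on ebay.com.
        if host == "ebay.co.uk" or host.endswith(".ebay.co.uk"):
            groups["uk"].append(record)
        elif host == "ebay.com" or host.endswith(".ebay.com"):
            groups["us"].append(record)
        else:
            # Missing or unexpected URL: keep it visible instead of guessing.
            groups["other"].append(record)
    return groups
```

Comparing `len(groups["uk"])` and `len(groups["us"])` against the 3,934 / 4,136 counts listed above is a quick sanity check on the loading assumptions.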
The ads and images on BikeExchange are of higher quality and resolution than those on Ebay. The Ebay results often included non-bike ads (e.g. bike frames, bike accessories, and bike artwork), so we ran the images through ResNet50 and kept only those classified as a bike with high confidence. However, some non-bike ads may still be present.
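The filtering step can be sketched as follows. This is not necessarily the pipeline used to build the dataset: it assumes torchvision's ImageNet-pretrained ResNet50, treats the ImageNet classes "mountain bike" and "bicycle-built-for-two" as "bike", and uses a hypothetical threshold of 0.8. `top1_predictions` needs torch, torchvision (0.13 or later), and Pillow; `keep_confident_bikes` is plain Python.

```python
# Hypothetical label set: ImageNet-1k class names that depict a bicycle.
BIKE_LABELS = {"mountain bike", "bicycle-built-for-two"}


def top1_predictions(image_paths):
    """Return (path, top-1 ImageNet label, probability) for each image."""
    import torch
    from PIL import Image
    from torchvision.models import ResNet50_Weights, resnet50

    weights = ResNet50_Weights.DEFAULT
    model = resnet50(weights=weights).eval()
    preprocess = weights.transforms()  # resize, crop, and normalize as in training
    categories = weights.meta["categories"]
    results = []
    with torch.no_grad():
        for path in image_paths:
            img = Image.open(path).convert("RGB")
            probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)[0]
            prob, idx = probs.max(dim=0)
            results.append((path, categories[idx.item()], prob.item()))
    return results


def keep_confident_bikes(predictions, threshold=0.8):
    """Keep ids whose top-1 label is a bike class with probability >= threshold."""
    return [ad_id for ad_id, label, prob in predictions
            if label in BIKE_LABELS and prob >= threshold]
```

Because ImageNet has no dedicated "road bike" class, a top-1 rule like this can drop genuine road-bike photos; that trade-off is one reason a few non-bike ads, or missing bike ads, are to be expected.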
Uses
My primary objective was to predict the asking price from the main image, which can be done using the extracted price data in combined_price-only.csv and the images in images/. It may also be interesting and useful to use the specification fields listed in data_bike_exchange.json and data_ebay.json; however, this will require extra preprocessing, since the fields differ both within each site and between the two sites. An exploratory data analysis (EDA) of bike ads in general, potentially comparing the two sites, would also be an interesting project.
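One way to start the preprocessing mentioned above is to map each ad's specification keys onto a shared schema. The sketch below is illustrative only: the alias table `FIELD_ALIASES` and the example keys are hypothetical, and the actual field names in data_bike_exchange.json and data_ebay.json should be inspected before building a real mapping.

```python
import re

# Hypothetical alias table; real keys must be read from the JSON files first.
FIELD_ALIASES = {
    "frame size": "frame_size",
    "size": "frame_size",
    "colour": "color",
    "color": "color",
    "brand": "brand",
    "make": "brand",
}


def normalize_key(key):
    # Trim, drop a trailing colon, lowercase, and collapse whitespace/underscores.
    cleaned = re.sub(r"[\s_]+", " ", key.strip().rstrip(":").lower()).strip()
    # Known aliases map to a canonical name; anything else becomes snake_case.
    return FIELD_ALIASES.get(cleaned, cleaned.replace(" ", "_"))


def normalize_specs(specs):
    """Map one ad's spec dict onto a shared schema; the first value wins on collisions."""
    out = {}
    for key, value in specs.items():
        if isinstance(value, str):
            value = value.strip()
        out.setdefault(normalize_key(key), value)
    return out
```

Applying `normalize_specs` to every ad from both files yields dicts with comparable keys, which can then be loaded into a single table for modeling or EDA.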