Context
This is a pre-crawled dataset, a subset of a larger dataset (more than 33344 hotels) that was created by extracting data from goibibo.com, a leading travel site in India.
Content
This dataset has the following fields:
address
area
- The sub-city region that this hotel is located in, geographically.
city
country
- Always India.
crawl_date
guest_recommendation
- How many guests who stayed here have recommended this hotel to others on the site.
hotel_brand
- The chain that owns this hotel, if this hotel is part of a chain.
hotel_category
hotel_description
- A hotel description, as provided by the lister.
hotel_facilities
-
hotel_star_rating
- The out-of-five star rating of this hotel.
image_count
- The number of images provided with the listing.
latitude
locality
longitude
pageurl
point_of_interest
- Nearby locations of interest.
property_name
property_type
- The type of property. Usually a hotel.
province
qts
- Crawl timestamp.
query_time_stamp
- Copy of qts.
review_count_by_category
- Reviews for the hotel, broken down across several different categories.
room_area
room_count
room_facilities
room_type
similar_hotel
site_review_count
- The number of reviews for this hotel left on the site by users.
site_review_rating
- The overall rating for this hotel by users.
site_stay_review_rating
sitename
- Always goibibo.com
state
uniq_id
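The field notes above state a few invariants (country is always India, sitename is always goibibo.com, query_time_stamp is a copy of qts). A minimal sketch of how these could be checked after loading is shown below. The sample rows, the timestamp format, and the helper name check_documented_invariants are illustrative assumptions, not part of the dataset; the in-memory sample stands in for the real CSV file.

```python
import csv
import io

# Tiny made-up sample standing in for the real CSV; all values are illustrative.
# The timestamp format shown here is an assumption.
SAMPLE_CSV = """uniq_id,property_name,city,country,sitename,qts,query_time_stamp
a1,Hotel One,Goa,India,goibibo.com,2016-08-01 10:00:00,2016-08-01 10:00:00
a2,Hotel Two,Jaipur,India,goibibo.com,2016-08-01 10:05:00,2016-08-01 10:05:00
"""


def check_documented_invariants(rows):
    """Return (uniq_id, problem) pairs for rows that contradict the field notes."""
    problems = []
    for row in rows:
        if row["country"] != "India":
            problems.append((row["uniq_id"], "country is not India"))
        if row["sitename"] != "goibibo.com":
            problems.append((row["uniq_id"], "sitename is not goibibo.com"))
        if row["query_time_stamp"] != row["qts"]:
            problems.append((row["uniq_id"], "query_time_stamp differs from qts"))
    return problems


# DictReader maps each row to a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_documented_invariants(rows)
print(problems)
```

To run this against the actual file, replace io.StringIO(SAMPLE_CSV) with an opened file handle; any rows reported would be worth inspecting before relying on those columns as constants.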
Acknowledgements
This dataset was created by PromptCloud's in-house web-crawling service.
Inspiration
- Try exploring some of the amenity categories. What do you see?
- Try applying some natural language processing algorithms to the hotel descriptions. What are some common words and phrases? How do they relate to the amenities the hotel offers?
- What can you discover by drilling down further into hotels in different regions?