Indian Hotels On Goibibo
4,000 Indian hotels on Goibibo
@kaggle.promptcloudhq_hotels_on_goibibo
This is a pre-crawled dataset, taken as a subset of a larger dataset (more than 33,344 hotels) that was created by extracting data from goibibo.com, a leading travel site in India.
This dataset has the following fields:
address
area - The sub-city region that this hotel is located in, geographically.
city
country - Always India.
crawl_date
guest_recommendation - How many guests that stayed here have recommended this hotel to others on the site.
hotel_brand - The chain that owns this hotel, if this hotel is part of a chain.
hotel_category
hotel_description - A hotel description, as provided by the lister.
hotel_facilities
hotel_star_rating - The out-of-five star rating of this hotel.
image_count - The number of images provided with the listing.
latitude
locality
longitude
pageurl
point_of_interest - Nearby locations of interest.
property_name
property_type - The type of property. Usually a hotel.
province
qts - Crawl timestamp.
query_time_stamp - Copy of qts.
review_count_by_category - Reviews for the hotel, broken across several different categories.
room_area
room_count
room_facilities
room_type
similar_hotel
site_review_count - The number of reviews for this hotel left on the site by users.
site_review_rating - The overall rating for this hotel by users.
site_stay_review_rating
sitename - Always goibibo.com.
state
uniq_id
This dataset was created by PromptCloud's in-house web-crawling service.
Try exploring some of the amenity categories. What do you see?
Try applying some natural language processing algorithms to the hotel descriptions. What are some common words and phrases? How do they relate to the amenities the hotel offers?
What can you discover by drilling down further into hotels in different regions?
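As a starting point for the hotel description question, here is a minimal sketch that counts the most common words across a set of descriptions. It uses only the Python standard library. The `top_terms` helper, the small `STOP_WORDS` set, and the sample descriptions are all assumptions made for illustration; in practice the input would be the values of the `hotel_description` field.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; replace with a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to",
              "with", "for", "at", "on", "from"}


def top_terms(descriptions, n=10):
    """Return the n most common lowercase words across the descriptions."""
    counts = Counter()
    for text in descriptions:
        if not text:  # a listing may have no description
            continue
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample descriptions, only to show the call shape.
sample = [
    "Hotel with free wifi and a rooftop restaurant",
    "Budget hotel near the airport with free parking",
    None,
]
print(top_terms(sample, n=2))
```

Plain word counts are only a first pass. Counting two-word phrases, or comparing the top terms against the contents of `hotel_facilities`, would be natural next steps.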
CREATE TABLE goibibo_com_travel_sample (
"additional_info" VARCHAR,
"address" VARCHAR,
"area" VARCHAR,
"city" VARCHAR,
"country" VARCHAR,
"crawl_date" TIMESTAMP,
"guest_recommendation" DOUBLE,
"hotel_brand" VARCHAR,
"hotel_category" VARCHAR,
"hotel_description" VARCHAR,
"hotel_facilities" VARCHAR,
"hotel_star_rating" BIGINT,
"image_count" BIGINT,
"latitude" DOUBLE,
"locality" VARCHAR,
"longitude" DOUBLE,
"pageurl" VARCHAR,
"point_of_interest" VARCHAR,
"property_id" VARCHAR,
"property_name" VARCHAR,
"property_type" VARCHAR,
"province" VARCHAR,
"qts" VARCHAR,
"query_time_stamp" VARCHAR,
"review_count_by_category" VARCHAR,
"room_area" VARCHAR,
"room_count" BIGINT,
"room_facilities" VARCHAR,
"room_type" VARCHAR,
"similar_hotel" VARCHAR,
"site_review_count" DOUBLE,
"site_review_rating" DOUBLE,
"site_stay_review_rating" VARCHAR,
"sitename" VARCHAR,
"state" VARCHAR,
"uniq_id" VARCHAR
);
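To drill down by region, the schema above can be queried with ordinary SQL aggregates. The sketch below uses Python's built-in `sqlite3` module with a cut-down copy of the table (only four of the columns) and three made-up rows, so the hotel names and ratings are placeholders rather than values from the dataset. The `rating_by_city` helper is likewise a hypothetical name.

```python
import sqlite3

# In-memory database with a reduced version of the table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE goibibo_com_travel_sample (
        "city" VARCHAR,
        "property_name" VARCHAR,
        "hotel_star_rating" BIGINT,
        "site_review_rating" DOUBLE
    )
""")

# Placeholder rows (not real data from the dataset).
rows = [
    ("Goa", "Example Beach Stay", 3, 4.0),
    ("Goa", "Example Palm Inn", 4, 4.5),
    ("Jaipur", "Example Haveli", 5, 3.5),
]
conn.executemany(
    "INSERT INTO goibibo_com_travel_sample VALUES (?, ?, ?, ?)", rows
)


def rating_by_city(connection):
    """Return (city, hotel count, average site review rating), best first."""
    query = """
        SELECT "city", COUNT(*) AS hotels, AVG("site_review_rating") AS avg_rating
        FROM goibibo_com_travel_sample
        WHERE "site_review_rating" IS NOT NULL
        GROUP BY "city"
        ORDER BY avg_rating DESC
    """
    return connection.execute(query).fetchall()


for city, hotels, avg_rating in rating_by_city(conn):
    print(city, hotels, round(avg_rating, 2))
```

The same query shape works with `state` or `area` in place of `city` for coarser or finer regions.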