Georeferenced Spain Housing
Detailed information on around 180,000 houses in Madrid, Barcelona and Valencia
@kaggle.tomstravis_georefrenced_spain_housing
The Idealista dataset from 2018 provides detailed, geo-referenced information on real estate listings across Madrid, Barcelona, and Valencia. The data includes property characteristics, asking prices, and spatial information, such as proximity to key urban features, for more than 180,000 dwellings, which have been anonymised for privacy purposes. The original dataset is available as an R package (idealista18), where it is stored as sf objects. This version converts it to CSV, keeping all features with almost no data loss; only the geometric point column was dropped. All other geographical features, including longitude and latitude, have been retained, so the data remains useful for urban studies and real estate analysis.
All credit goes to Antonio Paez, who published the data, project methods and conclusions in his GitHub repo: https://github.com/paezha/idealista18. Further information about the project, including the original article, can be found at https://www.idealista.com/labs/blog/?p=4207 and https://journals.sagepub.com/doi/epub/10.1177/23998083241242844.
ASSETID:
A unique identifier for each property or asset in the dataset. It's a categorical variable with multiple levels.
PERIOD:
The time period in which the data was recorded, typically represented as a six-digit integer (YYYYMM) indicating the year and month.
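As a minimal sketch, assuming PERIOD is read as an integer such as 201803, the year and month can be separated with integer arithmetic. The helper name parse_period is hypothetical and is not part of the dataset or of the idealista18 package.

```python
def parse_period(period: int) -> tuple[int, int]:
    """Split a YYYYMM integer into (year, month)."""
    year, month = divmod(int(period), 100)
    if not 1 <= month <= 12:
        raise ValueError(f"PERIOD {period!r} does not look like YYYYMM")
    return year, month


print(parse_period(201803))  # (2018, 3)
```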
PRICE:
The total price of the property in the local currency (€). This is a numerical value.
UNITPRICE:
The price per unit area (square meter) of the property. This is also a numerical value.
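A quick sanity check is to recompute the unit price per row. The sketch below assumes UNITPRICE is simply PRICE divided by CONSTRUCTEDAREA, which the description implies but does not state outright. The function name and the sample values are illustrative only and are not taken from the data.

```python
def unit_price(price: float, constructed_area: float) -> float:
    """Price per square metre, assuming UNITPRICE = PRICE / CONSTRUCTEDAREA."""
    if constructed_area <= 0:
        raise ValueError("constructed_area must be positive")
    return price / constructed_area


# Made-up listing: 250,000 EUR for 100 square metres.
print(unit_price(250_000, 100))  # 2500.0
```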
CONSTRUCTEDAREA:
The total constructed area of the property in square meters. It's an integer value indicating the size of the property.
ROOMNUMBER:
The number of rooms in the property. This integer value includes all types of rooms (e.g., bedrooms, living rooms).
BATHNUMBER:
The number of bathrooms in the property. This is an integer value.
HASTERRACE:
A binary indicator (0/1) where 1 indicates the property has a terrace and 0 indicates it does not.
HASLIFT:
A binary indicator (0/1) where 1 indicates the property has a lift (elevator) and 0 indicates it does not.
HASAIRCONDITIONING:
A binary indicator (0/1) where 1 indicates the property has air conditioning and 0 indicates it does not.
AMENITYID:
An integer that likely represents a category or type of amenities associated with the property. The exact categories would need to be looked up in additional documentation, which I could not find after a comprehensive search. If you know what the categories are, please leave a comment so that this description can be updated and clarified.
HASPARKINGSPACE:
A binary indicator (0/1) where 1 indicates the property includes a parking space and 0 indicates it does not.
ISPARKINGSPACEINCLUDEDINPRICE:
A binary indicator (0/1) where 1 indicates the parking space is included in the property price, and 0 indicates it is not.
PARKINGSPACEPRICE:
An integer that appears to be a placeholder or a standardized value (likely 1 in this dataset). It could represent the price of the parking space, but more context is needed.
HASNORTHORIENTATION:
A binary indicator (0/1) where 1 indicates the property has a northern orientation and 0 indicates it does not.
HASSOUTHORIENTATION:
A binary indicator (0/1) where 1 indicates the property has a southern orientation and 0 indicates it does not.
HASEASTORIENTATION:
A binary indicator (0/1) where 1 indicates the property has an eastern orientation and 0 indicates it does not.
HASWESTORIENTATION:
A binary indicator (0/1) where 1 indicates the property has a western orientation and 0 indicates it does not.
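The four orientation flags can be collapsed into a single list of compass labels, which may be easier to read when exploring the data. This is a sketch under the assumption that each row is available as a dictionary keyed by column name (for example from csv.DictReader, where values arrive as strings). The mapping, the function name and the sample row are hypothetical.

```python
ORIENTATION_COLUMNS = {
    "HASNORTHORIENTATION": "N",
    "HASSOUTHORIENTATION": "S",
    "HASEASTORIENTATION": "E",
    "HASWESTORIENTATION": "W",
}


def orientations(row: dict) -> list[str]:
    """Return the compass labels whose 0/1 flag is set to 1 in this row."""
    return [
        label
        for column, label in ORIENTATION_COLUMNS.items()
        if int(row.get(column, 0)) == 1
    ]


# Made-up row with string values, as csv.DictReader would produce.
sample = {
    "HASNORTHORIENTATION": "0",
    "HASSOUTHORIENTATION": "1",
    "HASEASTORIENTATION": "1",
    "HASWESTORIENTATION": "0",
}
print(orientations(sample))  # ['S', 'E']
```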
HASBOXROOM:
A binary indicator (0/1) where 1 indicates the property includes a box room (a small room typically used for storage), and 0 indicates it does not.
HASWARDROBE:
A binary indicator (0/1) where 1 indicates the property includes a built-in wardrobe, and 0 indicates it does not.
HASSWIMMINGPOOL:
A binary indicator (0/1) where 1 indicates the property includes a swimming pool, and 0 indicates it does not.
HASDOORMAN:
A binary indicator (0/1) where 1 indicates the property has a doorman or concierge service, and 0 indicates it does not.
HASGARDEN:
A binary indicator (0/1) where 1 indicates the property has a garden, and 0 indicates it does not.
ISDUPLEX:
A binary indicator (0/1) where 1 indicates the property is a duplex (a two-story unit), and 0 indicates it is not.
ISSTUDIO:
A binary indicator (0/1) where 1 indicates the property is a studio (a single-room apartment with combined living and sleeping areas), and 0 indicates it is not.
ISINTOPFLOOR:
A binary indicator (0/1) where 1 indicates the property is located on the top floor of the building, and 0 indicates it is not.
CONSTRUCTIONYEAR:
The year the property was constructed. This is an integer value, though some entries may be missing (NA).
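Because some entries may be missing, it helps to convert the raw text to an optional integer before doing arithmetic on it. The sketch below assumes missing values appear in the CSV as the literal text NA or as an empty field; check the file to confirm how they are actually written. The helper name is hypothetical.

```python
from typing import Optional


def parse_optional_int(value: str) -> Optional[int]:
    """Return an int, or None when the field is empty or the text 'NA'."""
    text = value.strip()
    if text in {"", "NA"}:
        return None
    # Going through float() also accepts values written like "1975.0".
    return int(float(text))


print(parse_optional_int("1975"))  # 1975
print(parse_optional_int("NA"))    # None
```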
FLOORCLEAN:
An integer likely representing the floor level of the property, though the exact meaning might require further context.
FLATLOCATIONID:
An integer that likely corresponds to a specific location or region code for the flat, but would need additional reference documentation to decode.
CADCONSTRUCTIONYEAR:
The construction year of the property according to cadastral (land registration) records. This is an integer value.
CADMAXBUILDINGFLOOR:
The maximum number of floors in the building according to cadastral records. This is an integer value.
CADDWELLINGCOUNT:
The number of dwelling units in the building according to cadastral records. This is an integer value.
CADASTRALQUALITYID:
An integer representing the cadastral quality or classification of the building, likely based on legal or governmental standards.
BUILTTYPEID_1:
A binary indicator (0/1) where 1 indicates the property is new construction (no previous owners), and 0 indicates it is not.
BUILTTYPEID_2:
A binary indicator (0/1) where 1 indicates the property is second hand to be restored, and 0 indicates it is not.
BUILTTYPEID_3:
A binary indicator (0/1) where 1 indicates the property is second hand in good condition, and 0 indicates it is not.
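The three BUILTTYPEID flags read like a one-hot encoding of a single category, so they can be folded back into one label. This sketch assumes at most one flag is set per row, which the description does not confirm; rows that break that assumption are labelled "unknown" rather than guessed. The mapping and function name are hypothetical.

```python
BUILT_TYPE_LABELS = {
    "BUILTTYPEID_1": "new construction",
    "BUILTTYPEID_2": "second hand, to be restored",
    "BUILTTYPEID_3": "second hand, good condition",
}


def built_type(row: dict) -> str:
    """Collapse the three 0/1 flags into one label, or 'unknown'."""
    active = [
        label
        for column, label in BUILT_TYPE_LABELS.items()
        if int(row.get(column, 0)) == 1
    ]
    # Zero or several active flags contradict the one-hot assumption.
    return active[0] if len(active) == 1 else "unknown"


# Made-up row for illustration.
print(built_type({"BUILTTYPEID_1": 0, "BUILTTYPEID_2": 0, "BUILTTYPEID_3": 1}))
```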
DISTANCE_TO_CITY_CENTER:
A numerical value representing the distance (kilometres) from the property to the city center.
DISTANCE_TO_METRO:
A numerical value representing the distance (kilometres) from the property to the nearest metro station.
DISTANCE_TO_{main avenue}:
A numerical value representing the distance (kilometres) from the property to the city's main avenue: Paseo de la Castellana (Castellana) in Madrid, Avenida Diagonal (Diagonal) in Barcelona, and Avenida de Blasco Ibáñez (Blasco) in Valencia.
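When working with all three cities in one script, a small lookup can pick the right avenue column. The sketch assumes the column suffixes are CASTELLANA, DIAGONAL and BLASCO in upper case, inferred from the short names given above; verify them against the actual CSV header before relying on this. The function and mapping names are hypothetical.

```python
# Assumed suffixes, inferred from the short avenue names in the description.
AVENUE_SUFFIX_BY_CITY = {
    "Madrid": "CASTELLANA",
    "Barcelona": "DIAGONAL",
    "Valencia": "BLASCO",
}


def avenue_distance_column(city: str) -> str:
    """Return the assumed DISTANCE_TO_<avenue> column name for a city."""
    key = city.strip().title()
    if key not in AVENUE_SUFFIX_BY_CITY:
        raise ValueError(f"no main avenue known for city {city!r}")
    return f"DISTANCE_TO_{AVENUE_SUFFIX_BY_CITY[key]}"


print(avenue_distance_column("madrid"))  # DISTANCE_TO_CASTELLANA
```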
LONGITUDE:
The geographical longitude of the property’s location. This is a numerical value indicating the east-west position.
LATITUDE:
The geographical latitude of the property’s location. This is a numerical value indicating the north-south position.
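With LONGITUDE and LATITUDE available, distances to any reference point can be recomputed even though the point geometry column is absent. The sketch below uses the haversine great-circle formula, which is a common choice but not necessarily the method the original authors used for the DISTANCE_TO_ columns. The reference coordinates for Puerta del Sol and the sample listing are approximate, illustrative values, not taken from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of Puerta del Sol, Madrid (an assumed reference).
CENTER_LAT, CENTER_LON = 40.4168, -3.7038

# Made-up listing location for illustration.
distance = haversine_km(40.4500, -3.6900, CENTER_LAT, CENTER_LON)
print(f"{distance:.2f} km from the assumed city centre")
```

Comparing such a value with DISTANCE_TO_CITY_CENTER is one way to check which reference point and units the column uses.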
This data was initially featured in the following paper:
Rey-Blanco, D., Arbues, P., Lopez, F., & Paez, A. (2024). A geo-referenced micro-data set of real estate listings for Spain’s three largest cities. Environment and Planning B: Urban Analytics and City Science, 51(6), 1369-1379. https://doi.org/10.1177/23998083241242844
I encountered it through a LinkedIn post by idealista, which linked to their website: https://www.idealista.com/labs/blog/?p=4207
This dataset is a modified version of Antonio Paez's dataset available on the GitHub repo: https://github.com/paezha/idealista18/tree/master