Applying GWR to understand Airbnb listing prices

Geographically Weighted Regression (GWR) is a statistical regression method that models the local (e.g. regional or sub-regional) relationships between a set of predictor variables and an outcome of interest. It should therefore be used in lieu of a global model when these relationships vary spatially.

In this example we analyze the local relationships between the prices of Airbnb listings in Berlin and the number of bedrooms and bathrooms available at these listings, using the GWR_GRID procedure. Our input dataset, publicly available at cartobq.docs.airbnb_berlin_h3_qk, contains the locations of the Airbnb listings, their prices, and their number of bedrooms and bathrooms. Each listing location is assigned H3 and quadkey cells at different resolutions, so that users can test and compare different models.

We can run our GWR analysis by simply running this query:

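CALL `carto-un`.carto.GWR_GRID(
    'cartobq.docs.airbnb_berlin_h3_qk', -- input table
    ['bedrooms', 'bathrooms'], -- predictor variables: [ bedrooms feature, bathrooms feature ]
    'price', -- price (target variable)
    'h3_z7', -- column holding the H3 cell at resolution 7
    'h3', -- grid type
    3, -- k-ring size that defines each cell's neighborhood
    'gaussian', -- kernel function used to weight neighboring data
    TRUE, -- fit an intercept
    NULL -- output table (NULL: none specified)
);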

This particular configuration runs a local regression for each H3 grid cell at resolution 7. Each regression takes into account all listings in that grid cell and in its neighborhood, defined as its k-ring of size 3. Data points within the neighborhood are given a weight that decreases with their distance from the central cell, according to the kernel function of choice, in this case a Gaussian.
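As a rough illustration of this weighting scheme, the Python sketch below computes a Gaussian kernel weight for each ring of a k-ring of size 3. The function name `gaussian_kernel_weight` and the choice of bandwidth (set equal to the k-ring size) are assumptions made for this example; the exact kernel parameterization used by GWR_GRID may differ.

```python
import math


def gaussian_kernel_weight(distance: int, bandwidth: float) -> float:
    """Gaussian kernel weight for a cell `distance` rings away from the central cell."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


K_RING_SIZE = 3
# Assumption: the bandwidth equals the k-ring size. This is an illustrative
# choice, not necessarily what GWR_GRID does internally.
BANDWIDTH = float(K_RING_SIZE)

# One weight per ring: ring 0 is the central cell, ring 3 the outermost neighbors.
weights = {d: gaussian_kernel_weight(d, BANDWIDTH) for d in range(K_RING_SIZE + 1)}

for ring, weight in weights.items():
    print(f"ring {ring}: weight {weight:.3f}")
```

Under these assumptions the central cell gets the full weight of 1, and the weight falls off smoothly toward the edge of the neighborhood without ever reaching zero inside it.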

The output of our GWR analysis is a table that contains the result of each of these regressions: the coefficients for each of the predictor variables and the intercept. The following map shows the coefficients associated with the number of bedrooms (left) and bathrooms (right), where darker/brighter areas correspond to lower/higher values:

Positive values indicate a positive association between the price of the Airbnb listings and the number of bedrooms or bathrooms (each conditional on the other), with larger absolute values indicating a stronger association.

We can see that overall, where listings are equipped with more bedrooms and bathrooms, their price is also higher. However, the strength of this association is weaker in some areas: for instance, the number of bedrooms clearly drives higher prices in the city center, while not as much in the outskirts of the city.

Understanding the data

Rather than performing a regression per data point in our Airbnb listings dataset, the procedure computes a regression per cell, which reduces computation time. The coefficients for each cell of interest are computed from the Airbnb locations that lie within that cell and its neighbors. Note that the data in neighboring cells is assigned a lower weight the further those cells are from the origin cell. Please refer to the documentation for a more detailed explanation.
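The per-cell regression described above can be sketched as a weighted least squares fit. The Python example below is only an illustration under stated assumptions: the listings are synthetic, the helper names (`gaussian_weight`, `solve_linear_system`, `weighted_least_squares`) are hypothetical, and the bandwidth is an arbitrary choice. It is not the implementation behind GWR_GRID.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight for a listing `distance` rings from the origin cell."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_least_squares(rows, targets, weights):
    """Return [intercept, coef_1, ...] solving (X'WX) beta = X'Wy."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(design, weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(design, targets, weights))
            for i in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Synthetic listings for one origin cell: (bedrooms, bathrooms, ring distance).
listings = [(1, 1, 0), (2, 1, 0), (3, 2, 1), (2, 2, 2), (4, 1, 3), (1, 2, 3)]
# Synthetic prices that follow 20 + 30 * bedrooms + 10 * bathrooms exactly.
prices = [20 + 30 * bed + 10 * bath for bed, bath, _ in listings]

features = [(bed, bath) for bed, bath, _ in listings]
# Assumption: bandwidth of 3, matching the k-ring size used in the example.
weights = [gaussian_weight(dist, 3.0) for _, _, dist in listings]

intercept, bedrooms_coef, bathrooms_coef = weighted_least_squares(features, prices, weights)
print(intercept, bedrooms_coef, bathrooms_coef)
```

Because the synthetic prices are an exact linear function of the two features, the fit recovers the generating coefficients up to floating-point error regardless of the weights. With real, noisy data the weights matter: listings closer to the origin cell pull the local coefficients more strongly than distant ones.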

To better illustrate how GWR works, we have prepared another map that shows how the data is used to run each regression within the algorithm, following the example above. You can select a specific H3 cell (in red) to visualize which Airbnb locations (in bright white) have been used to run the regression that estimates the corresponding bedroom and bathroom coefficients. The larger the white dot, the greater the weight of that Airbnb listing within the regression model.

Check out this blog post for examples of applying GWR to explore the spatially varying relationships between crime occurrence, the size of the unemployed population, and average house prices.
