Find similar locations based on their trade areas
Last updated
Last updated
In the retail and CPG industries, it is common to find the need to understand a set of candidate locations when making different supply and stock decisions. In this example, we follow the steps that one can follow using CARTO and the Analytics Toolbox to rank a set of locations based on the demographic similarity to a chosen location.
These are the main steps to follow, starting with a set of locations:
Define their trade areas.
Enrich such trade areas using demographic data from the Data Observatory.
Run the analysis of similar locations and visualize it on a map.
In this example, we will use a small subset of the locations available in the Iowa Liquor Sales dataset, which is publicly available. For this example, we will keep only stores in Des Moines that were active during 2021.
We can visualize this sample in the following map:
Our sample has a column named store_number
that uniquely identifies each of the locations. This column is relevant because it is a requirement for the FIND_SIMILAR_LOCATIONS
function. We also filter those whose geographical location is known because we will use that location for the next step (generating the trade areas). Bear in mind that the Analytics Toolbox provides functions like GEOCODE_TABLE
to infer the geography from an address, like in this example.
In this step, we will define each location’s trade area. We can understand these trade areas as the zones influenced by each of the stores. The Analytics Toolbox also provides a handy function to achieve this, GENERATE_TRADE_AREAS
:
Running this procedure will generate the table <your-project>.<your-dataset>.stores_trade_areas
, which will map each store_id
to a 500m-radius circular buffer.
This is the simplest way to generate a trade area; a more complex example of this function can be found in this example, which showcases how to generate isoline-based trade areas. Remember that the enrichment functions simply require a polygon-based GEOGRAPHY
column; any other custom geometry can also be used as trade area.
Now that we already have a defined set of trade areas per location, we can use external data available to enrich such areas. For this example, we will be fetching some basic population variables segmented by age and gender from the American Community Survey data.
It is also possible to enrich the trade areas using variables straight from the Data Observatory, as long as you have an active subscription to them. To achieve it, we can use DATAOBS_SUBSCRIPTIONS
, DATAOBS_VARIABLES
, and DATAOBS_ENRICH_POLYGONS
functions in the Analytics Toolbox as per this guide.
Now that each trade area is enriched, let’s run the similarity analysis. To do so, we need to choose the following:
An origin location, that will be taken as a reference to measure similarity.
A set of target locations, that will be analyzed to check how similar each of them is to the origin location.
Since both our origin and target locations come from the same source, let us save it as a table in BigQuery:
In this convenience table, we have store_number
serving as unique ID and all the feature columns we have previously computed.
As we said before, in this example, both origin and target locations come from the same source, but that is not a requirement: origin and target locations can come from different places as long as they can be enriched with the same variables in a comparable scale.
For this example, we are going to take as reference store #2628.
This procedure will create the table <your-project>.<your-dataset>.similar_locations_2682_results
, where we can find the similarity_skill_score
column that we need for our analysis. Let us display these values on a map to check the results.
The first thing we can notice is how the map contains fewer locations than before: the similar locations procedure only returns those stores that are more similar than the average. Out of those, we can check the individual similarity using the column similarity_score
(which we can think of as a “distance” to the original location, the lower the better) or similarity_skill_score
(a normalized version that we can think of as a similarity measure, the higher the better).
Using this similarity_skill_score
, we can see how the nearby stores get a very high level of similarity, since our trade areas were solely based in the vicinity of each location. However, we can see how different patterns emerge as well in other parts of the city, were similar locations are found as well.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.