Find Twin Areas of top-performing stores
The Twin Areas analysis consists of three main steps:
Select relevant variables given the characteristics of your business (e.g. population, income, etc.), coming from either our Data Observatory (DO) or from your own data tables;
Gridify and enrich the location of an existing site (from now on referred to as the origin location) and all the locations we would like to compare it with (from now on referred to as the target locations) using the selected data sources. Gridifying both the origin and target locations is required in order to compare areas of the same size. It relies on spatial indexes (either quadbin or h3), built with the procedures available in the Analytics Toolbox.
Derive a similarity skill score between the origin and each target location. Target cells are ranked by their distance to the origin cell in the variable space, measured with respect to the score of the average cell in the target areas. The selected variables are first transformed to their Principal Component scores to account for pairwise correlations.
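The third step can be illustrated with a minimal Python sketch. It assumes the scoring rule is 1 minus the ratio between the origin-to-target distance and the origin-to-average-cell distance, so that a positive score means a target cell is closer to the origin than the average target cell. That formula, the function name and the sample vectors are assumptions made for illustration; they are not taken from the Analytics Toolbox implementation.

```python
import math

def similarity_skill_scores(origin, targets):
    """Score each target cell against the average target cell (assumed rule)."""
    n = len(targets)
    # Reference point: the component-wise mean of all target cells.
    mean_cell = [sum(column) / n for column in zip(*targets)]
    ref = math.dist(origin, mean_cell)
    if ref == 0:
        raise ValueError("origin coincides with the average target cell")
    # Assumed rule: 1 - d(origin, target) / d(origin, average cell).
    return [1.0 - math.dist(origin, t) / ref for t in targets]

# Hypothetical cells, already expressed as Principal Component scores.
origin = [1.0, 0.0]
targets = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0], [-4.0, 0.0]]

scores = similarity_skill_scores(origin, targets)
# Indices of the target cells, most similar first.
ranked = sorted(range(len(targets)), key=lambda i: scores[i], reverse=True)
```

Under this rule a cell identical to the origin scores 1, the average cell scores 0, and cells farther away than the average score below 0.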
To follow this tutorial, you'll need:
A subscription to the Sociodemographics, 2014, 5yrs - United States of America (Census Block Group) [20102014] table via the CARTO Data Observatory.
All other data used here is available as public data.
We'll use the GRIDIFY_ENRICH procedure from the data module in CARTO’s Analytics Toolbox to prepare the data for our analysis. This procedure first gridifies a set of geometries (point data in this case) to a quadkey grid with zoom 15, and then enriches each grid cell with data from two sources. The first is a subscription to one of the datasets available in the Data Observatory, which provides the total population (total_pop_3409f36f) and the number of households (households_d7d24db5) at the Census Block Group level from the ACS Sociodemographics dataset. The second is a custom dataset, which contains the count of road links (count_qualified) per zip code.
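To make the gridification idea concrete, here is a small Python sketch that assigns points to Web Mercator tiles at zoom 15 and labels each tile with its quadkey string. It uses the standard tile and quadkey formulas and is only an illustration of the concept, not the GRIDIFY_ENRICH implementation. The helper names and the sample coordinates are hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator tile (x, y) that contains a longitude/latitude point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range at this zoom level.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_quadkey(x, y, zoom):
    """Quadkey string of a tile: one base-4 digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)

# Hypothetical store locations as (longitude, latitude); the first two coincide.
stores = [(-97.7431, 30.2672), (-97.7431, 30.2672), (-95.3698, 29.7604)]

# Count how many stores fall in each zoom-15 cell.
cells = {}
for lon, lat in stores:
    x, y = lonlat_to_tile(lon, lat, 15)
    key = tile_to_quadkey(x, y, 15)
    cells[key] = cells.get(key, 0) + 1
```

Because every cell at a given zoom level comes from the same tiling scheme, origin and target locations end up on comparable units, which is what the enrichment step then fills with data.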
We begin by preparing the data for the origin locations, that is, those where we already have well-performing stores. These locations will serve as references for identifying similar areas later on.
Execute the following code to achieve this. Please note that procedure calls should be run either from your BigQuery console or from a Call Procedure component in CARTO Workflows. When running the code below, you will need to replace:
"ac_xxxxxx" in the Data Observatory enrichment with your unique CARTO connection ID. To find it, go to Data Explorer > Data Observatory, select Access in... for any subscription, and choose your connection; your unique code is shown there.
"yourproject.yourdataset" with a location to save the results to.
This map shows both the locations of the selected stores (above) and the enriched grid for the population variable (below).
Next, we use the same procedure to gridify and enrich the target areas, for which we will use the Census Tract polygons of the main urban areas in Texas.
The resulting grid is shown in the map below.
We use the BUILD_TWIN_AREAS_MODEL procedure to create the twin areas model. For both the origin and the target cells, this procedure transforms the input data by standardizing the numerical variables and creating a standardized indicator matrix for the categorical variables. It then creates a Principal Component Analysis (PCA) model using the processed target data as input.
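The transformation described above can be sketched in Python with NumPy. This is a simplified illustration, not the procedure's actual code. It assumes that the standardization statistics and the PCA are both fitted on the target cells and then applied to the origin cell, and it uses a hypothetical categorical variable (area_type) alongside made-up population and household values.

```python
import numpy as np

def build_design_matrix(numeric, categories, levels):
    """Numeric columns followed by 0/1 indicator columns for one categorical variable."""
    numeric = np.asarray(numeric, dtype=float)
    indicators = np.array(
        [[1.0 if c == level else 0.0 for level in levels] for c in categories]
    )
    return np.hstack([numeric, indicators])

def fit_pca(target_matrix, n_components):
    """Standardize the target data and return its leading principal axes."""
    mean = target_matrix.mean(axis=0)
    std = target_matrix.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    z = (target_matrix - mean) / std
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]

def transform(matrix, mean, std, components):
    """Project cells onto the principal axes using the target statistics."""
    return ((matrix - mean) / std) @ components.T

# Hypothetical target cells: [population, households] plus an area_type category.
levels = ["rural", "urban"]
target = build_design_matrix(
    [[1200, 400], [800, 300], [1500, 520], [600, 210], [1000, 350]],
    ["urban", "rural", "urban", "rural", "urban"],
    levels,
)
origin = build_design_matrix([[1100, 380]], ["urban"], levels)

mean, std, components = fit_pca(target, n_components=2)
target_scores = transform(target, mean, std, components)
origin_scores = transform(origin, mean, std, components)
```

Working with Principal Component scores rather than the raw variables means that strongly correlated inputs, such as population and households, are not counted twice when distances between cells are computed.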
Once we have gridified and enriched the origin and target areas, we can run the FIND_TWIN_AREAS procedure for a given origin location, here selected as the store with the highest revenue:
This map shows the similarity skill score for all the target cells with a positive score: larger scores indicate areas more similar to the origin location.
Traditionally, discovering new areas for a business has been a difficult and lengthy process that required on-site market analysis and local expertise. With our Twin Areas tool, retailers and CPG companies can instead discover the best locations to expand or optimize their network without strong prior knowledge of the area. They can also streamline their site planning process by taking advantage of our comprehensive data catalog and the analytical capabilities of CARTO’s cloud-native platform.
Check out this blogpost for more information on the application of the Twin Areas analysis to this use case.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.