
Find Twin Areas of top-performing stores

Last updated 2 months ago


The Twin Areas analysis consists of three main steps:

  • Select relevant variables given the characteristics of your business (e.g. population, income, etc.), coming either from our Data Observatory (DO) or from your own data tables;

  • Gridify and enrich the location of an existing site (from now on referred to as the origin location) and of all the locations that we'd like to compare it with (from now on referred to as the target locations) using the selected data sources. Gridification, which is required to compare areas of the same size, is applied to both the origin and the target locations and relies on spatial indexes (either quadbin or H3) built with the procedures available in the Analytics Toolbox;

  • Derive a similarity skill score between the origin and each target location. The score compares the distance between the origin cell and each target cell in the variable space with the corresponding distance for the average cell in the target areas; before computing distances, the selected variables are transformed into their Principal Component Analysis (PCA) scores to account for pairwise correlations.
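The skill score in the last step can be illustrated with a small, self-contained sketch. It assumes the common skill-score form SS_j = 1 - d(origin, target_j) / d(origin, average target cell), Euclidean distances, and inputs that are already PCA scores. These are our assumptions for illustration, not necessarily the exact formula implemented in the Analytics Toolbox, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two cells in the (PCA-transformed) variable space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_skill_scores(origin: Sequence[float],
                            targets: List[Sequence[float]]) -> List[float]:
    """Similarity skill score of every target cell (hypothetical helper).

    Assumed definition: SS_j = 1 - d(origin, target_j) / d(origin, mean_target),
    where mean_target is the component-wise average of all target cells.
    """
    if not targets:
        raise ValueError("at least one target cell is required")
    dims = len(origin)
    # The "average cell" of the target areas.
    mean_target = [sum(t[k] for t in targets) / len(targets) for k in range(dims)]
    d_ref = euclidean(origin, mean_target)
    if d_ref == 0:
        raise ValueError("origin coincides with the average target cell")
    return [1.0 - euclidean(origin, t) / d_ref for t in targets]


# Usage: three target cells on a line, origin at (0, 0)
scores = similarity_skill_scores([0.0, 0.0], [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]])
print(scores)  # [0.5, -0.5, 0.0]
```

Under this definition a score of 0 means a target cell is exactly as far from the origin as the average target cell, and positive scores mean closer, i.e. more similar, which is how positive scores are read in the results map.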

You'll need...

To follow this tutorial, you'll need:

  • A subscription to the Sociodemographics, 2014, 5yrs - United States of America (Census Block Group) [20102014] table via the CARTO Data Observatory.

All other data used here is available as public data.

Data preparation

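We begin by preparing the data for the origin locations, that is, the locations where we already have well-performing stores. These locations will serve as references for identifying similar areas later on.

The code for each step is given in three equivalent versions; run only the one that matches your Analytics Toolbox deployment: "carto-un" (US multi-region), "carto-un-eu" (EU multi-region) or "carto" (Analytics Toolbox installed manually in your own project). Before running the code, replace:

  • "ac_xxxxxx" in the Data Observatory enrichment parameters with your unique CARTO connection ID. To find it, go to Data Explorer > Data Observatory, select Access in... for any subscription and choose your connection; your unique code is shown there.

  • "yourproject.yourdataset" with the location where you want to save the results.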

CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);

CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);

CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);
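The origin call above sets "kring": 1 with a "uniform" decay, whereas the target areas are gridified with a k-ring size of 0. The sketch below shows what a k-ring covers on a square quadbin grid, with cells represented as hypothetical (zoom, x, y) tile tuples. It assumes that a uniform decay gives every cell in the ring the same weight of 1, so a "sum" variable is summed over the ring; the exact weighting applied by GRIDIFY_ENRICH is not stated in this tutorial.

```python
from typing import Dict, List, Tuple

# A cell as a (zoom, x, y) Web Mercator tile; quadbin indexes the same tiles.
Tile = Tuple[int, int, int]


def kring(tile: Tile, k: int) -> List[Tile]:
    """Cells within k steps of `tile`, diagonals included: a (2k+1) x (2k+1) block.

    Cells that would fall outside the world extent are dropped.
    """
    z, x, y = tile
    n = 2 ** z  # tiles per axis at this zoom level
    return [(z, x + dx, y + dy)
            for dx in range(-k, k + 1)
            for dy in range(-k, k + 1)
            if 0 <= x + dx < n and 0 <= y + dy < n]


def uniform_kring_sum(tile: Tile, k: int, values: Dict[Tile, float]) -> float:
    """Sum a per-cell variable over the k-ring with equal weight 1 for every cell.

    Assumption for illustration: cells without a value contribute 0.
    """
    return sum(values.get(cell, 0.0) for cell in kring(tile, k))


origin = (15, 100, 100)
print(len(kring(origin, 1)))  # 9 cells for kring = 1
print(kring(origin, 0) == [origin])  # True: kring = 0 keeps only the cell itself
```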

This map shows both the locations of the selected stores (above) and the enriched grid for the population variable (below).

Next, we use the same procedure to gridify and enrich the target areas. As targets, we use the Census Tract polygons in the main urban areas of Texas.

CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: no k-ring for the target cells
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);

CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: no k-ring for the target cells
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);

CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: no k-ring for the target cells
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);

The resulting grid is shown in the map below.

Twin Areas Analysis

Create the Twin Areas model

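CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL
(
    -- Enriched origin cells
    'cartobq.docs.twin_areas_origin_enriched',
    -- Enriched target cells
    'cartobq.docs.twin_areas_target_enriched',
    -- Index column name
    'quadbin',
    -- Output prefix
    'project.dataset.twin_areas_analysis_01',
    -- Options: keep the principal components that explain 90% of the variance
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);

CALL `carto-un-eu`.carto.BUILD_TWIN_AREAS_MODEL
(
    -- Enriched origin cells
    'cartobq.docs.twin_areas_origin_enriched',
    -- Enriched target cells
    'cartobq.docs.twin_areas_target_enriched',
    -- Index column name
    'quadbin',
    -- Output prefix
    'project.dataset.twin_areas_analysis_01',
    -- Options: keep the principal components that explain 90% of the variance
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);

CALL carto.BUILD_TWIN_AREAS_MODEL
(
    -- Enriched origin cells
    'cartobq.docs.twin_areas_origin_enriched',
    -- Enriched target cells
    'cartobq.docs.twin_areas_target_enriched',
    -- Index column name
    'quadbin',
    -- Output prefix
    'project.dataset.twin_areas_analysis_01',
    -- Options: keep the principal components that explain 90% of the variance
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);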
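The PCA_EXPLAINED_VARIANCE_RATIO value of 0.9 controls how many principal components are kept. The option name matches the BigQuery ML PCA training option, which keeps the smallest number of components whose cumulative explained variance ratio reaches the given value. Assuming the option is passed through with that meaning, the selection works as in this sketch (the function name is hypothetical).

```python
from typing import Sequence


def n_components_for_ratio(eigenvalues: Sequence[float], ratio: float) -> int:
    """Smallest number of principal components whose cumulative share of the
    total variance reaches `ratio`.

    `eigenvalues` are the variances of the principal components, i.e. the
    eigenvalues of the covariance matrix of the standardized variables.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= ratio:
            return count
    return len(ordered)  # guard against floating-point rounding


print(n_components_for_ratio([5.0, 3.0, 1.0, 1.0], 0.9))  # 3
```

Lowering the ratio keeps fewer components, which discards more of the residual variation before distances between cells are computed.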

Find twin areas

CALL `carto-un`.carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options: quadbin index of the origin cell
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

CALL `carto-un-eu`.carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options: quadbin index of the origin cell
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

CALL carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options: quadbin index of the origin cell
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

This map shows the similarity skill score for all the target cells with a positive score: larger scores indicate areas more similar to the origin location.
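To shortlist candidate areas from the results, the positive-score filter used in the map can be followed by a ranking. A minimal sketch, assuming the scores have been loaded into a dictionary keyed by a hypothetical cell identifier (the column names of the results table are not shown in this tutorial):

```python
from typing import Dict, List, Tuple


def top_twin_cells(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Keep target cells with a positive skill score and return the n most similar.

    `scores` maps a hypothetical cell identifier to its similarity skill score.
    """
    positive = [(cell, score) for cell, score in scores.items() if score > 0]
    # Larger scores mean more similar to the origin location.
    positive.sort(key=lambda item: item[1], reverse=True)
    return positive[:n]


print(top_twin_cells({"a": 0.2, "b": -0.1, "c": 0.7, "d": 0.0}, 5))
# [('c', 0.7), ('a', 0.2)]
```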

Traditionally, finding new areas for a business has been a difficult and lengthy process that required on-site market analysis and local expertise. Using the Twin Areas tool instead, retailers and CPG companies can identify the best locations to expand or optimize their network without extensive prior knowledge of the area, and streamline their site planning process by taking advantage of our data catalog and the analytical capabilities of CARTO’s cloud-native platform.

To recap, in the Data preparation step we used the GRIDIFY_ENRICH procedure from the data module in CARTO’s Analytics Toolbox. This procedure first gridifies a set of geometries (point data in the case of the origin stores) into a quadbin grid at resolution 15, and then enriches each grid cell with data from a subscription to one of the datasets available in the Data Observatory, here the total population (total_pop_3409f36f) and the number of households (households_d7d24db5) at the Census Block Group level from the ACS Sociodemographics dataset, as well as with data from a custom dataset containing the count of road links (count_qualified) per zip code.
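Quadbin cells follow the same Web Mercator tiling as classic quadkeys, with CARTO encoding each cell as a 64-bit integer. The sketch below illustrates only the first part of gridifying point data: finding the tile that contains a point at zoom 15 and writing it as a quadkey string. It does not reproduce the quadbin integer encoding, and the function names are our own.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile (x, y) containing a WGS84 point at the given zoom.

    Valid for latitudes within roughly +/-85.0511 degrees, the Web Mercator limit.
    """
    n = 2 ** zoom  # tiles per axis
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge still fall in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Classic quadkey string: one base-4 digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


# Usage: a point in Des Moines, Iowa, at the zoom level used in this tutorial
x, y = lonlat_to_tile(-93.6, 41.6, 15)
print(len(tile_to_quadkey(x, y, 15)))  # 15
```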

Note that CALL statements should be run either from your BigQuery console or from a Call Procedure component in CARTO Workflows.

In the Create the Twin Areas model step, we used the BUILD_TWIN_AREAS_MODEL procedure to create the twin areas model. For both the origin and the target cells, this procedure transforms the input data by standardizing the numerical variables and creating a standardized indicator matrix for the categorical variables; it then creates a Principal Component Analysis (PCA) model using the processed target data as input.
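The preprocessing described for BUILD_TWIN_AREAS_MODEL can be sketched as follows: numerical variables become z-scores, and each categorical variable becomes one 0/1 indicator column per level, which is then standardized as well. This is a minimal illustration under our own assumptions (population standard deviation, levels sorted alphabetically, hypothetical function names and made-up values), not the procedure's actual code.

```python
import statistics
from typing import Dict, List


def standardize(values: List[float]) -> List[float]:
    """Z-scores using the population standard deviation (our assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # a constant column carries no information
    return [(v - mean) / std for v in values]


def indicator_matrix(categories: List[str]) -> Dict[str, List[float]]:
    """One 0/1 indicator column per category level, each column then standardized."""
    levels = sorted(set(categories))
    return {level: standardize([1.0 if c == level else 0.0 for c in categories])
            for level in levels}


# Usage: a numerical and a categorical variable over four cells (made-up values)
population = standardize([100.0, 200.0, 300.0, 400.0])
land_use = indicator_matrix(["urban", "rural", "urban", "urban"])
print(sorted(land_use))  # ['rural', 'urban']
```

Standardizing the indicator columns puts categorical and numerical variables on a comparable scale before PCA, so no variable dominates the distances just because of its units.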

Once the origin and target areas had been gridified and enriched, we ran the FIND_TWIN_AREAS procedure for a given origin location, here selected as the store with the highest revenue.
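Choosing the origin location and building the options string for FIND_TWIN_AREAS can be sketched as follows. The store rows are hypothetical (quadbin_index, revenue) pairs, and the helper names are our own; as in the tutorial, the index is passed to origin_index as a JSON string.

```python
import json
from typing import List, Tuple


def best_origin_index(stores: List[Tuple[int, float]]) -> int:
    """Quadbin index of the store with the highest revenue.

    `stores` holds hypothetical (quadbin_index, revenue) rows.
    """
    if not stores:
        raise ValueError("no stores given")
    index, _revenue = max(stores, key=lambda row: row[1])
    return index


def find_twin_areas_options(origin_index: int) -> str:
    """Options JSON for FIND_TWIN_AREAS, passing the index as a string as the tutorial does."""
    return json.dumps({"origin_index": str(origin_index)})


print(find_twin_areas_options(best_origin_index([(1, 10.0), (2, 30.0), (3, 20.0)])))
# {"origin_index": "2"}
```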

Check out this blogpost for more information on the application of the Twin Areas analysis to this use case.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.
