Find Twin Areas of top-performing stores

The Twin Areas analysis consists in three main steps:

Select relevant variables given the characteristics of your business (e.g. population, income, etc.), coming from either our Data Observatory (DO) or from your own data tables;
Gridify and enrich the location of an existing site (from now on referred to as the origin location) and of all the locations that we'd like to compare (from now on referred to as the targte locations) using the selected data sources. The process of gridification both for the origin and target locations, which is required in order to be able to compare areas of the same size, relies on the use of spatial indexes (either quadbin or h3) as constructed using the available procedures in the Analytics Toolbox.
Derive a similarity skill score between the origin and each target locations by ranking the distance between the origin and each target cell in the variable space (where the selected variables are first transformed using their Principal Component scores to account for pairwise correlations) with respect to the score of the average cell in the target areas.

Data preparation

We'll use the GRIDIFY_ENRICH procedure from the data module in CARTO’s Analytics Toolbox to prepare the data of our analysis. This procedure is used to first gridify a set of geometries (point data in this case) to a quadkey grid with zoom 15, and then to enrich grid cell with data from a subscription to one of the datasets available in the Data Observatory, including the total population (total_pop_3409f36f) and the number of households (households_d7d24db5) at the Census Block Group level from the ACS Sociodemographics dataset, as well as from a custom dataset, which contains the count of road links (count_qualified) per zip code.

We begin by preparing data for the original locations, specifically, those where we already have stores performing well. These locations will serve as references for identifying similar areas later on.

CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_origin_enriched');

CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_origin_enriched');

CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_origin_enriched');

This map shows both the locations of the selected stores (above) as well as the enriched grid for the population variable (below)

Next, we can use this same procedure to gridify and enrich the target areas for which we will use a the Census Tracts polygons in Texas in the main urban areas.

CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_target_enriched');

CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_target_enriched');

CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_7xhfwyml',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    0,"uniform",
    -- Output table
    'cartobq.docs.twin_areas_target_enriched');

The resulting grid is shown the map below.

Twin Areas Analysis

Create the Twin Areas model

We use the BUILD_TWIN_AREAS_MODEL procedure to create the twin areas model. For both the origin and the target cells, this procedure transforms the input data by standardizing the numerical variables and creating a standardized indicator matrix for the categorical variables and then it creates a Principal Component Analysis (PCA) model using the processed target data as input.

CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);

CALL `carto-un-eu`.carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);

CALL carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);

Find twin areas

Once we gridified and enriched the origin and target areas, we can then run the FIND_TWIN_AREAS procedure for a given origin location, here selected as the store with the highest revenue:

CALL `carto-un`.carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

CALL `carto-un-eu`.carto.FIND_TWIN_AREAS
(
    -- Twinareas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

CALL carto.FIND_TWIN_AREAS
(
    -- Twinareas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);

This map shows the similarity skill score for all the target cells with a positive score: larger scores indicate areas more similar to the origin location.

Traditionally, discovering new areas for businesses represented a difficult and lengthy process, which required on-site market analysis and local expertise. Using instead our Twin Areas tool, retailers and companies in CPG can now easily discover the best locations to expand or optimize their network without a strong prior knowledge of the area and optimize their site planning process by taking advantage of our comprehensive data catalog and the analytical capability of CARTO’s cloud-native platform.

Check out this blogpost for more information on the application of the Twin Areas analysis to this use case.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.

Last updated 4 months ago