# Find Twin Areas of top-performing stores

<div align="left"><figure><img src="/files/YUiOye9yS8uvdGFnFppD" alt="Advanced difficulty banner" width="175"><figcaption></figcaption></figure></div>

[The Twin Areas analysis](https://carto.com/blog/spatial-data-science-site-planning/) consists of three main steps:

* Select relevant variables given the characteristics of your business (e.g. population, income, etc.), coming from either our [Data Observatory (DO)](https://carto.com/spatial-data-catalog/) or from your own data tables;
* Gridify and enrich the location of an existing site (from now on referred to as the origin location) and all the locations that we'd like to compare it with (from now on referred to as the target locations) using the selected data sources. Gridifying both the origin and target locations is required in order to compare areas of the same size, and relies on spatial indexes (either Quadbin or H3) built with the procedures available in the Analytics Toolbox.
* Derive a similarity skill score between the origin and each target location by ranking the distance between the origin and each target cell in the variable space (where the selected variables are first transformed using their Principal Component scores to account for pairwise correlations) with respect to the score of the average cell in the target areas.
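
To make the third step more concrete, here is a minimal sketch of how a skill score of this kind could be computed in plain BigQuery SQL. It is not the Analytics Toolbox implementation. It assumes a Euclidean distance in the Principal Component space and a score defined as one minus the ratio between a target cell's distance to the origin and the distance from the average target cell to the origin. The cell identifiers and PC scores are made up for illustration.

```sql
-- Toy example: all identifiers and PC scores below are invented.
WITH origin AS (
  SELECT 1.0 AS pc1, 2.0 AS pc2
),
targets AS (
  SELECT * FROM UNNEST([
    STRUCT('cell_a' AS cell_id, 1.5 AS pc1, 2.0 AS pc2),
    STRUCT('cell_b' AS cell_id, 4.0 AS pc1, 6.0 AS pc2),
    STRUCT('cell_c' AS cell_id, 5.0 AS pc1, 5.0 AS pc2)
  ])
),
-- Distance between each target cell and the origin (assumed Euclidean)
target_distances AS (
  SELECT
    t.cell_id,
    SQRT(POW(t.pc1 - o.pc1, 2) + POW(t.pc2 - o.pc2, 2)) AS distance
  FROM targets AS t
  CROSS JOIN origin AS o
),
-- The "average cell" of the target areas
mean_target AS (
  SELECT AVG(pc1) AS pc1, AVG(pc2) AS pc2
  FROM targets
),
-- Distance between the average target cell and the origin
reference AS (
  SELECT SQRT(POW(m.pc1 - o.pc1, 2) + POW(m.pc2 - o.pc2, 2)) AS mean_distance
  FROM mean_target AS m
  CROSS JOIN origin AS o
)
SELECT
  d.cell_id,
  d.distance,
  -- Assumed definition: positive when the cell is closer to the origin
  -- than the average target cell, negative otherwise
  1 - SAFE_DIVIDE(d.distance, r.mean_distance) AS similarity_skill_score
FROM target_distances AS d
CROSS JOIN reference AS r
ORDER BY similarity_skill_score DESC;
```

With this toy data, `cell_a` lies closer to the origin than the average target cell and gets a positive score, while `cell_b` and `cell_c` get negative scores.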

## You'll need...

To follow this tutorial, you'll need:

* A subscription to the **Sociodemographics, 2014, 5yrs - United States of America (Census Block Group) \[20102014]** table via the CARTO Data Observatory.

All other data used here is available as public data.

## Data preparation

We'll use the [GRIDIFY\_ENRICH](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/data#gridify_enrich) procedure from the data module in CARTO’s Analytics Toolbox to prepare the data for our analysis. This procedure first gridifies a set of geometries (point data in this case) into a Quadbin grid at resolution 15, and then enriches each grid cell with data from two sources. The first is a subscription to one of the datasets available in the Data Observatory: the total population (`total_pop_3409f36f`) and the number of households (`households_d7d24db5`) at the Census Block Group level from the [ACS](https://carto.com/spatial-data-catalog/browser/?country=usa\&provider=usa_acs) Sociodemographics dataset\*. The second is a custom dataset, which contains the count of road links (`count_qualified`) per zip code.

\*Please note that this data was retired from the Data Observatory in June 2025. You can find similar data products from providers like the ACS by searching for publicly-available demographics data in the Data Observatory.

We begin by preparing data for the origin locations, that is, those where we already have stores performing well. These locations will serve as references for identifying similar areas later on.

Execute the following code to achieve this. Please note that procedure calls should be run either from your BigQuery console or from a Call Procedure component in [CARTO Workflows](/creating-workflows/introduction-to-carto-workflows.md). When running the code below, you will need to replace:

* "ac\_xxxxxx" in the Data Observatory enrichment with your unique CARTO connection ID. To find it, go to Data Explorer > Data Observatory, select Access in... for any subscription and choose your connection; your unique code is shown there.
* "yourproject.yourdataset" with a location to save the results to.

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_origin_enriched',
    R'''
    {
        "kring": 1,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}
{% endtabs %}

This map shows both the locations of the selected stores (*above*) and the enriched grid for the population variable (*below*).

{% embed url="<https://clausa.app.carto.com/map/04a1916a-f7d6-4cda-8e84-cfa4a825628c>" %}
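
Before moving on, it can be useful to check that the enriched origin table was created as expected. The queries below are a simple sanity check and assume you saved the output to `yourproject.yourdataset.twin_areas_origin_enriched`, as in the call above. No column names are assumed, so the second query simply previews whatever columns the procedure produced.

```sql
-- Number of grid cells in the enriched origin table
SELECT COUNT(*) AS n_cells
FROM `yourproject.yourdataset.twin_areas_origin_enriched`;

-- Preview a few enriched cells to inspect the generated columns
SELECT *
FROM `yourproject.yourdataset.twin_areas_origin_enriched`
LIMIT 10;
```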

Next, we use the same procedure to gridify and enrich the target areas, for which we will use the Census Tract polygons of the main urban areas in Texas.

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: kring and decay, passed as JSON as in the origin call
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: kring and decay, passed as JSON as in the origin call
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.GRIDIFY_ENRICH(
    -- Input query
    'SELECT geom FROM `cartobq.docs.twin_areas_target`',
    -- Grid params: grid type and level
    'quadbin', 15,
    -- Data Observatory enrichment
    [('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
    'carto-data.ac_xxxxxx',
    -- Custom data enrichment
    '''
    SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
    ''',
    [('count_qualified','count')],
    -- Output table
    'yourproject.yourdataset.twin_areas_target_enriched',
    -- Options: kring and decay, passed as JSON as in the origin call
    R'''
    {
        "kring": 0,
        "decay": "uniform"
    }
    '''
);
```

{% endtab %}
{% endtabs %}

The resulting grid is shown in the map below.

{% embed url="<https://clausa.app.carto.com/map/bda48d97-09d6-4aa2-9807-db1d33d4383b>" %}

## Twin Areas Analysis

### Create the Twin Areas model

We use the [BUILD\_TWIN\_AREAS\_MODEL](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/retail#build_twin_areas_model) procedure to create the twin areas model. For both the origin and the target cells, this procedure transforms the input data by standardizing the numerical variables and creating a standardized indicator matrix for the categorical variables. It then creates a [Principal Component Analysis (PCA)](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca) model using the processed target data as input.

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.BUILD_TWIN_AREAS_MODEL
(
    'cartobq.docs.twin_areas_origin_enriched',
    'cartobq.docs.twin_areas_target_enriched',
    'quadbin',
    'project.dataset.twin_areas_analysis_01',
    '''{
        "model_options":{
            "PCA_EXPLAINED_VARIANCE_RATIO":0.9
        }
    }'''
);
```

{% endtab %}
{% endtabs %}
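
As an illustration of the standardization step that the procedure performs on numerical variables, the following sketch computes z-scores with window functions on a small inline table. This is not the procedure's internal code. The cell identifiers, column names and values are made up, and the use of the sample standard deviation (`STDDEV_SAMP`) is an assumption.

```sql
-- Toy data: three cells with two invented numerical variables.
WITH cells AS (
  SELECT * FROM UNNEST([
    STRUCT('cell_a' AS cell_id, 1200.0 AS total_pop, 450.0 AS households),
    STRUCT('cell_b' AS cell_id, 300.0 AS total_pop, 120.0 AS households),
    STRUCT('cell_c' AS cell_id, 2500.0 AS total_pop, 900.0 AS households)
  ])
)
SELECT
  cell_id,
  -- z-score: subtract the mean, then divide by the standard deviation
  SAFE_DIVIDE(
    total_pop - AVG(total_pop) OVER (),
    STDDEV_SAMP(total_pop) OVER ()
  ) AS total_pop_std,
  SAFE_DIVIDE(
    households - AVG(households) OVER (),
    STDDEV_SAMP(households) OVER ()
  ) AS households_std
FROM cells;
```

Standardizing puts variables with very different magnitudes on a comparable scale, so that no single variable dominates the principal components.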

### Find twin areas

Once we have gridified and enriched the origin and target areas and built the model, we can run the [FIND\_TWIN\_AREAS](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/retail#find_twin_areas) procedure for a given origin location, here selected as the store with the highest revenue:

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.FIND_TWIN_AREAS
(
    -- Twin areas model
    'project.dataset.twin_areas_analysis_01_model',
    -- Index column name
    'quadbin',
    -- Output table
    'project.dataset.twin_areas_analysis_01_results',
    -- Options
    '''{
        "origin_index":"5256404125934944255"
    }'''
);
```

{% endtab %}
{% endtabs %}
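
The `origin_index` passed above identifies the grid cell of the chosen store. As a rough sketch of how such a value could be obtained, the query below picks the store with the highest revenue and converts its location to a Quadbin index at resolution 15. It assumes the origin table has a `geom` point column and a hypothetical `revenue` column (the actual column name may differ), and that you are using the `carto-un` deployment of the Analytics Toolbox.

```sql
SELECT
  -- Quadbin index of the cell containing the store, cast to a string
  -- so it can be pasted into the "origin_index" option
  CAST(`carto-un`.carto.QUADBIN_FROMGEOGPOINT(geom, 15) AS STRING) AS origin_index
FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`
ORDER BY revenue DESC  -- hypothetical column name
LIMIT 1;
```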

{% embed url="<https://clausa.app.carto.com/map/3fdf0e74-7301-4a19-81a9-d6a21dadc691>" %}

This map shows the similarity skill score for all the target cells with a positive score: larger scores indicate areas more similar to the origin location.
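
To reproduce the selection shown in the map as a table, a query along these lines could be used. It assumes that the results table exposes a `similarity_skill_score` column. Check the schema of your output table first, as the column name is an assumption here.

```sql
-- Top 20 target cells most similar to the origin location
SELECT *
FROM `project.dataset.twin_areas_analysis_01_results`
WHERE similarity_skill_score > 0  -- assumed column name
ORDER BY similarity_skill_score DESC
LIMIT 20;
```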

Traditionally, discovering new areas for a business was a difficult and lengthy process that required on-site market analysis and local expertise. With our Twin Areas tool, retailers and CPG companies can instead discover the best locations to expand or optimize their network without strong prior knowledge of the area, and streamline their site planning process by taking advantage of our comprehensive data catalog and the analytical capabilities of CARTO’s cloud-native platform.

{% hint style="info" %}
Check out this [blog post](https://docs.carto.com/analytics-toolbox-bigquery/sql-reference/retail/#find_twin_areas) for more information on the application of the Twin Areas analysis to this use case.
{% endhint %}

<img src="/files/4m1BK9j4Wq34gat4HHd2" alt="EU flag" data-size="line"> This project has received funding from the [European Union’s Horizon 2020](https://ec.europa.eu/programmes/horizon2020/en) research and innovation programme under grant agreement No 960401.

