Find similar locations based on their trade areas

Last updated 12 months ago


In the retail and CPG industries, supply and stock decisions often require comparing a set of candidate locations. In this example, we use CARTO and the Analytics Toolbox to rank a set of locations by their demographic similarity to a chosen location.

These are the main steps to follow, starting with a set of locations:

  1. Define their trade areas.

  2. Enrich such trade areas using demographic data from the Data Observatory.

  3. Run the analysis of similar locations and visualize it on a map.

Choosing a sample of locations

CREATE OR REPLACE TABLE
  `<your-project>.<your-dataset>.stores` AS (
    SELECT
      store_number,
      ANY_VALUE(store_name) AS store_name,
      ANY_VALUE(store_location) AS store_location
    FROM
      `bigquery-public-data.iowa_liquor_sales.sales`
    WHERE
      store_location IS NOT NULL
      AND date BETWEEN '2021-01-01' AND '2021-12-31'
      AND city LIKE '%DES MOINES%'
    GROUP BY
      store_number
  );
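
Before moving on, it can be worth confirming that the sample looks as expected. The query below is a minimal sketch, not part of the original tutorial. It counts rows and distinct store_number values, which should match if store_number identifies each location uniquely, and checks that no store is missing its location.

```sql
-- Sanity check on the sample table created above.
-- n_rows and n_stores should be equal; n_missing_location should be 0.
SELECT
  COUNT(*) AS n_rows,
  COUNT(DISTINCT store_number) AS n_stores,
  COUNTIF(store_location IS NULL) AS n_missing_location
FROM
  `<your-project>.<your-dataset>.stores`;
```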

We can visualize this sample in the following map:

Generating the trade areas

In this step, we will define each location’s trade area, that is, the zone influenced by each store. The Analytics Toolbox provides a handy procedure for this, GENERATE_TRADE_AREAS:

-- Analytics Toolbox in the US multi-region (carto-un)
CALL `carto-un`.carto.GENERATE_TRADE_AREAS(
  '''
  SELECT
    store_number AS store_id,
    store_location AS geom
  FROM
    `<your-project>.<your-dataset>.stores`
  ''',
  'buffer',
  "{'buffer':500.0}",
  '<your-project>.<your-dataset>.stores'
);

-- Analytics Toolbox in the EU multi-region (carto-un-eu)
CALL `carto-un-eu`.carto.GENERATE_TRADE_AREAS(
  '''
  SELECT
    store_number AS store_id,
    store_location AS geom
  FROM
    `<your-project>.<your-dataset>.stores`
  ''',
  'buffer',
  "{'buffer':500.0}",
  '<your-project>.<your-dataset>.stores'
);

-- Manually installed Analytics Toolbox
CALL carto.GENERATE_TRADE_AREAS(
  '''
  SELECT
    store_number AS store_id,
    store_location AS geom
  FROM
    `<your-project>.<your-dataset>.stores`
  ''',
  'buffer',
  "{'buffer':500.0}",
  '<your-project>.<your-dataset>.stores'
);

Running this procedure will generate the table <your-project>.<your-dataset>.stores_trade_areas, which will map each store_id to a 500m-radius circular buffer.
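
As a quick check on the output, the following sketch lists a few trade areas with their surface area. It assumes the generated table exposes the store_id and geom columns, as the later steps of this example suggest. For a 500 m circular buffer the area should be in the region of π × 500² ≈ 785,000 m². Buffers are polygon approximations of a circle, so the values will not match exactly.

```sql
-- Inspect a few generated trade areas (assumes columns store_id and geom).
-- ST_AREA returns the area of a GEOGRAPHY in square meters.
SELECT
  store_id,
  ROUND(ST_AREA(geom)) AS area_m2
FROM
  `<your-project>.<your-dataset>.stores_trade_areas`
ORDER BY
  store_id
LIMIT 10;
```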

Enriching the trade areas

-- Analytics Toolbox in the US multi-region (carto-un)
CALL `carto-un`.carto.ENRICH_POLYGONS(
   -- Trade areas table
   'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
   'geom',
   -- External data available for Des Moines
   'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
   'geom',
   [
      ('total_pop', 'sum'),
      ('male_21', 'sum'),
      ('female_21', 'sum')
   ],
   -- Destination slug
   ['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);

-- Analytics Toolbox in the EU multi-region (carto-un-eu)
CALL `carto-un-eu`.carto.ENRICH_POLYGONS(
   -- Trade areas table
   'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
   'geom',
   -- External data available for Des Moines
   'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
   'geom',
   [
      ('total_pop', 'sum'),
      ('male_21', 'sum'),
      ('female_21', 'sum')
   ],
   -- Destination slug
   ['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);

-- Manually installed Analytics Toolbox
CALL carto.ENRICH_POLYGONS(
   -- Trade areas table
   'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
   'geom',
   -- External data available for Des Moines
   'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
   'geom',
   [
      ('total_pop', 'sum'),
      ('male_21', 'sum'),
      ('female_21', 'sum')
   ],
   -- Destination slug
   ['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);
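
To see what the enrichment added, a simple preview of the destination table is usually enough. This is only a sketch. It drops the geom column to keep the output readable and deliberately avoids naming the enriched columns, since their exact names depend on how the procedure labels each variable and aggregation.

```sql
-- Preview the enriched trade areas without the (large) geometry column.
SELECT
  * EXCEPT (geom)
FROM
  `<your-project>.<your-dataset>.stores_trade_areas_enriched`
LIMIT 10;
```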

Run the analysis of similar locations

Now that each trade area is enriched, let’s run the similarity analysis. To do so, we need to choose the following:

  • An origin location, which is taken as the reference to measure similarity against.

  • A set of target locations, each of which is compared with the origin location to measure how similar it is.

Since both our origin and target locations come from the same source, let us save it as a table in BigQuery:

CREATE OR REPLACE TABLE
  `<your-project>.<your-dataset>.store_features` AS (
    SELECT
      store_info.store_number,
      trade_area.* EXCEPT (geom, method, input_arguments, store_id)
    FROM
      `<your-project>.<your-dataset>.stores` store_info
      LEFT JOIN `<your-project>.<your-dataset>.stores_trade_areas_enriched` trade_area
      ON store_info.store_number = trade_area.store_id
  );

In this convenience table, store_number serves as the unique ID, alongside all the feature columns we have previously computed.

As we said before, in this example both origin and target locations come from the same source, but that is not a requirement: origin and target locations can come from different places as long as they can be enriched with the same variables on a comparable scale.

For this example, we are going to take store #2682 as the reference.

-- Analytics Toolbox in the US multi-region (carto-un)
CALL `carto-un`.carto.FIND_SIMILAR_LOCATIONS(
    -- Origin query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number = '2682'
    """,
    -- Target query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number <> '2682'
    """,
    -- Function parameters
    'store_number',
    0.90,
    NULL,
    '<your-project>.<your-dataset>.similar_locations'
);

-- Analytics Toolbox in the EU multi-region (carto-un-eu)
CALL `carto-un-eu`.carto.FIND_SIMILAR_LOCATIONS(
    -- Origin query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number = '2682'
    """,
    -- Target query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number <> '2682'
    """,
    -- Function parameters
    'store_number',
    0.90,
    NULL,
    '<your-project>.<your-dataset>.similar_locations'
);

-- Manually installed Analytics Toolbox
CALL carto.FIND_SIMILAR_LOCATIONS(
    -- Origin query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number = '2682'
    """,
    -- Target query
    """
    SELECT
      *
    FROM
      `<your-project>.<your-dataset>.store_features`
    WHERE
      store_number <> '2682'
    """,
    -- Function parameters
    'store_number',
    0.90,
    NULL,
    '<your-project>.<your-dataset>.similar_locations'
);

This procedure will create the table <your-project>.<your-dataset>.similar_locations_2682_results, where we can find the similarity_skill_score column that we need for our analysis. Let us display these values on a map to check the results.

The first thing we can notice is how the map contains fewer locations than before: the similar locations procedure only returns those stores that are more similar than the average. Out of those, we can check the individual similarity using the column similarity_score (which we can think of as a “distance” to the original location, the lower the better) or similarity_skill_score (a normalized version that we can think of as a similarity measure, the higher the better).
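
To read the ranking outside the map, the results can be joined back to the stores table. The query below is a sketch under one assumption: that the results table keeps the index column under its original name, store_number. If it does not, adjust the join key accordingly.

```sql
-- Top 10 stores most similar to the reference store, best first.
-- Assumes the results table exposes store_number as its ID column.
SELECT
  s.store_number,
  s.store_name,
  r.similarity_score,        -- distance-like: lower is better
  r.similarity_skill_score   -- normalized: higher is better
FROM
  `<your-project>.<your-dataset>.similar_locations_2682_results` r
  JOIN `<your-project>.<your-dataset>.stores` s
  ON r.store_number = s.store_number
ORDER BY
  r.similarity_skill_score DESC
LIMIT 10;
```

Adding s.store_location to the SELECT list also makes the result usable as a map layer.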

Using this similarity_skill_score, we can see that nearby stores get a very high level of similarity, since our trade areas were based solely on the vicinity of each location. However, other patterns also emerge in other parts of the city, where similar locations are found as well.

In this example, we will use a small subset of the locations available in the Iowa Liquor Sales dataset, which is publicly available. We will keep only stores in Des Moines that were active during 2021.

Our sample has a column named store_number that uniquely identifies each of the locations. This column is relevant because it is a requirement for the FIND_SIMILAR_LOCATIONS function. We also keep only those stores whose geographical location is known, because we will use that location in the next step (generating the trade areas). Bear in mind that the Analytics Toolbox provides functions like GEOCODE_TABLE to infer the geography from an address, like in this example.

A fixed-radius buffer is the simplest way to generate a trade area; a more complex use of GENERATE_TRADE_AREAS can be found in this example, which showcases how to generate isoline-based trade areas. Remember that the enrichment functions simply require a polygon-based GEOGRAPHY column; any other custom geometry can also be used as a trade area.

Now that we have a defined set of trade areas per location, we can use available external data to enrich those areas. For this example, we will fetch some basic population variables segmented by age and gender from the American Community Survey data.

It is also possible to enrich the trade areas using variables straight from the Data Observatory, as long as you have an active subscription to them. To do so, we can use the DATAOBS_SUBSCRIPTIONS, DATAOBS_VARIABLES, and DATAOBS_ENRICH_POLYGONS functions in the Analytics Toolbox, as per this guide.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.
