
Segmenting CPG merchants using trade areas characteristics

Last updated 1 year ago


Understanding customers (as merchants are referred to within the CPG industry), and prioritizing which are the best points of sale to push your products through, is as important now as ever for the CPG industry.

A key analysis towards understanding your merchants’ potential is to identify the characteristics of their trade areas (e.g. population, visitors, proximity to transport network, etc.) and to perform an appropriate profiling and segmentation of them.

In this example, we show how you can leverage CARTO's Analytics Toolbox for BigQuery to segment your customers or merchants based on the characteristics of their trade areas. A more detailed description can be found in this blogpost.

Step 1 - Defining the trade area for each merchant

For this example, we use the locations of restaurants and cafeterias in the high-density urban areas around the San Francisco Bay Area, available at cartobq.docs.cpg_product_launch_bay_area_store_locations. The trade areas are generated with the GENERATE_TRADE_AREAS procedure, which takes the following inputs:

  • The merchant locations.

  • The method to generate the trade areas, with 3 available options: buffer, number of layers using a spatial index, and isolines.

  • The specific arguments for the selected method of trade area generation.

Here, we select the buffer method with a 500 m radius, which generates a 500 m circular buffer around each location.

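-- Analytics Toolbox in the US multi-region (carto-un project)
CALL `carto-un`.carto.GENERATE_TRADE_AREAS(
  -- customer_query: unique id and location of each merchant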
   '''
   Select store_id, geom from `cartobq.docs.cpg_product_launch_bay_area_store_locations`
''',
   --selecting the method
   'buffer',
   --method options
   "{'buffer':500.0}",
   --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_high_urban'
);
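-- Analytics Toolbox in the EU multi-region (carto-un-eu project)
CALL `carto-un-eu`.carto.GENERATE_TRADE_AREAS(
  -- customer_query: unique id and location of each merchant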
   '''
   Select store_id, geom from `cartobq.docs.cpg_product_launch_bay_area_store_locations`
''',
   --selecting the method
   'buffer',
   --method options
   "{'buffer':500.0}",
   --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_high_urban'
);
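-- Analytics Toolbox installed in your own project (carto dataset)
CALL carto.GENERATE_TRADE_AREAS(
  -- customer_query: unique id and location of each merchant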
   '''
   Select store_id, geom from `cartobq.docs.cpg_product_launch_bay_area_store_locations`
''',
   --selecting the method
   'buffer',
   --method options
   "{'buffer':500.0}",
   --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_high_urban'
);

The procedure writes its output to cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas. In this table, store_id is the unique identifier of each location and geom is the geometry of its trade area.
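Conceptually, a buffer trade area contains every point within the chosen radius of the merchant. The following Python sketch shows only that membership test and is not CARTO's implementation. It assumes a spherical Earth with a mean radius of 6,371,008.8 m and great-circle (haversine) distance. The store coordinates are hypothetical.

```python
import math

# Assumption: spherical Earth with its mean radius, in metres.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_buffer_trade_area(store: tuple[float, float], point: tuple[float, float],
                         radius_m: float = 500.0) -> bool:
    """True if `point` lies within the circular buffer of `radius_m` around `store`."""
    return haversine_m(*store, *point) <= radius_m


# Hypothetical store location (lon, lat) and two points due north of it.
store = (-122.4194, 37.7749)
print(in_buffer_trade_area(store, (-122.4194, 37.7759)))  # True: about 111 m away
print(in_buffer_trade_area(store, (-122.4194, 37.7849)))  # False: about 1.1 km away
```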

Step 2 - Enriching the trade areas with the desired features for the analysis

As input, the user should provide:

  • The table with location information about the merchants (unique id and trade area geometry), and optionally any preprocessed feature associated with the trade area;

  • The variables/features from Data Observatory subscriptions to be used, and the location of the Data Observatory subscription in the data warehouse.

  • Features from the user's own tables.

In this example, we consider the following features relevant for the analysis:

  • Consumer spending: food and beverage expenditure (at home and away from home) and alcoholic beverage expenditure;

  • Points of interest: total number of restaurants and cafés in the trade area (the HORECA count).

To enrich the trade areas with these data, we simulate a scenario in which the user:

  • has the consumer spending data in their own first-party tables, available at cartobq.docs.cpg_product_launch_bay_area_consumer_spending; and

  • adds one pre-processed feature to the input table, the total number of HORECA POIs inside each trade area, computed directly in the input query (see the query below).

Bear in mind that this simulation is done to demonstrate a way to incorporate additional pre-processed features.

The query to get the number of HORECA POIs within each merchant’s trade area is:

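SELECT t.*, CAST(IFNULL(c.horeca_count, 0) AS FLOAT64) AS horeca_count
FROM `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` t
LEFT JOIN (
  -- Count the trade area centroids (approximately the store locations) inside each trade area.
  -- A store lies inside its own buffer, so it is included in its own count.
  SELECT a.store_id, COUNT(*) AS horeca_count
  FROM `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` a
  CROSS JOIN `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` b
  WHERE ST_INTERSECTS(ST_CENTROID(b.geom), a.geom)
  GROUP BY a.store_id
) c ON t.store_id = c.store_id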
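The query is a self-join. For every trade area `a`, it counts the trade areas `b` whose centroid falls inside `a`. The following Python sketch reproduces that logic under two assumptions: all trade areas are 500 m circular buffers, and each buffer's centroid is its store location. Under these assumptions, "centroid inside trade area" reduces to "store within 500 m". The store ids and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumption: spherical Earth, mean radius in metres


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def horeca_counts(stores: dict[str, tuple[float, float]],
                  radius_m: float = 500.0) -> dict[str, float]:
    """For each store, count the stores (itself included) inside its buffer trade area."""
    counts = {}
    for a_id, a_loc in stores.items():          # a: the trade area being enriched
        n = sum(1 for b_loc in stores.values()  # b: every store location, a's own included
                if haversine_m(*a_loc, *b_loc) <= radius_m)
        counts[a_id] = float(n)                 # mirrors CAST(... AS FLOAT64)
    return counts


# Hypothetical stores: s1 and s2 are about 111 m apart, s3 is about 5.5 km away.
stores = {"s1": (0.0, 0.0), "s2": (0.0, 0.001), "s3": (0.0, 0.05)}
print(horeca_counts(stores))  # {'s1': 2.0, 's2': 2.0, 's3': 1.0}
```

Like the CROSS JOIN, this compares every pair of stores. It is fine for a sketch, but it grows quadratically with the number of merchants.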

The procedure call that builds the enriched data and completes Step 2 is:

-- Analytics Toolbox in the US multi-region (carto-un project)
CALL `carto-un`.carto.CUSTOMER_SEGMENTATION_ANALYSIS_DATA(
-- Select the trade areas of merchants, can be pre-enriched trade areas
 R'''
 SELECT t.* EXCEPT(method, input_arguments), CAST(IFNULL(horeca_count,0) as FLOAT64) as horeca_count
 from `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` t
 LEFT JOIN (SELECT a.store_id,count(*) as horeca_count
FROM `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` a
CROSS JOIN `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` b
WHERE ST_INTERSECTS(ST_CENTROID(b.geom), a.geom)
GROUP BY a.store_id) c on t.store_id = c.store_id
 ''',
 -- Data Observatory enrichment
   NULL, NULL,
   -- Custom data enrichment
   [("food_at_home",'avg'),("food_away_from_home",'avg'),('alcoholic_expenditure','avg')],
   R'''
   SELECT *
     FROM `cartobq.docs.cpg_product_launch_bay_area_consumer_spending`
   ''' ,
 --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_2'
);
-- Analytics Toolbox in the EU multi-region (carto-un-eu project)
CALL `carto-un-eu`.carto.CUSTOMER_SEGMENTATION_ANALYSIS_DATA(
-- Select the trade areas of merchants, can be pre-enriched trade areas
 R'''
 SELECT t.* EXCEPT(method, input_arguments), CAST(IFNULL(horeca_count,0) as FLOAT64) as horeca_count
 from `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` t
 LEFT JOIN (SELECT a.store_id,count(*) as horeca_count
FROM `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` a
CROSS JOIN `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` b
WHERE ST_INTERSECTS(ST_CENTROID(b.geom), a.geom)
GROUP BY a.store_id) c on t.store_id = c.store_id
 ''',
 -- Data Observatory enrichment
   NULL, NULL,
   -- Custom data enrichment
   [("food_at_home",'avg'),("food_away_from_home",'avg'),('alcoholic_expenditure','avg')],
   R'''
   SELECT *
     FROM `cartobq.docs.cpg_product_launch_bay_area_consumer_spending`
   ''' ,
 --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_2'
);
-- Analytics Toolbox installed in your own project (carto dataset)
CALL carto.CUSTOMER_SEGMENTATION_ANALYSIS_DATA(
-- Select the trade areas of merchants, can be pre-enriched trade areas
 R'''
 SELECT t.* EXCEPT(method, input_arguments), CAST(IFNULL(horeca_count,0) as FLOAT64) as horeca_count
 from `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` t
 LEFT JOIN (SELECT a.store_id,count(*) as horeca_count
FROM `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` a
CROSS JOIN `cartobq.docs.cpg_product_launch_bay_area_high_urban_trade_areas` b
WHERE ST_INTERSECTS(ST_CENTROID(b.geom), a.geom)
GROUP BY a.store_id) c on t.store_id = c.store_id
 ''',
 -- Data Observatory enrichment
   NULL, NULL,
   -- Custom data enrichment
   [("food_at_home",'avg'),("food_away_from_home",'avg'),('alcoholic_expenditure','avg')],
   R'''
   SELECT *
     FROM `cartobq.docs.cpg_product_launch_bay_area_consumer_spending`
   ''' ,
 --output_prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_2'
);
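The custom enrichment aggregates each consumer spending variable over the trade area with 'avg'. The tutorial does not state how that average is computed, so the following Python sketch is an assumption. It treats the spending data as polygons (for example, census areas) that overlap a trade area and weights each value by the overlapping area. The overlap areas and values are hypothetical.

```python
from typing import Optional


def enrich_avg(overlaps: list[tuple[float, dict[str, float]]],
               variables: list[str]) -> dict[str, Optional[float]]:
    """Area-weighted average of each variable over the source polygons overlapping one trade area.

    overlaps: one (intersection area in m², {variable: value}) entry per overlapping polygon.
    """
    result: dict[str, Optional[float]] = {}
    for var in variables:
        pieces = [(area, vals[var]) for area, vals in overlaps if vals.get(var) is not None]
        total_area = sum(area for area, _ in pieces)
        # No overlap (or no data) yields None rather than a misleading 0.
        result[var] = (sum(area * v for area, v in pieces) / total_area) if total_area > 0 else None
    return result


overlaps = [(100.0, {"food_at_home": 10.0, "food_away_from_home": 4.0}),
            (300.0, {"food_at_home": 20.0, "food_away_from_home": 8.0})]
print(enrich_avg(overlaps, ["food_at_home", "food_away_from_home"]))
# {'food_at_home': 17.5, 'food_away_from_home': 7.0}
```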

The outputs of this step are:

  • The final enriched table, cartobq.docs.cpg_product_launch_bay_area_step_2_enrich;

  • A table with the correlation between every pair of features, cartobq.docs.cpg_product_launch_bay_area_step_2_correlation; and

  • A table with descriptive statistics for each feature cartobq.docs.cpg_product_launch_bay_area_step_2_descriptives.

The last two tables are described below.

Correlation table

This table shows the correlation between every pair of features. Columns col1 and col2 identify the pair of features, and column corr contains their correlation value. The table helps identify relationships among the features and whether PCA would benefit the analysis.
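The following Python sketch builds a table with the same (col1, col2, corr) shape. It assumes the Pearson correlation coefficient, which the tutorial does not name, and one row per unordered pair of features. The feature values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; NaN if either feature is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlation_table(features: dict[str, list[float]]) -> list[tuple[str, str, float]]:
    """Rows of (col1, col2, corr), one per unordered pair of features."""
    return [(c1, c2, pearson(features[c1], features[c2]))
            for c1, c2 in combinations(features, 2)]


features = {
    "horeca_count": [1.0, 2.0, 3.0, 4.0],
    "food_away_from_home": [2.0, 4.0, 6.0, 8.0],
    "food_at_home": [4.0, 3.0, 2.0, 1.0],
}
for row in correlation_table(features):
    print(row)
```

Pairs with |corr| close to 1 carry largely redundant information, which is the situation in which reducing the features with PCA tends to help.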

Descriptive statistics table
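The following Python sketch (standard library only) computes a per-feature summary modelled on the numeric output of pandas describe. It reports count, mean, sample standard deviation, minimum, quartiles and maximum after dropping NaN values. The quartiles use linear interpolation, which matches the pandas default. This is an illustration, not the procedure's code, and it needs at least two non-NaN values.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics for one feature, ignoring NaN values."""
    data = sorted(v for v in values if not math.isnan(v))
    # 'inclusive' quartiles interpolate linearly between order statistics.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": float(len(data)),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


print(describe([1.0, 2.0, 3.0, 4.0, float("nan")]))
```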

Step 3 - Running the segmentation algorithm

-- Analytics Toolbox in the US multi-region (carto-un project)
CALL `carto-un`.carto.RUN_CUSTOMER_SEGMENTATION(
--select the source table of merchants enriched with geospatial characteristics
  'cartobq.docs.cpg_product_launch_bay_area_step_2_enrich',
--select the number of clusters to be identified (two analyses to identify 6 and 7 clusters respectively)
   [6, 7],
--PCA explainability ratio
   0.9,
--output prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_3'
);
-- Analytics Toolbox in the EU multi-region (carto-un-eu project)
CALL `carto-un-eu`.carto.RUN_CUSTOMER_SEGMENTATION(
--select the source table of merchants enriched with geospatial characteristics
  'cartobq.docs.cpg_product_launch_bay_area_step_2_enrich',
--select the number of clusters to be identified (two analyses to identify 6 and 7 clusters respectively)
   [6, 7],
--PCA explainability ratio
   0.9,
--output prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_3'
);
-- Analytics Toolbox installed in your own project (carto dataset)
CALL carto.RUN_CUSTOMER_SEGMENTATION(
--select the source table of merchants enriched with geospatial characteristics
  'cartobq.docs.cpg_product_launch_bay_area_step_2_enrich',
--select the number of clusters to be identified (two analyses to identify 6 and 7 clusters respectively)
   [6, 7],
--PCA explainability ratio
   0.9,
--output prefix
   'cartobq.docs.cpg_product_launch_bay_area_step_3'
);
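A PCA explainability ratio of 0.9 is commonly read as follows: keep the smallest number of principal components whose cumulative share of explained variance reaches 90%. The following Python sketch shows only that selection rule, under that reading. Whether the procedure requires the ratio to be reached or strictly exceeded is an assumption. The variances are hypothetical eigenvalues of the features' covariance matrix.

```python
def n_components_for_ratio(explained_variances: list[float], ratio: float = 0.9) -> int:
    """Smallest number of components whose cumulative explained-variance share is >= ratio."""
    variances = sorted(explained_variances, reverse=True)  # largest components first
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(variances, start=1):
        cumulative += var
        if cumulative / total >= ratio:
            return k
    return len(variances)


# Hypothetical variances: shares 0.50, 0.30, 0.15, 0.05 give cumulative 0.50, 0.80, 0.95, 1.00.
print(n_components_for_ratio([5.0, 3.0, 1.5, 0.5], ratio=0.9))  # 3
```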

The procedure assigns the customers' locations to segments. It also produces a series of descriptive statistics that focus either on the features (e.g., for each variable, the percentiles of the entire input data and of each segment) or on the quality of the model output. The output tables are:

  • Segment assignment: cartobq.docs.cpg_product_launch_bay_area_step_3_clusters

  • Segments descriptives: cartobq.docs.cpg_product_launch_bay_area_step_3_clusters_descr

  • Clustering statistics: cartobq.docs.cpg_product_launch_bay_area_step_3_clusters_stats

The segment assignment table assigns every merchant to one cluster. Columns cluster_6 and cluster_7 contain the cluster to which each merchant is assigned when solving for 6 and 7 clusters, respectively.
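Each cluster_k column comes from clustering the same merchants with a different number of clusters. The following Python sketch is plain Lloyd's k-means, illustrating KMeans, the algorithm this tutorial names. The procedure's initialization, feature scaling and stopping rules are not described, so the choices here are assumptions: the first k points serve as initial centroids, and iteration stops when centroids stop moving. The toy points are hypothetical. In practice, features on very different scales would normally be standardized or PCA-transformed first.

```python
from typing import Optional

Point = tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: list[Point], k: int, init: Optional[list[Point]] = None,
           max_iter: int = 100) -> tuple[list[int], list[Point], float]:
    """Lloyd's k-means. Returns (labels, centroids, within-cluster sum of squares)."""
    centroids = list(init) if init is not None else list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    wss = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids, wss = kmeans(points, k=2)
print(labels, centroids, wss)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)] 1.0
```

Calling `kmeans` once with k=6 and once with k=7 on the same feature vectors yields two label lists, analogous to the cluster_6 and cluster_7 columns.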

In the second table, the segment descriptives, each row corresponds to a clustering scenario, a cluster label and a feature name, and the descriptive statistics are shown for each of these tuples. For example, in the tutorial's output the first three columns of the 6th row are cluster_7, value (the cluster label) 1 and horeca_count. This row refers to cluster 1 of the scenario with 7 clusters/segments. For the feature horeca_count, the mean value is 233.53, the standard deviation is 53.22, the minimum value is 141, and so on.
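Conceptually, each row of this table comes from grouping one feature's values by cluster label within one scenario. The following Python sketch computes a subset of those statistics (mean, sample standard deviation, minimum, maximum). The row keys scenario, cluster and feature are hypothetical names, not the table's actual schema, and the data are toy values.

```python
import math
import statistics
from collections import defaultdict


def cluster_descriptives(scenario: str, labels: list[int],
                         features: dict[str, list[float]]) -> list[dict]:
    """One row per (scenario, cluster label, feature) with basic descriptive statistics."""
    rows = []
    for feature, values in features.items():
        groups: dict[int, list[float]] = defaultdict(list)
        for label, value in zip(labels, values):
            groups[label].append(value)
        for label in sorted(groups):
            vals = groups[label]
            rows.append({
                "scenario": scenario,
                "cluster": label,
                "feature": feature,
                "mean": statistics.fmean(vals),
                # The sample standard deviation is undefined for a single member.
                "std": statistics.stdev(vals) if len(vals) > 1 else math.nan,
                "min": min(vals),
                "max": max(vals),
            })
    return rows


rows = cluster_descriptives("cluster_2", [0, 0, 1, 1], {"horeca_count": [1.0, 3.0, 10.0, 14.0]})
for row in rows:
    print(row)
```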

In Step 1, the trade area of each merchant is specified with the GENERATE_TRADE_AREAS procedure from CARTO's Analytics Toolbox for BigQuery, using the merchant locations, the trade area generation method and the method-specific arguments as inputs.

In Step 2, the trade areas from Step 1 are enriched with the relevant spatial information so that the relationships among them can be analyzed. The user can use preprocessed data for each location, enrich the trade areas with their own proprietary data, or enrich them with third-party data from CARTO's Data Observatory subscriptions. This step is done with the CUSTOMER_SEGMENTATION_ANALYSIS_DATA procedure.

The descriptive statistics table from Step 2 contains one row per feature. Its schema is the same as the output of the describe function of the Python pandas package. The statistics summarize the central tendency, dispersion and shape of each feature's distribution, excluding NaN values.

In Step 3, the enriched table from Step 2, cartobq.docs.cpg_product_launch_bay_area_step_2_enrich, is used to segment the merchants with the KMeans clustering algorithm. The user specifies the pca_explainability_factor to control whether Principal Component Analysis (PCA) is used; in this case it is set to 0.9. The user also defines the clustering scenarios to test, here 6 and 7 clusters.

Finally, the clustering statistics table, cartobq.docs.cpg_product_launch_bay_area_step_3_clusters_stats, contains the metrics that measure the quality of each clustering scenario, namely the Davies-Bouldin index and the within-cluster sum of squares.
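The Davies-Bouldin index averages, over all clusters, the worst ratio of within-cluster scatter to the distance between cluster centroids. Lower values indicate more compact, better-separated clusters. The following Python sketch uses the textbook definition, with Euclidean distances and each cluster's scatter taken as the average distance of its members to the centroid. The procedure's exact implementation is not described, and the toy data are hypothetical.

```python
import math


def davies_bouldin(points: list[tuple[float, ...]], labels: list[int]) -> float:
    """Davies-Bouldin index (lower is better); requires at least two clusters."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = tuple(sum(coords) / len(members) for coords in zip(*members))
        centroids[c] = centroid
        # s_c: average Euclidean distance of the cluster's members to its centroid
        scatter[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    # For each cluster, keep the worst (largest) similarity to any other cluster.
    worst_ratios = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst_ratios) / len(clusters)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # 1 / sqrt(200), about 0.0707
```

The within-cluster sum of squares is the other reported metric: the total squared distance of each point to its assigned centroid. It always decreases as the number of clusters grows. For that reason, it is usually compared across scenarios (such as 6 vs. 7 clusters) rather than minimized on its own.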

The segmentation obtained for the 6-cluster scenario can be visualized on a map. For a detailed description of how to use the resulting tables and visualizations to label clusters in business terms, please refer to this blogpost.
