Getting to know the basics
Platforms which deal with spatial data - like CARTO - are able to translate encoded location data into a geographic location on a map, allowing you to visualize and analyze data based on location. This includes mapping where something is, and the space it occupies.
There are two main ways that "location" is encoded.
Geographic Coordinates (Geography): Geographic coordinates, also known as unprojected coordinates, use latitude and longitude to specify a location on the Earth's curved surface. Geographic coordinates are based on a spherical or ellipsoidal model of the Earth and provide a global reference system.
Projected Coordinates (Geometry): Projected coordinates, also referred to as geometries, utilize a two-dimensional Cartesian coordinate system to represent locations on a flat surface, such as a map or a plane. Projected coordinates result from applying a mathematical transformation to geographic coordinates, projecting them onto a flat surface. This projection aims to minimize distortion and provide accurate distance, direction, and area measurements within a specific geographic region or map projection.
The choice between geographic or projected coordinates depends on the purpose and scale of the analysis. Geographic coordinates are commonly used for global or large-scale analysis, while projected coordinates are more suitable for local or regional analysis where accurate distance, area, and shape measurements are required. Furthermore, web mapping systems may often require your data to be a geography, as these systems often use a global, geographic coordinate system.
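As a minimal illustration of working with geographies, the BigQuery SQL sketch below builds a geography point from plain latitude/longitude columns; the table and column names are hypothetical.

-- Build a geography (latitude/longitude on the WGS84 ellipsoid) from raw coordinate columns.
-- Note that ST_GEOGPOINT expects longitude first, then latitude.
SELECT
  store_id,
  ST_GEOGPOINT(longitude, latitude) AS geom
FROM `my-project.my_dataset.stores_raw`;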
This section of CARTO Academy explores the essential foundations of handling spatial data in the modern geospatial tech stack.
Spatial data encompasses a wide range of information that is associated with geographical locations. This data can represent anything from points on a map to complex geographic features, and it plays a central role in a multitude of applications.
Welcome to CARTO Academy! In this site you will find a catalog of tutorials, quick start guides and videos to structure your learning path towards becoming an advanced CARTO user.
Not sure where to start? Check out our recommended learning path here!
Raster, Vector & everything in-between
The two primary spatial data types are raster and vector - but what’s the difference?
Raster data is represented as a grid of cells or pixels, with each cell containing a value or attribute. It has a grid-based structure and represents continuous values such as elevation, temperature, or satellite imagery.
Common file types for raster data include:
GeoTIFF: a popular raster file format with embedded georeferencing.
JPEG, PNG & BMP: ubiquitous image files which can be georeferenced with a World or TAB file. PNG supports lossless compression and transparency, making it particularly useful for spatial visualization.
ASCII: stores gridded data in ASCII text format. Each cell value is represented as a text string in a structured grid format, making it easy to read and manipulate.
You may also encounter: ERDAS, NetCDF, HDF, ENVI, xyz.
Vector data represents geographic features as discrete points, lines, and polygons. It has a geometry-based structure in which each element represents a discrete geographic object, such as a road, building, or administrative boundary. Vector data is scalable without loss of quality and can be easily modified or updated.
Vector data is useful for spatial analysis operations such as overlaying, buffering, and network analysis, facilitating advanced geospatial studies. Vector data formats are also well-suited for data editing, updates, and maintenance, making them ideal for workflows that require frequent changes.
Shapefiles are a format developed by ESRI. They have been widely adopted across the spatial industry, but their drawbacks see them losing popularity. These drawbacks include:
Shareability: They consist of multiple files (.shp, .shx, .dbf, etc.) that comprise one shapefile, which can make them tricky for non-experts to share and use.
Limited Attribute Capacity: Shapefiles are limited to a maximum of 255 attributes.
Lack of Native Support for Unicode Characters: This can cause issues when working with datasets that contain non-Latin characters or multilingual attributes.
Lack of Topology Information: Shapefiles do not inherently support topological relationships, such as adjacency, connectivity, or overlap between features.
No Native Support for Time Dimension: No native time field type.
Lack of Direct Data Compression: Shapefiles do not provide built-in compression options, which can result in larger file sizes.
File Size Limitation: Shapefiles are limited to a maximum size of 2 GB.
GeoJSON (Geographic JavaScript Object Notation): GeoJSON is an open standard file format based on JSON (JavaScript Object Notation). It allows for the storage and exchange of geographic data in a human-readable and machine-parseable format.
KML/KMZ (Keyhole Markup Language): KML is an XML-based file format used for representing geographic data and annotations. It was originally developed for Google Earth but has since become widely supported by various GIS software. KMZ is a compressed version of KML, bundling multiple files together.
GPKG (Geopackage): GPKG is an open standard vector file format developed by the Open Geospatial Consortium (OGC). It is a SQLite database that can store multiple layers of vector data along with their attributes, styling, and metadata. GPKG is designed to be platform-independent and self-contained.
FGDB (File Geodatabase): FGDB is a proprietary vector file format developed by Esri as part of the Esri Geodatabase system.
GML (Geography Markup Language): GML is an XML-based file format developed by the Open Geospatial Consortium (OGC).
There is a small area in between raster and vector data types, with Spatial Indexes being one of the most ubiquitous data types here.
Spatial Indexes are global grids - in that sense, they are a lot like raster data. However, they render a lot like vector data; each "cell" in the grid is an individual feature which can be interrogated. They can be used for both vector-based analysis (like running intersections and spatial joins) and raster-based analysis (like slope or hotspot analysis).
But where they really excel is in their size, and subsequent processing and analysis speeds. Spatial Indexes are "geolocated" through a reference string, not a long geometry description (like vector data). This makes them small and quick, and many organizations are now taking advantage of Spatial Indexes to enable highly performant analysis of truly big spatial data. Find out more about these in the ebook.
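As a quick illustration of that difference, the hedged BigQuery sketch below (assuming the CARTO Analytics Toolbox is available under the `carto-un` project) returns the same location twice: once as a full geography and once as a short H3 reference string.

-- The same location as a geometry description vs. an H3 index string
SELECT
  ST_GEOGPOINT(-73.98, 40.75) AS geom,                                      -- full geography
  `carto-un`.carto.H3_FROMGEOGPOINT(ST_GEOGPOINT(-73.98, 40.75), 9) AS h3   -- short reference string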
In the past few years, geospatial technology has fundamentally changed. Data is getting bigger, faster, and more complex. User needs are changing too, with an increasing number of organizations and business functions adopting data-centric decision making, leading to a broader range of users undertaking this kind of work. Geospatial can no longer be left in a silo.
In this rapidly evolving landscape, the traditional desktop-based Geographic Information Systems (GIS) of the past have given way to a new way of doing spatial analysis, focused on openness and scalability over proprietary software and desktop analytics. This new way of working with geospatial data is supported by a suite of cloud-native tools and technologies designed to handle the demands of contemporary data workflows - this is what we call the modern geospatial analysis stack.
This shift to more open and scalable geospatial technology offers a range of benefits for analysts, data scientists and the organizations they work for:
Interoperability between different data analysis teams working on a single source of truth database in the cloud.
Scalability to analyze and visualize very large datasets.
Data security backed by the leading cloud platforms.
Democratization & Collaboration with tools that have been designed to lower the skills barrier for spatial analysis.
However, while the modern geospatial analysis stack excels in offering scalable and advanced analytical and visualization capabilities for your geospatial big data, there are some data management tasks - like geometry editing over georeferenced images - for which traditional open-source desktop GIS tools are great solutions.
This section of the CARTO Academy will share how you can complement your modern geospatial analysis stack - based on CARTO and your cloud data warehouse of choice - with other GIS tools to ensure all your geospatial needs and use-cases are covered, from geometry editing to advanced spatial analytics and app development.
CARTO's Analytics Toolbox for BigQuery is a set of UDFs and Stored Procedures to unlock Spatial Analytics. It is organized in a set of modules based on the functionality they offer. Visit the documentation to see the full list of available modules and functions. In order to get access to the Analytics Toolbox functionality in your BigQuery projects, please read about the different options in our documentation.
Geospatial data: the basics
New to spatial data? Learn some of the essential foundations of handling spatial data in the modern data stack.
Optimizing your data for spatial analysis
Prepare your data so that it is optimized for spatial analysis in your cloud data warehouse with CARTO.
Introduction to Spatial Indexes
Learn to scale your analysis with Spatial Indexes, such as H3 and Quadbin.
Data visualizations
Step-by-step tutorials to learn how to build best-in-class geospatial visualizations with CARTO Builder.
Data analysis with maps
Train your spatial analysis skills and learn to build interactive dashboards and reports with our collection of tutorials.
Sharing and collaborating
Tutorials showcasing how Builder facilitates the generation and sharing of insights via collaborative and interactive maps.
Solving geospatial use-cases
More advanced tutorials showcasing how to use Builder to solve geospatial use-cases.
Step-by-step tutorials
Tutorials with step-by-step instructions for you to learn how to perform different spatial analysis examples with CARTO Workflows.
Workflow templates
Drag & drop our workflow templates into your account to get you started on a wide range of scenarios and applications, from simple building blocks for your data pipeline to industry-specific geospatial use-cases.
Spatial Analytics for BigQuery
Learn how to leverage our Analytics Toolbox to unlock advanced spatial analytics in Google BigQuery.
Spatial Analytics for Snowflake
Learn how to leverage our Analytics Toolbox to unlock advanced spatial analytics in Snowflake.
Spatial Analytics for Redshift
Learn how to leverage our Analytics Toolbox to unlock advanced spatial analytics in AWS Redshift.
Access our Product Documentation
Detailed specifications of all tools and features available in the CARTO platform.
Contact Support
Get in touch with our team of first-class geospatial specialists.
Join our community of users in Slack
Our community of users is a great place to ask questions and get help from CARTO experts.
This example demonstrates how to use Workflows to filter the top retail stores that belong to a specific category and compute the population living around them.
Identifying an optimal location for a new store is not always an easy task, and we often do not have enough data at our disposal to build a solid model to predict potential revenues across an entire territory. In these cases, managers rely on different business criteria in order to make a sound decision for their expansion strategy. For example, they rely on defining their target market and segmenting population groups accordingly in order to locate the store closer to where the target market lives (e.g. areas with a great presence of youngsters).
In this example, we are going to use the Hotspot Analysis component to explore good locations to open a new Pizza Hut restaurant in Honolulu, Hawaii. We will use H3 as our geographic support and population and distance to existing Pizza Hut stores as our criteria to identify hotspots. For a detailed description of this use case read this guide.
In this webinar we showcase how to leverage the ML Generate Text component in Workflows to optimize and help us understand the results of a spatial analysis.
Enhance your sharing and collaborating skills with Builder through our detailed guides. Each tutorial, equipped with demo data from the CARTO Data Warehouse, showcases how Builder facilitates the sharing and collaboration of insights, ensuring ease of understanding and effective communication in your maps.
In this webinar we showcase how to run scalable routing analysis directly inside your cloud data warehouse by building a workflow that leverages our support for calling external routing services with the Create Routes component.
In this webinar we showcase how to implement geomarketing techniques with Workflows to help businesses target sports fans & sportswear consumers.
This workflow example computes an index to identify the best billboards for targeting a specific audience, then filters the top 100 billboards.
The CARTO Analytics Toolbox is a suite of functions and procedures to easily enhance the geospatial capabilities available in the different leading cloud data warehouses.
It is currently available for Google BigQuery, Snowflake, Redshift, Databricks and PostgreSQL.
The Analytics Toolbox contains more than 100 advanced spatial functions, grouped in different modules. For most data warehouses, a core set of functions are distributed as open source, while the most advanced functions (including vertical-specific modules such as retail) are distributed only to CARTO customers.
The CARTO Analytics Toolbox is a set of SQL UDFs and Stored Procedures that run natively within each data warehouse, leveraging their computational power and scalability and avoiding the need for time consuming ETL processes.
The functions can be executed directly from the CARTO Workspace or in your cloud data warehouse console and APIs, using SQL commands.
Here’s an example of a query that returns the compact H3 cells for a given region, using Analytics Toolbox functions such as H3_POLYFILL()
or H3_COMPACT()
from our H3 module.
Check the documentation for each data warehouse (listed below) for a complete SQL reference, guides, and examples as well as instructions in order to install the Analytics Toolbox in your data warehouse.
-- Return the compact set of H3 cells (up to resolution 11) covering a census tract
WITH q AS (
  SELECT `carto-os`.carto.H3_COMPACT(
    `carto-os`.carto.H3_POLYFILL(geom, 11)) AS h3
  FROM `carto-do-public-data.carto.geography_usa_censustract_2019`
  WHERE geoid = '36061009900'
)
-- Flatten the array of cells into one row per cell
SELECT h3 FROM q, UNNEST(h3) AS h3
So, you've decided to start scaling your analysis using Spatial Indexes - great! When using these grid systems, some common spatial processing tasks require a slightly different approach than when using geometries.
To help you get started, we've created a reference guide below for how you can use Spatial Indexes to complete common geoprocessing tasks - from buffers to clips. Once you're up and running, you'll be amazed at how much more quickly - and cheaply - these operations can run! Remember - you can always revert back to geometries if needed.
All of these tasks are undertaken with CARTO Workflows - our low-code tool for automating spatial analyses. Find more tutorials on using Workflows.
The humble buffer is one of the most basic - but most useful - forms of spatial analysis. It's used to create a fixed-distance ring around an input feature.
With geometries... use the ST Buffer tool.
With Spatial Indexes... convert your geometries to a Spatial Index, then use an H3/Quadbin K-Ring component to approximate a buffer. Look up H3 resolutions and Quadbin resolutions to work out the K-Ring size needed.
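For reference, a hedged SQL sketch of this pattern is shown below, assuming the Analytics Toolbox is available under `carto-un` and using a hypothetical stores table. At H3 resolution 9 a K-Ring of size 5 covers roughly a 1 km radius, but you should tune both values to your use case.

-- Approximate a ~1 km buffer around each store with H3 K-Rings
WITH stores_h3 AS (
  SELECT
    store_id,
    `carto-un`.carto.H3_FROMGEOGPOINT(geom, 9) AS h3
  FROM `my-project.my_dataset.stores`
)
SELECT
  store_id,
  ring_cell
FROM stores_h3,
UNNEST(`carto-un`.carto.H3_KRING(h3, 5)) AS ring_cell;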
Where does geometry A overlap with geometry B? It’s one of the most common spatial tasks, but heavy geometries can make this straightforward task a pain.
With geometries... use the ST Intersection tool. This may look like a simple process, but it can be incredibly computationally expensive.
With Spatial Indexes... convert both inputs to a Spatial Index, then use a Join (inner) to keep only cells which can be found in both inputs.
For a “difference” process, we want the result to be the opposite of the previous intersection, retaining all areas which do not intersect.
With geometries... use the ST Difference tool. Again, while this may look straightforward, it can be slow and computationally expensive.
With Spatial Indexes... again convert both inputs to a Spatial Index, this time using a full outer Join. A Where component can then be used to filter only the "different" cells (WHERE h3 IS NULL AND h3_joined IS NOT NULL) - at a fraction of the calculation size.
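Both patterns boil down to simple joins once the inputs are on the same H3 grid. A minimal sketch in BigQuery SQL, using hypothetical tables a_h3 and b_h3 that each hold one H3 cell per row:

-- Intersection: keep only cells present in both inputs
SELECT a.h3
FROM `my-project.my_dataset.a_h3` a
INNER JOIN `my-project.my_dataset.b_h3` b
  ON a.h3 = b.h3;

-- Difference: keep cells present in only one of the inputs
SELECT COALESCE(a.h3, b.h3) AS h3
FROM `my-project.my_dataset.a_h3` a
FULL OUTER JOIN `my-project.my_dataset.b_h3` b
  ON a.h3 = b.h3
WHERE a.h3 IS NULL OR b.h3 IS NULL;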
Spatial Joins are the "bread and butter" of spatial analysis. They can be used to answer questions like "how many people live within a 10-minute drive of store X?" or "what is the total property value in this flooded area?"
CARTO provides a series of Enrichment tools which make these types of analyses easy. Enrichment tools for both geometries and Spatial Indexes are available - but we've estimated the latter to be up to 98% faster!
With geometries... use the Enrich Polygons component.
With Spatial Indexes... use the Enrich H3 / Quadbin Grid component.
Check out the full guide to enriching Spatial Indexes.
Say you wanted to know how many features fall within a certain distance of a location. For instance, in the example below we want to create a new column holding the number of stores in a 1km radius.
With Geometries... create a Buffer, run a Spatial Join and then use Group by to aggregate the results.
With Spatial Indexes... have the inputs stored as a H3 grid with both the source and target features in the same table. Like in the earlier Buffer example, use the H3 K-Ring component to create your "search area." Now, you can use the Group by component - grouping by the newly created H3 K-Ring ID - to sum the number of stores within the search area.
This is a fairly simple example, but let's imagine something more complex - say you wanted to calculate the population within 30 miles of a series of input features. Creating and enriching buffers of this size - particularly when you have tens of thousands of inputs - will be incredibly slow, particularly when your input data is very detailed. This type of calculation could take hours - or even days - without Spatial Indexes.
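A hedged SQL sketch of the K-Ring counting pattern described above, with hypothetical origins_h3 and stores_h3 tables and the Analytics Toolbox assumed under `carto-un`:

-- Count stores within ~1 km of each origin cell (K-Ring of size 5 at resolution 9)
WITH search_area AS (
  SELECT
    o.h3 AS origin_h3,
    ring_cell
  FROM `my-project.my_dataset.origins_h3` o,
  UNNEST(`carto-un`.carto.H3_KRING(o.h3, 5)) AS ring_cell
)
SELECT
  s.origin_h3,
  COUNT(st.h3) AS stores_within_1km
FROM search_area s
LEFT JOIN `my-project.my_dataset.stores_h3` st
  ON st.h3 = s.ring_cell
GROUP BY s.origin_h3;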
When you begin a new map in CARTO Builder, the left panel is your starting point, providing the tools to add data sources that will be visualized as layers on your map. In Builder, each data source creates a direct connection to your data warehouse, allowing you to access your data without the need to move or copy it. This cloud-native approach ensures efficient and seamless integration of large datasets.
Once a data source is added, CARTO's advanced technology renders a map layer that visually represents your data, offering smooth and scalable visualization, even with extensive datasets.
In this section, we'll take you through the various data source formats that CARTO Builder supports. We'll also explore the different types of map layers that can be rendered in Builder, enhancing your understanding of how to effectively visualize and interact with your geospatial data.
Builder data sources can be differentiated into the following geospatial data types:
Simple features: These are unaggregated features using standard geometries (point, line or polygon) and attributes, with both spatial and non-spatial attributes ready to be used in Builder.
Aggregated features based on Spatial Indexes: These data sources are aggregated for improved performance or specific use cases. The properties of these features are aggregated according to the chosen aggregation type in Builder. CARTO currently supports two types of spatial indexes: Quadbin and H3.
Pre-generated tilesets: These are tilesets that have been previously generated using a CARTO Analytics Toolbox procedure and stored directly in your data warehouse. Ideal for handling very large, static datasets, these tilesets ensure efficient and high-performance visualizations.
Raster: Raster sources uploaded to your data warehouse using CARTO raster-loader, allowing both analytics and visualization capabilities.
In Builder, you can add data sources either as table sources, by connecting to a materialized table in your data warehouse, or through custom SQL queries. These queries execute directly in your data warehouse, fetching the necessary properties for your map.
Table sources
You can directly connect to your data warehouse table by navigating through the mini data explorer. Once your connection is set, the data source is added as a map layer to your map.
SQL query sources
You can define a custom SQL query that will act as your input source. Here you can select only the precise columns you need for better performance and customize your analysis according to your needs.
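For example, a custom SQL query source could look like the snippet below (table and column names are hypothetical); selecting only the columns you actually need keeps the source light.

-- Only the columns needed for the visualization, pre-filtered to the study area
SELECT geoid, population, geom
FROM `my-project.my_dataset.census_tracts`
WHERE state_name = 'California'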
Once a data source is added to Builder, a layer is automatically added for that data source. The spatial definition of the source linked to a layer specifies the layer visualization type and additional visualization and styling options. The different layer visualization types supported in Builder are:
Point: Displays as point geometries. Point data can be dynamically aggregated to the following types: grid, h3, heatmap and cluster.
Polygon: Displays as polygon geometries.
Line: Displays as line geometries.
H3: Displays features as hexagon cells.
Grid: Displays features as grid cells.
Raster: Displays data as a grid of pixels.
In this section, you can explore our step-by-step guides designed to enhance your data analysis skills using Builder. Each tutorial features demo data from the CARTO Data Warehouse connection, allowing you to jump directly into creating and analyzing maps.
For these templates, you will need to install the extension package.
This example shows how to create a pipeline to train a classification model, evaluate the model and use it for prediction. In particular, we will create a classification model to estimate customer churn for a telecom company in California.
This example workflow will help you see how telco companies can detect high-risk customers, uncover the reasons behind customer departures, and develop targeted strategies to boost retention and satisfaction by training a classification model.
This template shows how to create a forecast model using the extension package for Workflows. There are three main stages involved:
Training a model, using some input data and adjusting to the desired parameters,
Evaluating and understanding the model and its performance,
Forecasting and saving the results.
In this webinar we leverage Spatial Indexes along with human mobility and spend data to optimize locations for OOH billboards in a low-code environment thanks to CARTO Workflows. While this example focuses on OOH, the approach could be utilized in other sectors such as CPG, retail and telecoms.
Filtering multiple data sources simultaneously with SQL
Learn how to filter multiple data sources to reveal patterns in NYC's Citi Bike trips. The result will be an interactive Builder map with parameters that allow users to filter multiple data sources by time period and neighbourhood for insightful visual analysis.
Generate a dynamic index based on user-defined weighted variables
Discover the process of normalizing variables using Workflows to create a tailored index score. Learn how to implement dynamic weights with SQL Parameters in Builder, enhancing the adaptability of your analysis. This approach allows you to apply custom weights in index generation, catering to various scenarios and pinpointing locations that best align with your business objectives.
Create a dashboard with user-defined analysis using SQL Parameters
Learn to build dynamic web map applications with Builder, adapting to user-defined inputs. This tutorial focuses on using SQL Parameters for on-the-fly updates in geospatial analysis, a skill valuable in urban planning, environmental studies, and more. Though centered on Bristol's cycle network risk assessment, the techniques you'll master are widely applicable to various analytical scenarios.
Analyze multiple drive-time catchment areas dynamically
In this tutorial, you'll learn to analyze multiple drive time catchment areas at specific times, such as 8:00 AM. We'll guide you through creating five distinct catchment zones based on driving times using CARTO Workflows. You'll also master crafting an interactive dashboard that uses SQL Parameters, allowing users to select and focus on catchment areas that best suit their business needs and objectives.
Dynamically control your maps using URL parameters
URL parameters allow you to essentially share multiple versions of the same map, without having to rebuild it depending on different user requirements. This guide will show you how to embed a Builder map in a low-code tool, using URL parameters for dynamic updates based on user input.
Embedding maps in BI platforms
Embedding Builder maps into BI platforms like Looker Studio, Tableau, or Power BI is a straightforward way to add interactive maps to your reports and dashboards. This guide shows you how to do just that, making your data visualizations more engaging and informative.
Further tutorials for running analysis with Spatial Indexes
These resources have been designed to get you started. They offer an end-to-end tutorial for creating, enriching and analyzing Spatial Indexes using data freely available on the CARTO platform.
In this example we use CARTO Workflows to ingest data from a remote file containing temperature forecasts in the US together with weather risk data from NOAA, and data with the location of our stores; we will identify which of the stores are located in areas with weather risks or strong deviations in temperature.
To start creating the workflow, click on "+ New workflow" on the main page of the Workflows section. If it is your first workflow, click on "Create your first workflow".
Choose the data warehouse connection that you want to use. In this case, please select the CARTO Data Warehouse connection to find the data sources used in this example.
Now you can drag and drop the data sources and components that you want to use from the explorer on the left side of the screen into the Workflow canvas that is located at the center of the interface.
Now, let's add the noaa_warnings data table into our workflow from the demo_tables dataset available in the CARTO Data Warehouse connection.
After that, let’s add the retail_stores data table from the demo_tables dataset, also available in the CARTO Data Warehouse connection.
Now let's use the SPATIAL_JOIN component to know which of our retail_stores are in the warning areas.
At this point we have identified our stores within a NOAA Weather Warning and, if we deem it appropriate, we can send an email to share these warnings with anyone interested in this information using the SEND_BY_EMAIL component.
After that, we can use the IMPORT_FROM_URL component to import the temperature forecast from the Climate Prediction Center, using this URL in particular to fetch the latest temperature forecast as a Shapefile: https://ftp.cpc.ncep.noaa.gov/GIS/us_tempprcpfcst/610temp_latest.zip. This data will be fetched again with each execution of the workflow, meaning the results will change if the source data has been updated.
Now, we are going to drop the geom_joined column to keep only one geom column and avoid confusion.
We will then perform a new SPATIAL_JOIN in order to associate the temperature forecast with the stores.
Finally, we conclude this example by saving the outcome in a new table using the SAVE_AS_TABLE component. Remember that you should specify the fully qualified name of the new table in this component's field.
We can use the "Create map" button in the map section of the Results panel to create a new Builder map and analyze the results in a map.
In this example we will see how we can identify customers potentially affected by an active fire in California using CARTO Workflows. This approach is one of the building blocks of spatial analysis and can be easily adapted to any use case where you need to know which features are within a distance of another feature.
All of the data that you need can be found in the CARTO Data Warehouse (instructions below).
To begin, click on "+ New workflow" on the main page of the Workflows section. If it is your first workflow, you will instead see the option to "Create your first workflow".
From here, you can drag and drop data sources and analytical components that you want to use from the explorer on the left side of the screen into the Workflow canvas that is located at the center of the interface.
Let's add the usa_states_boundaries data table into our workflow from the demo_tables dataset available in the CARTO Data Warehouse connection. You can find this under Sources > Connection > demo data > demo_tables.
Then filter only the boundary of the state of California using the Simple Filter component; set the column as name, the operator as equal to and the value as California.
Run your workflow!
You can run the workflow at any point in this tutorial - only new or edited components will be run, not the entire workflow. You can also just wait to run until the end.
Next, let's explore fires in this study area.
From the same location that you added usa_states_boundaries, add fires_worldwide to the canvas. For ease later, you'll want to drop it just above the Simple Filter component from the previous step.
Next, add a Spatial Filter component to filter only the fires that fall inside the digital boundary of the state of California. Connect fires_worldwide to the top input and Simple Filter to the bottom. Specify both geo columns as "geom" and the spatial predicate as intersect (meaning the filter will apply to all features where any part of their shape intersects California).
To keep your workflow well organized, use the Add a note (Aa) tool at the top of the window to draw a box around this section of the workflow. You can use any markdown syntax to format this box - our example uses ## Fires in California.
Now, use the ST Buffer component to generate a 5 km radius buffer around each of the active fires in California.
Next, add a third data source with a sample of customer data from an illustrative CRM system. You can find it as customers_geocoded in demo_tables inside your CARTO Data Warehouse.
Now let’s add another Spatial Filter component to know which of our customers live within the 5 km buffer around the active fires and thus could potentially be affected.
You'll notice we now have a couple of instances of duplicated records where these intersect multiple buffers. We can easily remove these with a Remove duplicated component. Now is also a great time to add a second note box to your workflow, this time called ## Filter customers.
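If you prefer plain SQL, the hedged sketch below reproduces the same logic in BigQuery syntax; the fully qualified demo table names are assumptions, and ST_DWITHIN replaces the explicit buffer-plus-intersect steps.

-- Customers within 5 km of an active fire inside California
WITH california AS (
  SELECT geom
  FROM `carto-demo-data.demo_tables.usa_states_boundaries`
  WHERE name = 'California'
),
ca_fires AS (
  SELECT f.geom
  FROM `carto-demo-data.demo_tables.fires_worldwide` f, california c
  WHERE ST_INTERSECTS(f.geom, c.geom)
)
SELECT cu.*
FROM `carto-demo-data.demo_tables.customers_geocoded` cu
WHERE EXISTS (
  SELECT 1
  FROM ca_fires f
  WHERE ST_DWITHIN(cu.geom, f.geom, 5000)   -- 5 km in meters; EXISTS also avoids duplicate rows
);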
You can explore the results of this analysis at the bottom panel of the window, via both the Data and Map tabs. From the map tab, you can select Create map to automatically create a map in CARTO Builder.
Head to the Data visualization section of the Academy next to explore tutorials for building impactful maps!
Explore a range of tutorials in this section, each designed to guide you through solving various geospatial use-cases with Builder and the wider CARTO Platform. These tutorials leverage available demo data from the CARTO Data Warehouse connection, enabling you to dive straight into map creation right from the start.
With CARTO Builder, you can effortlessly create AI Agents that transform how users interact with maps, making data exploration intuitive, engaging, and conversational. AI Agents enable users to explore maps seamlessly and extract valuable insights through a natural language interface, enhancing the overall mapping experience.
AI Agents are disabled by default in your organization. To enable this functionality, navigate to: Settings > Customizations > AI Agents and use the toggle to activate AI Agents in your CARTO platform.
Once enabled, Editor users in your organization will have the ability to create AI Agents in any Builder map.
To enable AI Agents in your organization you must be an Admin user.
Once AI Agents are available in your organization, you can start the creation of Agents directly in Builder to make your maps more interactive and engaging. By linking an AI Agent to your map, end-users can ask questions, extract insights, and explore data through a conversational interface.
Before creating the AI Agent, you have the option to add custom instructions and context in the Map Context section. This helps the AI Agent deliver more accurate and relevant responses, tailored to the map’s purpose.
The Agent will automatically read your map’s configuration—such as layer styling, widget settings, and other components—to generate context-aware answers.
Adding custom instructions is optional, but highly recommended to ensure the Agent aligns with your specific use case and improves the overall experience.
To enhance the user experience, you can define Conversation Starters—common prompts that guide users in interacting with the AI Agent—and provide a User Guide that appears when the Agent greets users. These additions make the interaction more intuitive and informative.
Once you're ready—with or without configuring map context, instructions, or conversation starters—you can enable the AI Agent by toggling the AI Agent switch in the AI Agent tab. After activation, the Agent becomes available in the Editor version of the map and to end-users who access the map in Viewer mode, whether via organizational sharing, user-specific access, or SSO groups.
When you load a map with an AI Agent available, either as Editor in Preview Mode or as Viewer when accessing the published version of the map, the AI Agent will appear at the bottom center of your screen. Click on it to initiate a conversation. The Agent will greet users by displaying the user guide and conversational starter prompts, making it easy to start exploring the map.
In addition to providing text-based answers, the AI Agent has access to several capabilities for interacting with the map and helping users extract insights:
Search and zoom to specific locations.
Extract insights from widgets.
Filter data through widget interactions.
Switch layers on and off.
Retrieve the latitude and longitude of the current map position.
For more information on the AI Agent's capabilities, please refer to this section of the documentation.
The tutorials on this page will teach you how to transform different types of geographic support (such as points, lines and polygons) - and their variables - into polygons. By the end, you will understand how to enrich geographical data and how different geographical supports can impact spatial analysis. We'll be using functions from CARTO's Analytics Toolbox, and you'll be provided with both SQL and low-code approaches.
Access to a target polygon table - this is the table we will be transforming data into. You will also need source line and point tables, which we will be transforming data from. These tables will need to have some sort of spatial overlap.
We will be using the following BigQuery tables, sourced from Madrid’s Open Data Portal. You will need either a Google BigQuery connection or to use the CARTO Data Warehouse to use these specific tables.
cartobq.docs.madrid_districts
: District boundaries in Madrid.
cartobq.docs.madrid_bike_parkings
: Locations of public bicycle parking.
cartobq.docs.madrid_bike_all_infrastructure
: Bicycle-friendly infrastructure (bike lanes, shared lanes, and quiet streets).
cartobq.docs.madrid_bike_parkings_5min
: 5-minute walking isolines around bike parking locations.
When aggregating spatial data, it is important to be aware of the Modifiable Areal Unit Problem (MAUP). MAUP occurs when spatial data is grouped into different geographical units, which can lead to misleading interpretations. This issue arises because variation in the size and shape of the geographical units affects the aggregation results.
One of the ways that spatial analysts overcome MAUP is by converting data to a regular grid, including Spatial Indexes like H3 and Quadbin. You can see the difference in the maps below. Learn more about the benefits of this approach, or get started with our tutorial.
To better understand MAUP, we distinguish between two types of properties:
Extensive properties: These typically increase as the size of an area increases. Examples include population, total bike parking spots or total road length.
Intensive properties: These are independent of area size and are often derived by normalizing extensive properties. Examples include population density, bike parking density or road length per capita.
You can see the difference between these two types of properties in the maps below: the first shows the extensive bike parking count, and the second shows the intensive bike parking density.
When transforming numeric variables between different types of geographic support, it's important to be aware of whether you are working with an extensive or intensive variable, as this will impact the type of aggregation you do. For instance, if you wanted to calculate the total population in a county based on census tracts, you would want to sum this extensive property. If you wanted to calculate the population density, you would want to average this intensive property.
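A minimal sketch of that tract-to-county aggregation, with hypothetical table and column names:

-- SUM the extensive property, AVG the intensive one, when rolling tracts up to counties
SELECT
  county_id,
  SUM(population) AS total_population,          -- extensive
  AVG(population_density) AS avg_pop_density    -- intensive
FROM `my-project.my_dataset.census_tracts`
GROUP BY county_id;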
Time needed: < 5 minutes
Let's start with something simple: counting the number of points in a polygon, which can be achieved with the Workflow below. If this is your first time using CARTO Workflows, we recommend reading our introductory guide first to get familiar with the tool.
For our example, we'll be counting the number of bike parking locations in each district. We'll make use of the ENRICH_POLYGONS
function using count as the aggregation function. This will create a new column in the destination table called id_count
with the total number.
Prior to running the enrichment, we'll also need to generate a row number so that we have a numeric variable to aggregate.
Explore the results 👇
If you were to undertake this task with "vanilla SQL", it would be a far more complicated process, requiring deeper usage of spatial predicates (relationships) such as ST_CONTAINS or ST_INTERSECTS. However, the enrichment approach is versatile enough to handle more complex spatial operations - let's explore an example.
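For comparison, a sketch of that "vanilla SQL" point-in-polygon count might look like the query below; ENRICH_POLYGONS handles this, plus the more complex proportional aggregations covered next, for you.

-- Count bike parking points per Madrid district with a spatial predicate
SELECT
  d.id,
  d.name,
  COUNT(p.id) AS id_count
FROM `cartobq.docs.madrid_districts` d
LEFT JOIN `cartobq.docs.madrid_bike_parkings` p
  ON ST_CONTAINS(d.geom, p.geom)
GROUP BY d.id, d.name;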
Time needed: < 5 minutes
Next, we'll be transforming lines to polygons - but still using the ENRICH_POLYGONS
function. For our example, we want to calculate the length of cycling infrastructure within each district.
In the Workflow below, we will aggregate the lane_value
variable with sum
as the aggregation function (but you could similarly run other aggregation types such as count, avg, min and max). This ensures that the lane values are proportionally assigned based on their intersection length with the district boundaries (rather than the entire length of each line). The sums of all these proportional lengths will be stored in the lane_value_sum
column in the destination table.
Explore the results 👇
Time needed: < 5 minutes
We can also use polygons as source geometries. This is incredibly useful when working with different organizational units - such as census tracts and block groups - which is very common when working with location data. The function works very similarly to when enriching with lines: it will sum the proportions of the intersecting polygons of each district. In this case, the proportions are computed using the intersecting area, rather than length.
Again, we use the Enrich Polygons component for this process, summing the area which intersects each district.
Explore the results 👇
In the resulting map, we can see the total area covered by 5-minute walking isolines per district, in square meters.
Time needed: < 10 minutes
In addition to the standard enrichment methods we've covered, there are more advanced, alternative ways to enrich polygons. These include:
Raw enrichment: This method pairs source and target geometries that intersect and provides useful details, such as the area of the intersection. This allows users to apply their own aggregation methods as needed.
Weighted enrichment: This method distributes data based on a chosen column, using a proxy variable to customize the way values are aggregated across polygons.
To demonstrate this, we'll use a simple Workflow to estimate the distribution of bicycles across the city using a buildings dataset. Our starting assumption is that 65% of the population owns a bike, leading to a total estimate of 2.15 million bicycles citywide.
This requires two enrichment steps:
Weighted enrichment: Using the Enrich Polygons with Weights
component, we distribute the estimated number of bikes based on the number of buildings and their floors, assuming taller buildings house more people.
H3 grid aggregation: We enrich a standardized H3 grid, making it easier to analyze and visualize patterns with an Enrich H3 Grid component
. This approach transforms a single city-wide estimate into a detailed spatial distribution, helping identify where bicycle infrastructure should be expanded to meet demand.
Explore the results 👇
This tutorial covered how to enrich spatial data using the CARTO Analytics Toolbox, addressing challenges like MAUP and leveraging Spatial Indexes for better accuracy. By exploring raw and weighted enrichment, we demonstrated how broad statistics can be transformed into meaningful spatial insights. These techniques will help you make more informed decisions in your own spatial analysis.
Leverage the power of Spatial Indexes in CARTO
As mentioned in the previous section, Spatial Indexes like H3 and Quadbin have their location encoded with a short reference string or number. CARTO is able to "read" that string as a geographic identifier, allowing Spatial Index features to be plotted on a map and used for spatial analysis.
CARTO's Analytics Toolbox is where you can find all of the tools and functions you need to turn data into insights - and Spatial indexes are an important part of this. Whether you are using CARTO Workflows for low-code analytics, or working directly with SQL, some of the most relevant modules include:
H3 or Quadbin modules for creating Spatial Indexes and working with unique spatial properties (e.g. conversion to/from geometries, K-rings).
Data module for enriching Spatial Indexes with geometry-based data.
Statistics module for leveraging Spatial Indexes to employ Spatial Data Science techniques such as Local Moran's I, Getis Ord and Geographically Weighted Regression.
Tiler module for generating tilesets from Spatial Indexes, enabling massive-scale visualizations.
Support for Spatial Indexes may differ depending on which cloud data warehouse you use - please refer to our documentation (links below) for details.
CARTO Builder provides a lot of functionality to allow you to craft powerful visualizations with Spatial Indexes.
The most important thing to know is that Spatial Index layers are always loaded by aggregation. This means that if you want to use a Spatial Index variable to control the color or 3D extrusion of your layer, you must select an aggregation method such as sum or average. Similarly, the properties for widgets and pop-ups are also aggregated. Because of this, all property selectors will let you select an aggregation operation for each property.
Let's explore the other aspects of visualizing Spatial Indexes!
If you add a small point geometry table (<30K rows or 30MB, depending on your cloud data warehouse - see our documentation for more information) to CARTO Builder, you can visualize it as a Quadbin Spatial Index without requiring any processing! By doing this, you can visualize aggregated properties, such as the point count or the sum of numeric variables.
One of the most powerful features of visualizing Spatial Indexes with CARTO is zoom-based rendering. As the user zooms in further to a map, more detail is revealed. This is incredibly useful for visualizing data at a scale which is appropriate and easy to understand.
Try exploring this functionality on the map below!
Note the maximum, most detailed resolution that can be rendered is the "native" resolution of the Spatial Index table.
With Spatial Index data layers, you can control the granularity of the aggregation by specifying what resolution the Spatial Index should be rendered at. The higher the resolution, the higher the granularity of your grid for each zoom level. This is helpful for controlling the amount of information the user sees.
Note the maximum, most detailed resolution you can visualize is the "native" resolution of the table.
Learn more about styling your maps in our documentation.
This tutorial leverages the H3 spatial index to visualize origin and destination trip patterns in a clear, digestible way. We'll be transforming 2.5 million origin and destination locations into one H3 frequency grid, allowing us to easily compare the spatial distribution of pick-up and drop-off locations. This kind of analysis is crucial for resource planning in any industry where you expect your origins to have a different geography to your destinations.
You can use any table which contains origin and destination data - we'll be using the NYC Taxi Rides demo table which you can find in the CARTO Data Warehouse (BigQuery) or the listing on the Snowflake Marketplace.
In the CARTO Workspace, head to Workflows and Create a Workflow, using the connection where your data is stored.
Under Sources, locate NYC Taxi Rides (or whichever input dataset you're using) and drag it onto the workflow canvas.
When running origin-destination analysis, it's important to think about not only spatial but temporal patterns. We can expect to see different trends at different times of the day and we don't want to miss any nuances here.
Connect NYC Taxi Rides to a Simple Filter component.
Set the filter condition to PART_OF_DAY = morning
(see screenshot above). You can pick any time period you'd like; if you select the NYC Taxi Rides source, open the Data preview and view Column Stats (histogram icon) for the PART_OF_DAY variable, you can preview all of the available time periods.
Note we've started grouping sections of the workflow together with annotation boxes to help keep things organized.
The 2.5 million trips - totalling 5 million origin and destination geometries - are a huge amount of data, so let's convert them to a Spatial Index to make them easier to work with! We'll be applying the straightforward approach from the earlier tutorial.
Connect the match output of the Simple Filter to a H3 from GeoPoint component and change the points column to PICKUP_GEOM; this will create an H3 cell for each input geometry. We're looking for junction and street level insights here, so change the resolution to 11.
Connect the output of this to a Group by component. Set the Group by column to H3 and the aggregation column to H3 (COUNT). This will count the number of duplicate H3 IDs, i.e. the number of points which fall within each cell.
Repeat steps 1 & 2, this time setting the initial points column to DROPOFF_GEOM.
Add a Join component and connect the results of your two Group by components to this. Set the join type to Full Outer; this will retain all cells, even where they don't match (so we will retain a H3 cell that has pickups, but no dropoffs - for instance).
Now we have a H3 grid with count columns for the number of pick ups and drop offs, but if you look in the data preview, things are getting a little messy - so let's clean them up!
Create Column: at the moment our H3 index IDs are contained in two separate columns, H3 and H3_JOINED. We want a single column containing all IDs, so let's create a column called H3_FULL and use the following CASE statement to combine the two: CASE WHEN H3 IS NULL THEN H3_JOINED ELSE H3 END
.
Drop Columns: now we can drop both H3 and H3_JOINED to avoid any confusion.
Rename Column: now, let's rename H3_COUNT as pickup_count and H3_COUNT_JOINED as dropoff_count to keep things clear.
Now, you should have a table with the fields H3_FULL, pickup_count and dropoff_count, just like in the preview above!
Now, we can compare the spatial distribution of pickups and dropoffs:
Connect two subsequent Normalize components, first normalizing pickup_count, and then dropoff_count. This will convert the raw counts into scores from 0 to 1, making a relative comparison possible.
Add a Create Column component, and calculate the difference between the two normalized fields (pickup_count_norm - dropoff_count_norm
). The result of this will be a score ranging from -1 (relatively more dropoffs) to 1 (relatively more pickups).
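For reference, a hedged SQL sketch of the whole pipeline is shown below; the fully qualified table name is an assumption, and the Workflows components above generate equivalent SQL for you.

-- Pick-up vs drop-off intensity per H3 cell (resolution 11), morning trips only
WITH pickups AS (
  SELECT `carto-un`.carto.H3_FROMGEOGPOINT(PICKUP_GEOM, 11) AS h3, COUNT(*) AS pickup_count
  FROM `carto-demo-data.demo_tables.nyc_taxi_rides`
  WHERE PART_OF_DAY = 'morning'
  GROUP BY h3
),
dropoffs AS (
  SELECT `carto-un`.carto.H3_FROMGEOGPOINT(DROPOFF_GEOM, 11) AS h3, COUNT(*) AS dropoff_count
  FROM `carto-demo-data.demo_tables.nyc_taxi_rides`
  WHERE PART_OF_DAY = 'morning'
  GROUP BY h3
),
joined AS (
  SELECT
    COALESCE(p.h3, d.h3) AS h3_full,
    IFNULL(p.pickup_count, 0) AS pickup_count,
    IFNULL(d.dropoff_count, 0) AS dropoff_count
  FROM pickups p
  FULL OUTER JOIN dropoffs d ON p.h3 = d.h3
)
SELECT
  h3_full,
  pickup_count,
  dropoff_count,
  -- min-max normalize each count, then take the difference (-1 = more drop-offs, 1 = more pick-ups)
  SAFE_DIVIDE(pickup_count - MIN(pickup_count) OVER (), MAX(pickup_count) OVER () - MIN(pickup_count) OVER ())
    - SAFE_DIVIDE(dropoff_count - MIN(dropoff_count) OVER (), MAX(dropoff_count) OVER () - MIN(dropoff_count) OVER ()) AS pickup_vs_dropoff
FROM joined;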
You can see the full workflow below.
Check out the results below!
Do you notice any patterns here? We can see more drop offs in the business district of Midtown - particularly along Park Avenue - and more pick ups in the more residential areas such as the Upper East and West Side, clearly reflecting the morning commute!
In this example we are going to analyze the spatial correlation of POI locations in Berlin using OpenStreetMap data and the MORANS_I_H3_TABLE procedure available in the statistics module. POI data can be found in the publicly available cartobq.docs.osm_pois_berlin
table.
First, we are going to visually analyze the distribution of the POIs in the Berlin area by plotting the aggregation of POIs in each H3 cell of resolution 9. This can be done simply by applying the H3_FROMGEOGPOINT function to compute the H3 cell that each POI belongs to and then performing a group by to count the number of POIs inside each cell (n_pois).
By looking at the resulting map below, it is clear that there is a level of spatial autocorrelation in the distribution of the POIs:
We can measure this spatial autocorrelation using the MORANS_I_H3_TABLE procedure, which yields a result of 0.673
by applying the query below:
This project has received funding from the research and innovation programme under grant agreement No 960401.
Build a store performance monitoring dashboard for retail stores in the USA
In this tutorial we are going to visualize revenue performance and surface area of retail stores across the USA. We will construct two views, one of individual store performance using bubbles, and one of aggregated performance using hexagons. By visualizing this information on a map we can easily identify where our business is performing better and which are the most successful stores (revenue inversely correlated with surface area).
Analyzing Airbnb ratings in Los Angeles
In this tutorial we will analyze which factors drive the overall impression of Airbnb users by relating the overall rating score with different variables through a Geographically Weighted Regression model. Additionally, we'll analyze in more depth the areas where the location score drives the overall rating, and inspect sociodemographic attributes in these areas by enriching our visualization with data from the Data Observatory.
Assessing the damages of La Palma Volcano
Since 11 September 2021, a swarm of seismic activity had been ongoing in the southern part of the Spanish Canary Island of La Palma. The increasing frequency, magnitude, and shallowness of the seismic events were an indication of a pending volcanic eruption, which occurred on 19 September, leading to the evacuation of people living in the vicinity. In this tutorial we are going to assess the number of buildings, estimated property value and population that may be affected by the lava flow and its deposits.
How Geographically Weighted Regression works
How to calculate spatial hotspots and which tools do you need?
Space-time hotspots: how to unlock a new dimension of insights
Spatial interpolation: which technique is best & how to run it
How To Optimize Location Planning For Wind Turbines
How to use Location Intelligence to grow London's brunch scene
Optimizing Site Selection for EV Charging Stations
Using Spatial Composites for Climate Change Impact Assessment
Cloud-native telco network planning
Finding Commercial Hotspots
Analyzing 150 million taxi trips in NYC over space & time
Understanding accident hotspots
-- Points to polygons: enrich Madrid districts with the count of bike parking locations
DROP TABLE IF EXISTS `cartobq.docs.changing_geo_points_to_polygons`;
CALL `carto-un`.carto.ENRICH_POLYGONS(
'SELECT id, name, geom FROM `cartobq.docs.madrid_districts`',
'geom',
'SELECT id, geom FROM `cartobq.docs.madrid_bike_parkings`',
'geom',
[('id', 'count')],
['`cartobq.docs.changing_geo_points_to_polygons`']
);
-- Lines to polygons: enrich Madrid districts by proportionally summing lane_value from bike infrastructure lines
DROP TABLE IF EXISTS `cartobq.docs.changing_geo_lines_to_polygons`;
CALL `carto-un`.carto.ENRICH_POLYGONS(
'SELECT id, name, geom FROM `cartobq.docs.madrid_districts`',
'geom',
'SELECT geom, lane_value FROM `cartobq.docs.madrid_bike_all_infrastructure`',
'geom',
[('lane_value', 'sum')],
['`cartobq.docs.changing_geo_lines_to_polygons`']
);
-- Polygons to polygons: enrich Madrid districts with the total intersecting isoline area
DROP TABLE IF EXISTS `cartobq.docs.changing_geo_polygons_to_polygons`;
CALL `carto-un`.carto.ENRICH_POLYGONS(
'SELECT id, name, geom FROM `cartobq.docs.madrid_districts`',
'geom',
'SELECT geom, ST_AREA(geom) AS coverage FROM `cartobq.docs.madrid_bike_parkings_5min_area`',
'geom',
[('coverage', 'sum')],
['`cartobq.docs.changing_geo_polygons_to_polygons`']
);
-- Create table with POI counts by grid cell
CREATE OR REPLACE TABLE project.dataset.berlin_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_pois
FROM (
SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geom, 9) AS h3
FROM cartobq.docs.osm_pois_berlin )
GROUP BY h3;
-- Compute Moran's I
CALL `carto-un`.carto.MORANS_I_H3_TABLE(
'project.dataset.berlin_poi_count_grid',
'project.dataset.berlin_poi_count_grid_mi',
'h3',
'n_pois',
1,
'exponential'
);
-- Read computed value
SELECT * FROM project.dataset.berlin_poi_count_grid_mi;
-- Same query, for Analytics Toolbox deployments hosted in the carto-un-eu (EU) project
-- Create table with POI counts by grid cell
CREATE OR REPLACE TABLE project.dataset.berlin_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_pois
FROM (
SELECT `carto-un-eu`.carto.H3_FROMGEOGPOINT(geom, 9) AS h3
FROM cartobq.docs.osm_pois_berlin )
GROUP BY h3;
-- Compute Moran's I
CALL `carto-un-eu`.carto.MORANS_I_H3_TABLE(
'project.dataset.berlin_poi_count_grid',
'project.dataset.berlin_poi_count_grid_mi',
'h3',
'n_pois',
1,
'exponential'
);
-- Read computed value
SELECT * FROM project.dataset.berlin_poi_count_grid_mi;
-- Same query, for Analytics Toolbox installations referenced through an unqualified carto schema
-- Create table with POI counts by grid cell
CREATE OR REPLACE TABLE project.dataset.berlin_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_pois
FROM (
SELECT carto.H3_FROMGEOGPOINT(geom, 9) AS h3
FROM cartobq.docs.osm_pois_berlin )
GROUP BY h3;
-- Compute Moran's I
CALL carto.MORANS_I_H3_TABLE(
'project.dataset.berlin_poi_count_grid',
'project.dataset.berlin_poi_count_grid_mi',
'h3',
'n_pois',
1,
'exponential'
);
-- Read computed value
SELECT * FROM project.dataset.berlin_poi_count_grid_mi;
It's not uncommon for geospatial datasets to be larger than their non-geospatial counterparts, and geospatial operations are sometimes slow or resource-demanding — but that's not a surprise: representing things and events on Earth and then computing their relationships is not an easy task.
With CARTO, you will unlock a way to do spatial analytics at scale, combining the huge computational power of your data warehouse with our expertise and tools, for millions or billions of data points. And we'll try to make it easy for you!
In this guide we'll help you prepare your data so that it is optimized for spatial analysis with CARTO.
Having clean, optimized data at the source (your data warehouse) will:
Improve the performance of all analysis, apps, and visualizations made with CARTO
Reduce the computing costs associated with your data warehouse
Before we start diving into the specific optimizations and tricks available in your data warehouse, there are some typical data optimization patterns that apply to all data warehouses:
Optimization rule #1 — Can you reduce the volume of data?
While CARTO tries to automatically optimize the amount of data requested, having a huge source table is always a bigger challenge than having a smaller one.
Sometimes we find ourselves trying to use a huge table called raw_data with 50TB of data, only to then realize: I actually don't need all the data in this table!
If that's your case and the raw data is static, then it's a good idea to materialize the subset or aggregation you need for your use case in a different (smaller) table.
If that's your case and the raw data changes constantly, then it might be a good idea to build a data pipeline that refreshes your (smaller) table. You can build it easily using CARTO Workflows.
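For instance, a minimal sketch of this materialization pattern, assuming a hypothetical raw_data table with country and event_date columns, could look like this in BigQuery:
-- Materialize only the subset you actually need from a large raw table
CREATE OR REPLACE TABLE `project.dataset.raw_data_subset` AS
SELECT id, geom, category
FROM `project.dataset.raw_data`
WHERE country = 'ES'                     -- keep only your region of interest
  AND event_date >= DATE '2023-01-01';   -- and only the period you analyze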
Optimization rule #2 — Are you using the right spatial data type?
If you've read our previous guides, you already know CARTO supports multiple spatial data types.
Each data type has its own particularities when speaking about performance and optimization:
Points: points are great to represent specific locations but dealing with millions or billions of points is typically a sub-optimal way of solving spatial challenges. Consider aggregating your points into spatial indexes using CARTO Workflows.
Polygons: polygons typically reflect meaningful areas in our analysis, but they quickly become expensive if using too many, too small, or too complex polygons. Consider simplifying your polygons or using a higher-level aggregation to reduce the number of polygons. Both of these operations can be achieved with CARTO Workflows.
Polygons are also prone to becoming invalid geometries. Generally, it is a good idea to avoid overlapping geometries.
Lines: lines are an important way of representing linear features such as highways and rivers, and are key to network analyses like route optimization. Like polygons, they can quickly become expensive and should be simplified where possible.
Spatial Indexes: spatial indexes currently offer the best performance and costs for visualization and analysis purposes ✨ If you're less familiar with spatial indexes or need a refresher, we have prepared a specific Introduction to Spatial Indexes. A short SQL sketch of the point-aggregation and polygon-simplification tips above is included below.
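As a rough illustration of those two optimizations, and assuming BigQuery with the CARTO Analytics Toolbox plus hypothetical table names, the SQL could look like:
-- 1) Aggregate raw points into an H3 grid (resolution 8) instead of serving every point
CREATE OR REPLACE TABLE `project.dataset.points_h3_agg` AS
SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3,
       COUNT(*) AS n_points
FROM `project.dataset.raw_points`
GROUP BY h3;

-- 2) Simplify heavy polygons (tolerance in meters; tune it to your target zoom levels)
CREATE OR REPLACE TABLE `project.dataset.polygons_simplified` AS
SELECT id, ST_SIMPLIFY(geom, 50) AS geom
FROM `project.dataset.raw_polygons`;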
The techniques to optimize your spatial data are slightly different for each data warehouse provider, so we've prepared specific guides for each of them. Check the ones that apply to you to learn more:
Make sure your data is clustered by your geometry or spatial index column.
If your data is points/polygons: make sure Search Optimization is enabled on your geometry column.
If your data is based on spatial indexes: make sure it is clustered by your spatial index column.
If your data is points/polygons: make sure the SRID is set to EPSG:4326
If your data is based on spatial indexes: make sure you're using your spatial index column as the sort key.
Make sure your data uses your H3 column as the z-order.
Make sure your data is indexed by your geometry or spatial index column.
If your data is points/polygons: make sure the SRID is set to EPSG:3857
Make sure your data is clustered by your geometry or spatial index column.
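For example, in BigQuery, clustering a spatial-index table by its H3 column is a one-off operation; here is a sketch with hypothetical table names:
-- Recreate the table clustered by its H3 column so spatial filters can prune data
CREATE OR REPLACE TABLE `project.dataset.my_h3_table_clustered`
CLUSTER BY h3
AS SELECT * FROM `project.dataset.my_h3_table`;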
As you've seen through this guide, we try our best to automatically optimize the performance and the costs of all analysis, apps, and visualizations made using CARTO. We also provide tools like CARTO Workflows or our Data Explorer UI-assisted optimizations to help you succeed.
URL parameters allow you to share multiple versions of the same map without having to rebuild it for different user requirements. This tutorial will guide you through embedding a Builder map in a low code tool that can be controlled using URL parameters to update the map's view based on user input. Through these steps, you'll learn to make your embedded maps more engaging and responsive, providing users with a seamless and interactive experience.
In this tutorial, we're providing you with an existing Builder map as a hands-on example to guide you through the process. This example map highlights historic weather events. If you're interested in creating a similar map, this tutorial is for you.
Embed code:
<iframe width="640px" height="360px" src="https://clausa.app.carto.com/map/5d942679-411f-4ab7-afb7-0f6061c9af63"></iframe>
In this guide, we'll walk you through:
To access your map's URL and/or embed code, first ensure that your map has been shared — either within your organization, with specific groups, or publicly. After sharing the map, you can proceed with the following steps:
Map Link: This direct URL to your map can be quickly obtained in two ways:
Through a quick action from the 'Share' button.
Within the sharing modal, in the bottom left corner.
Embed code: This is specifically available within the sharing modal:
Navigate to the sharing settings of your map.
Look for the "Developers and embedding" section. Here, the embed code is provided, allowing you to copy and paste it into the HTML of your site or application for seamless embedding.
Leveraging URL parameters with Builder maps enables dynamic customization for specific audience views without creating multiple map versions. This method simplifies sharing tailored map experiences by adjusting URL parameters, offering a personalized viewing experience with minimal effort.
In the viewer mode of a Builder map, any modifications you make are instantly updated in the URL. For example, if you zoom to a specific level in the loaded Builder map, the zoom level gets added to the URL. Here's how it looks:
https://clausa.app.carto.com/map/5d942679-411f-4ab7-afb7-0f6061c9af63?zoom=4
Below you can see how when you interact with the map: navigating, filtering widgets, parameters, etc. automatically change the URL displaying that specific map view.
You also have the option to manually insert URL parameters to customize your map's viewer mode further. This option is particularly useful for tailoring map content to specific user queries or interests when the map is embedded, making the application more engaging and interactive.
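For example, appending parameters manually to the example map could look like the URL below (param_state and param_event_type are the SQL Parameters used later in this tutorial; the exact values accepted depend on how the map's parameters are configured):
https://clausa.app.carto.com/map/5d942679-411f-4ab7-afb7-0f6061c9af63?param_state=Texas&param_event_type=tornado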
In this section we'll illustrate how to integrate a Builder map in a custom application using Retool, a development platform for creating apps rapidly.
Begin by inserting an iFrame component in your Retool application. In the URL section, use your map URL. You can use the provided map URL: https://clausa.app.carto.com/map/5d942679-411f-4ab7-afb7-0f6061c9af63.
Add a container in Retool to neatly organize UI components that will interact with your map.
Implement UI elements to enable users to filter the map view based on criteria like state, type of severe weather event, and the event's date range. Start by adding a multi-select dropdown to allow users to select a specific state. Name this element State, and pre-fill it with the names of all U.S. states in alphabetical order:
["Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","District of Columbia","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee","Texas","Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"]
Include a checkbox group element named Event. This component will enable users to select the type of severe weather event they are interested in, such as hail, tornadoes, or wind, with one option set as the default.
Add two date pickers, one named StartDate and the other EndDate. These components will define the timeframe of the event, providing default start and end dates to guide the user's selection. For the provided map example, let's ensure we are matching the temporal frame of the weather events by setting the start date to 1950-01-03 and the end date to 2022-01-03.
Create a transformer named mapUrlParameters to dynamically construct the iFrame's URL based on the user's selections. Use JavaScript to fetch the values from the UI components and assemble them into the URL parameters.
const paramState = {{State.value}}
const eventType = {{Event.value}}
const startDate = {{StartDate.value}}
const endDate = {{EndDate.value}}
const urlParameters = `?param_state=${paramState}&param_event_type=${eventType}&param_event_date_from=${startDate}&param_event_date_to=${endDate}`
return urlParameters
Add a button component labelled Apply that, when clicked, updates the iFrame URL with the new parameters selected by the user. This action ensures the map is only refreshed when the user has finalized their choices, making the map interaction more efficient and user-friendly.
To further enhance user experience, implement a secondary event that zooms to the map's focal point in the iFrame when the "Apply" button is clicked. This ensures the map is centered and zoomed appropriately for the user.
Additionally, customize your application by adding headers, more interactive elements, and so on, to increase usability and aesthetic appeal.
Using H3 to calculate population statistics for areas of influence
In this tutorial, we will calculate the population living within 1km of cell towers in the District of Columbia. We will be using the following datasets, all of which can be found in the demo tables section of the CARTO Data Warehouse:
Cell towers worldwide
USA state boundaries
Derived Spatial Features H3 USA
In this step we will filter the cell towers to an area of interest (in this example, that's the District of Columbia), before converting them to a H3 index. For this, we'll follow the workflow below.
Create a workflow using the CARTO Data Warehouse connection and drag the three tables onto the canvas.
Connect the USA state boundaries table to a Simple Filter component, and set the filter condition for the name to equal District of Columbia (or any state of your choosing!).
Next, connect the outcome of the Simple Filter to the bottom input (filter table) of a Spatial Filter component, and then connect the Cell towers table to the top input (source table). This should automatically detect the geometry columns in both tables. We'll keep the spatial predicate as the default "intersects"; this predicate filters the source table where any part of its geometry intersects with any part of the filter geometry.
Finally, connect the output of the Spatial Filter to a H3 from GeoPoint component to encode the point location as a H3 index. Ensure the resolution is the same as the Spatial Features population data: 8.
Next, we will use K-rings to calculate the population who live roughly within 1km of each cell tower.
Connect the result of H3 from Geopoint to a new H3 KRing component, and set the size. You can use this documentation and this hexagon properties calculator to work out how many K-rings you need to approximate specific distances. We are working at resolution 8, where a H3 cell has a long diagonal of approximately 1km, so we need a K-ring of size 1 to approximate 1km.
You can see in the image above that this generates a new table containing the K-rings; the kring_index is the H3 reference for the newly generated ring, which can be linked to the original, central H3 cell.
Next, use a Join to join the K-ring to the Spatial Features population data. Ensure the K-ring is the top input and the population data is the bottom input. Then set up the parameters so the main table column is kring_index, the secondary table column is h3 and the join type is Left.
You can see this visualized below.
Finally, we will calculate the total population within 1km of each individual cell tower.
Connect the result of your last Join component to a Group by component. Set the group by column to H3 and the aggregation to population_joined with the aggregation type SUM (see above).
You should now know the total population for each H3 cell which represents the cell towers. The final step is to join these results back to the cell tower data so we can identify individual towers. To do this, add a final Join component, connecting H3 from GeoPoint (created in Step 1, point 4) to its top input, and the result of Group by to the bottom input. The columns for both main and secondary table should be H3, and you will want to use a Left join type to ensure all cell tower records are retained.
Run!
Altogether, your workflow should look something like the example below. The final output (the second Join component) should be a table containing all of the original cell tower data, as well as a H3 index column and the population_joined_sum_joined field (you may wish to use Rename Column to rename this!).
And here are the results!
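If you'd rather express the same logic in SQL, a rough equivalent of the workflow above could look like the following sketch (table names are hypothetical, and the CARTO Analytics Toolbox on BigQuery is assumed):
-- Population within ~1km (one K-ring at H3 resolution 8) of each cell tower
WITH towers AS (
  -- cell towers filtered to the area of interest, indexed at H3 resolution 8
  SELECT t.tower_id, `carto-un`.carto.H3_FROMGEOGPOINT(t.geom, 8) AS h3
  FROM `project.dataset.cell_towers` t
  JOIN `project.dataset.usa_states` s
    ON ST_INTERSECTS(t.geom, s.geom)
  WHERE s.name = 'District of Columbia'
),
rings AS (
  -- the central cell plus its first K-ring
  SELECT tower_id, ring_h3
  FROM towers, UNNEST(`carto-un`.carto.H3_KRING(h3, 1)) AS ring_h3
)
SELECT r.tower_id, SUM(sf.population) AS population_1km
FROM rings r
LEFT JOIN `project.dataset.spatial_features_h3_res8` sf
  ON sf.h3 = r.ring_h3
GROUP BY r.tower_id;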
In this example we are going to identify hotspots of amenity POIs in Stockholm using OpenStreetMap data and the GETIS_ORD_H3_TABLE function of the statistics module. POIs data can be found in the publicly available cartobq.docs.osm_pois_stockholm table.
The process consists of three simple steps:
First, we retrieve all POIs from OpenStreetMap which belong to the category “amenity”.
Next, we find the H3 cell of resolution 9 to which each POI belongs and count the number of amenity POIs inside each cell.
Finally, we call the GETIS_ORD_H3_TABLE function, which returns the Getis-Ord Gi* statistic for each H3 cell, calculated over n_amenity_pois (number of amenity POIs in the cell).
-- Create table with POI counts by grid cell
CREATE TABLE project.dataset.stockholm_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_amenity_pois
FROM (
SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geom, 9) AS h3,
FROM cartobq.docs.osm_pois_stockholm
WHERE amenity IS NOT NULL )
GROUP BY h3;
-- Compute Getis-Ord Gi*
CALL `carto-un`.carto.GETIS_ORD_H3_TABLE(
'project.dataset.stockholm_poi_count_grid',
'project.dataset.stockholm_poi_count_grid_gi',
'h3',
'n_amenity_pois',
4,
'triangular');
-- Create table with POI counts by grid cell
CREATE TABLE project.dataset.stockholm_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_amenity_pois
FROM (
SELECT `carto-un-eu`.carto.H3_FROMGEOGPOINT(geom, 9) AS h3,
FROM cartobq.docs.osm_pois_stockholm
WHERE amenity IS NOT NULL )
GROUP BY h3;
-- Compute Getis-Ord Gi*
CALL `carto-un-eu`.carto.GETIS_ORD_H3_TABLE(
'project.dataset.stockholm_poi_count_grid',
'project.dataset.stockholm_poi_count_grid_gi',
'h3',
'n_amenity_pois',
4,
'triangular');
-- Create table with POI counts by grid cell
CREATE TABLE project.dataset.stockholm_poi_count_grid AS
SELECT
h3, COUNT(*) AS n_amenity_pois
FROM (
SELECT carto.H3_FROMGEOGPOINT(geom, 9) AS h3,
FROM cartobq.docs.osm_pois_stockholm
WHERE amenity IS NOT NULL )
GROUP BY h3;
-- Compute Getis-Ord Gi*
CALL carto.GETIS_ORD_H3_TABLE(
'project.dataset.stockholm_poi_count_grid',
'project.dataset.stockholm_poi_count_grid_gi',
'h3',
'n_amenity_pois',
4,
'triangular');
The results can be explored in the map below, where we can use the histogram widget to narrow down the cells with the highest Gi* values, which correspond to the location of the hotspots of amenity POIs in Stockholm.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.
For these templates, you will need to install the BigQuery ML extension package.
This example shows how to create a pipeline to train a classification model using BigQuery ML, evaluate the model and use it for prediction. In particular, we will create a classification model to estimate customer churn for a telecom company in California.
This example workflow will help you see how telco companies can detect high-risk customers, uncover the reasons behind customer departures, and develop targeted strategies to boost retention and satisfaction by training a classification model.
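The pattern the template automates is essentially standard BigQuery ML; a minimal sketch with hypothetical table and column names:
-- Train a churn classifier
CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_charges, n_support_tickets, churned
FROM `project.dataset.telco_customers_ca`;

-- Evaluate it and score current customers
SELECT * FROM ML.EVALUATE(MODEL `project.dataset.churn_model`);
SELECT * FROM ML.PREDICT(MODEL `project.dataset.churn_model`,
                         (SELECT * FROM `project.dataset.telco_customers_ca_current`));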
This example shows how to create a pipeline to train a regression model using BigQuery ML, evaluate the model and use it for prediction. In particular, we will create a regression model to predict the average network speed in the LA area.
This example workflow will help you see how telco companies can improve network planning by training a regression model to estimate the network speed in areas where no measurements are available.
This template shows how to create a forecast model using the BigQuery ML extension package for Workflows. There are three main stages involved (a minimal BigQuery ML sketch follows this list):
Training a model, using some input data and adjusting to the desired parameters,
Evaluating and understanding the model and its performance,
Predicting to a given horizon and saving the results.
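A minimal BigQuery ML sketch of these three stages, with hypothetical table and column names:
-- 1) Train a time-series model
CREATE OR REPLACE MODEL `project.dataset.weekly_forecast`
OPTIONS (model_type = 'ARIMA_PLUS',
         time_series_timestamp_col = 'week',
         time_series_data_col = 'n_events') AS
SELECT week, n_events FROM `project.dataset.weekly_events`;

-- 2) Evaluate and inspect the fitted model
SELECT * FROM ML.ARIMA_EVALUATE(MODEL `project.dataset.weekly_forecast`);

-- 3) Predict 12 steps ahead and save the results
CREATE OR REPLACE TABLE `project.dataset.weekly_forecast_results` AS
SELECT * FROM ML.FORECAST(MODEL `project.dataset.weekly_forecast`,
                          STRUCT(12 AS horizon, 0.9 AS confidence_level));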
This example shows how to create a pipeline to import a pre-trained model using BigQuery ML and use it for prediction. In particular, we will import a regression model to predict the ratio of crime counts per 1,000 population in the Chicago area.
In this tutorial, we will calculate the rate of traffic accidents (number of accidents per 1,000 people) for Bristol, UK. We will be using the following datasets. The first one is available in the demo tables section of the CARTO Data Warehouse, while the latter two are freely available in our Spatial Data Catalog.
Bristol traffic accidents (CARTO Data Warehouse)
Census 2021 - United Kingdom (Output Area) [2021] (Office for National Statistics)
Lower Tier Local Authority (Office for National Statistics)
In this step, you'll convert the individual accident point data to aggregated H3 cells.
Create a Workflow using the CARTO Data Warehouse connection.
First, drag the Lower Tier Local Authority data onto the canvas. It can be found under Sources > Data Observatory > Office for National Statistics.
Connect this to a Simple Filter component. Set the filter to do_label is equal to "Bristol, City of".
Next, connect the filter results to a H3 Polyfill component, and set the resolution to 9. This will create a H3 grid covering the Bristol area.
Now, drag the Bristol traffic accidents table onto the canvas. It can be found under Sources > Connection > CARTO Data Warehouse > demo tables.
Connect this to a H3 from GeoPoint component, setting the resolution of this to 9 also. This will create a H3 index for each input point.
Connect the output of H3 from GeoPoint to a Group by component. Set the group by column to H3, and the aggregation column to H3 (count). The result of this will be a table with a count for the number of accidents within each H3 cell.
In the final stage for this section, add a Join component. Connect the H3 Polyfill component to the top input, the Group by component to the bottom input, and set the join type to Left.
Run!
The result of this will be a H3 index covering the Bristol area with a count for the number of accidents which have taken place within each cell. Now let's put those counts into context!
In this section of the tutorial, we will enrich the H3 grid we have just created with population data from the UK Census.
Drag the Census 2021 - United Kingdom (Output Area) [2021] table onto the canvas from Sources > Connections > Office for National Statistics.
Drag an Enrich H3 Grid component onto the canvas. Connect the Join component (Step 1, point 4) to the top input, and the Census data to the bottom input.
The component should detect the H3 and geometry columns by default. From the Variables drop down, add "ts001_001_ff424509" (total population, you can reference Variable descriptions for any dataset on our Data Observatory) and specify the aggregation method as SUM. This will estimate the total population living in each H3 cell based on the area of overlap with each Census Output Area.
Run the workflow.
Now we have all of the variables collected into the H3 support geography, we can start to turn this into insights.
First, we'll calculate the accident rate. Connect the output of Enrich H3 Grid to a new Create Column component. Call the new column "rate".
Set the expression as CASE WHEN h3_count_joined IS NULL THEN 0 ELSE h3_count_joined/(ts001_001_ff424509_sum/1000) END. This code calculates the number of accidents per 1,000 people, unless there has been no accident in the area, in which case the accident rate is set to 0.
Now, let's explore hotspots of high accident rates. Connect the output of Create Column to a new Getis Ord component, which is the hotspot function we will be using. Set the value column to "rate" (i.e. the variable we just created), the kernel to gaussian and the neighborhood size to 3. Learn more about this process in the documentation.
Finally, connect the results of this to a Simple Filter, and the filter condition to where the p_value is equal to or less than 0.05; this means we can be 95% confident that the locations we are looking at are a statistically significant hotspot.
You can explore the results below!
💡 Note that to be able to visualize a H3 index in CARTO Builder, the field containing the index must be called H3.
Spatio-temporal analysis is crucial in extracting meaningful insights from data that possess both spatial and temporal components. By incorporating spatial information, such as geographic coordinates, with temporal data, such as timestamps, spatio-temporal analysis unveils dynamic behaviors and dependencies across various domains. This applies to different industries and use cases like car sharing and micromobility planning, urban planning, transportation optimization, and more.
In this example, we will perform a hotspot analysis to identify space-time clusters and classify them according to their behavior over time. We will use the location and time of accidents in London in 2021 and 2022. This tutorial builds upon a previous example, where we explained how to use the Getis-Ord statistic to identify traffic accident hotspots.
The source data we will use has two years of weekly data aggregated into an H3 grid, counting the number of collisions per cell. The data is available at cartobq.docs.spacetime_collisions_weekly_h3 and it can be explored in the map below.
We start by performing a spacetime hotspot analysis to identify hot and cold spots over time and space. We can use the following call to the Analytics Toolbox to run the procedure:
For further detail on the spacetime Getis-Ord statistic, check out the Analytics Toolbox documentation.
By performing this analysis, we can check how different parts of the city become “hotter” or “colder” as time progresses.
Once we have identified hot and cold spots, we can classify them into a set of predefined categories so that the results are easier to digest. For more information about the categories considered and the specific criteria, please check the documentation.
We can run the analysis by calling the procedure using the previously obtained Getis-Ord results.
We can see how now we have different types of behaviors at a glance in a single map. There are several insights we can extract from this map:
There is an amplifying hotspot in the city center that shows an upward trend in collisions.
The surroundings of that amplifying hotspot are mostly occasional.
The periphery of the city is mostly cold spots, but most of them are fluctuating or even declining.
Geographically Weighted Regression (GWR) is a statistical regression method that models the local (e.g. regional or sub-regional) relationships between a set of predictor variables and an outcome of interest. Therefore, it should be used in lieu of a global model in those scenarios where these relationships vary spatially.
In this example we are going to analyze the local relationships between Airbnb listings in Berlin and the number of bedrooms and bathrooms available at these listings using the GWR_GRID procedure. Our input dataset, publicly available at cartobq.docs.airbnb_berlin_h3_qk, contains the Airbnb listings' locations, their prices, and their number of bedrooms and bathrooms. Each Airbnb location has H3 and quadkey cells at different resolutions to allow users to test and compare different models.
We can run our GWR analysis by simply running this query:
This particular configuration will run a local regression for each H3 grid cell at resolution 7. All listings at each particular grid cell and those within its neighborhood, defined as its k-ring of size 3, will be taken into account to run this regression. Data points within the neighborhood will be given a weight inversely proportional to the distance to the central cell, according to the kernel function of choice, in this case, a Gaussian.
The output of our GWR analysis is a table that contains the result of each of these regressions: the coefficients for each of the predictor variables and the intercept. The following maps show the coefficients associated with the number of bedrooms (left) and bathrooms (right), where darker/brighter areas correspond to lower/higher values:
Positive values indicate a positive association between the Airbnb listing prices and the presence of bedrooms and bathrooms (conditional on the other), with larger absolute values indicating a stronger association.
We can see that overall, where listings are equipped with more bedrooms and bathrooms, their price is also higher. However, the strength of this association is weaker in some areas: for instance, the number of bedrooms clearly drives higher prices in the city center, while not as much in the outskirts of the city.
Rather than performing a regression per data point in our Airbnb listings dataset, a regression per cell is computed to improve computation time and efficiency. The procedure computes the coefficients for each cell of interest based on the Airbnb locations that lie within such cell and its neighbors. Notice that the data of the neighboring cells will be assigned a lower weight the further they are from the origin cell. Please refer to the documentation for a more detailed explanation.
To better illustrate how the GWR works, we have prepared another visualization that shows how data is used to run every regression within the algorithm following the example above. You can select a specific H3 cell (in red) to visualize which Airbnb locations (in bright white) have been used to run the regression that estimates the corresponding bedroom and bathroom coefficients. The larger the size of each white dot, the greater the weight the Airbnb site has within the regression model.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.
In this section you can find step-by-step guides focused on bringing your data visualization to life with Builder. Each tutorial utilizes available demo data from the CARTO Data Warehouse connection, enabling you to dive straight into map creation right from the start.
CALL `carto-un`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL `carto-un-eu`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL `carto-un`.carto.SPACETIME_HOTSPOTS_CLASSIFICATION(
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'cartobq.docs.spacetime_collisions_hotspot_classification',
'index',
'date',
'gi',
'p_value',
'{"threshold": 0.05, "algorithm": "mmk"}'
);
CALL `carto-un-eu`.carto.SPACETIME_HOTSPOTS_CLASSIFICATION(
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'cartobq.docs.spacetime_collisions_hotspot_classification',
'index',
'date',
'gi',
'p_value',
'{"threshold": 0.05, "algorithm": "mmk"}'
);
CALL carto.SPACETIME_HOTSPOTS_CLASSIFICATION(
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'cartobq.docs.spacetime_collisions_hotspot_classification',
'index',
'date',
'gi',
'p_value',
'{"threshold": 0.05, "algorithm": "mmk"}'
);
CALL `carto-un`.carto.GWR_GRID(
'cartobq.docs.airbnb_berlin_h3_qk',
['bedrooms', 'bathrooms'], -- [ beds feature, bathrooms feature ]
'price', -- price (target variable)
'h3_z7', 'h3', 3, 'gaussian', TRUE,
NULL
);
CALL `carto-un-eu`.carto.GWR_GRID(
'cartobq.docs.airbnb_berlin_h3_qk',
['bedrooms', 'bathrooms'], -- [ beds feature, bathrooms feature ]
'price', -- price (target variable)
'h3_z7', 'h3', 3, 'gaussian', TRUE,
NULL
);
CALL carto.GWR_GRID(
'cartobq.docs.airbnb_berlin_h3_qk',
['bedrooms', 'bathrooms'], -- [ beds feature, bathrooms feature ]
'price', -- price (target variable)
'h3_z7', 'h3', 3, 'gaussian', TRUE,
NULL
);
Build a dashboard with styled point locations
Find out how to style point locations in Builder, making it easier for users to understand. This guide will show you simple ways to use Builder to color and shape these places on your map, helping you understand how people are spread out across the globe.
Create an animated visualization with time series
This tutorial takes you through a general approach to building animated visualizations using the Builder Time Series Widget. The techniques you'll learn here can be applied broadly to animate and analyze any kind of temporal geospatial data whose position moves over time.
Visualize administrative regions by defined zoom levels
Create a visualization that showcases specific administrative regions at predetermined zoom level ranges. This approach is perfect for visualizing different levels of detail as users zoom in and out. At lower zoom levels, you'll see a broader overview, while higher zoom levels will reveal more detailed information.
Build a dashboard to understand historic weather events
Learn how to create an interactive dashboard to navigate through America's severe weather history, focusing on hail, tornadoes, and wind. The goal is to create an interactive map that transitions through different layers of data, from state boundaries to the specific paths of severe weather events.
Customize your visualization with tailor-made basemaps
Create a visualization using a custom basemap in Builder. In this tutorial you'll learn how you can create your own Style JSON custom basemaps using an open source tool, upload them into your CARTO organization from Settings and leverage them using Builder.
Visualize static geometries with attributes varying over time
Learn how to efficiently visualize static geometries with dynamic attributes using Aggregate by Geometry in CARTO Builder.
This tutorial explores the Global Historical Climatology Network (NOAA) dataset, focusing on U.S. weather stations in 2016. By aggregating identical geometries—such as administrative boundaries or infrastructure—you can uncover trends in temperature, precipitation, and wind speed while optimizing map performance.
Mapping the precipitation impact of Hurricane Milton with raster data
In this tutorial, you'll learn how to visualize and analyze raster precipitation data from Hurricane Milton in CARTO. We’ll guide you through the preparation, upload, and styling of raster data, helping you extract meaningful insights from the hurricane’s impact. By the end of this tutorial, you’ll create an interactive dashboard in CARTO Builder, combining raster precipitation data with Points of Interest (POIs) and hurricane track to assess the storm’s impact.
In this tutorial, we'll share a low code approach to calculating a composite score using Spatial Indexes. This approach is ideal for creating numeric indicators which combine multiple concepts. In this example, we'll be combining climate and historic fire extents to calculate fire risk - but you can apply these concepts to a wide range of scenarios - from market suitability for your new product to accessibility scores for a service that you offer.
Climate data. Fires are most likely to start and spread in areas of high temperatures and high wind. We can access this information from our Spatial Features data - a global grid containing various climate, environmental, economic and demographic data. You can subscribe to this from the Data Observatory, or access the USA version of this data in the CARTO Data Warehouse.
USA Counties data. This can also be subscribed to from the Data Observatory, or accessed via the CARTO Data Warehouse.
Historic fires data. We’ll be using the LA County Historic Fires Perimeter data to understand areas where fires have been historically prevalent. You can download this data as a geojson here.
We’ll be creating the below workflow for this:
Before running our composite score analysis, we need to first filter the Spatial Features data to our area of interest (LA County). The climate data we are interested in is also reported at monthly levels, so we need to aggregate the variables to annual values.
We’ll be running this initial section of the workflow in this step.
💡 You can run the workflow at any point, or wait until the end and run it then! Only components that have been edited since the last run will be re-executed each time you run the workflow.
Set up: First, in your CARTO Workspace, head to Workflows and select Create a workflow, using the CARTO Data Warehouse connection.
In the workflow, on the Sources panel (left of the screen), in the Connection panel you’ll see the CARTO Data Warehouse. Navigate to demo data > demo tables > usa_counties and derived_spatialfeatures_usa_h3res8_v1_yearly_v2. Drag these onto the canvas.
Beside sources, switch to Components. Search for and drag a Simple Filter onto the canvas, then connect the usa_counties source to this. Set the name as equal to Los Angeles.
Next, connect the Simple Filter to a H3 Polyfill component, ensuring the resolution is set to 8. This will create a H3 grid across LA, which we can use to filter the climate data to this area.
Connect the H3 Polyfill output to the top input and the Spatial Features source to the bottom input of a Join component. Ensure both the main and secondary table join fields are set to H3 (this should autodetect), and then set the join type to Left. This will join only the features from the USA-wide Spatial Features source which are also found in the H3 polyfill component, i.e. only the cells in Los Angeles.
Now, we want to use two subsequent Create Column components to create two new fields. 💡 Please note that if you are using a data warehouse that isn't Google BigQuery, the SQL syntax for these calculations may need to be slightly different.
Temp_avg for the average temperature:(tavg_jan_joined + tavg_feb_joined + tavg_mar_joined + tavg_apr_joined + tavg_may_joined + tavg_jun_joined + tavg_jul_joined + tavg_aug_joined + tavg_sep_joined + tavg_oct_joined + tavg_nov_joined + tavg_dec_joined) / 12
On a separate branch, Wind_avg for the average wind speeds: (wind_jan_joined + wind_feb_joined + wind_mar_joined + wind_apr_joined + wind_may_joined + wind_jun_joined + wind_jul_joined + wind_aug_joined + wind_sep_joined + wind_oct_joined + wind_nov_joined + wind_dec_joined) / 12
Finally, connect the second Create Column to an Edit schema component, selecting the columns h3, temp_avg and wind_avg.
Next up, we'll factor historic wildfire data into our analysis.
In this step, we'll calculate the number of historic fires which have occurred in each H3 cell.
Locate the LA County Historic Fires Perimeter dataset from where you’ve downloaded it and drag it directly onto your workflow canvas. Alternatively, you can import it into your cloud data warehouse and drag it on via Sources.
Like we did with the LA county boundary, use another H3 Polyfill (resolution 8) to create a H3 grid across the historic fires. Make sure you enable Keep input table columns; this will create duplicate H3 cells where multiple polygons overlap.
Run the workflow!
With a Group by component, set the Group by column to H3 and the aggregation to H3 (COUNT) to count the number of duplicate H3 cells, i.e. the number of fires which have occurred in each area.
Now, drag a Join onto the canvas; connect the Group by to the bottom input and the Edit schema component from Step 1.7 to the top input. The join type should be Left and both input columns should be H3.
Do you see all those null values in the h3_count_joined column? We need to turn those into zeroes, indicating that no fires occurred in those locations. Add a Create Column component, and use the calculation coalesce(h3_count_joined, 0) to do this, calling this column wildfire_count.
There are two main methods for calculating a composite score. Unsupervised scoring (which this tutorial will focus on) consists of the aggregation of a set of variables, scaled and weighted accordingly, whilst supervised scoring leverages a regression model to relate an outcome of interest to a set of variables and, based on the model residuals, focuses on detecting areas of under- and over-prediction. You can find out more about both methods and which to use when here, and access pre-built workflow templates here.
There are three main approaches to unsupervised scoring:
Principal Component Analysis (PCA): This method derives weights by maximizing the variation in the data. This process is ideal for when expert knowledge is lacking and the sample size is large enough, and extreme values are not outliers.
Entropy: By computing the entropy of the proportion of each variable, this method, like PCA, makes it ideal for those without expert domain knowledge.
Custom Weights: Recommended to use for those with expert knowledge of their data and domain, this method allows users to customize both scaling and aggregation functions, along with defining a set of weights, enabling a tailored approach to scoring by incorporating domain-specific insights.
We'll be using Custom Weights here.
First, we need to drop all superfluous columns. With a Drop Columns component, drop all fields apart from h3, temp_avg, wind_avg and wildfire_count.
Connect this to a Composite Score Unsupervised component, using the Custom Weights method, and set the following parameters:
Set the weights as: temp_avg = 0.25, wind_avg = 0.25, wildfire_count = 0.5. Alternatively, choose your own weights to see how this affects the outcome!
Leave the user-defined scaling as min-max and the aggregation function as linear, but change the output formatting to jenks. This will partition the results into classes based on minimizing within-class variance and maximizing between-class variance. Keep the number of buckets as 5 - and run!
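For reference, here is a rough SQL illustration of what min-max scaling plus a weighted linear aggregation with these weights amounts to. This is purely illustrative (the component handles the scaling, aggregation and jenks bucketing for you), and the input table name is hypothetical:
WITH stats AS (
  SELECT MIN(temp_avg) AS tmin, MAX(temp_avg) AS tmax,
         MIN(wind_avg) AS wmin, MAX(wind_avg) AS wmax,
         MIN(wildfire_count) AS fmin, MAX(wildfire_count) AS fmax
  FROM `project.dataset.la_fire_inputs`
)
SELECT h3,
       0.25 * SAFE_DIVIDE(temp_avg - tmin, tmax - tmin) +
       0.25 * SAFE_DIVIDE(wind_avg - wmin, wmax - wmin) +
       0.50 * SAFE_DIVIDE(wildfire_count - fmin, fmax - fmin) AS fire_risk_score
FROM `project.dataset.la_fire_inputs`, stats;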
Once complete, head into the map preview and select Create map. Set the fill color of your grid to be determined by the spatial score and add some widgets to help you explore the results.
With historic fires and climate data factored into our risk score, we can begin to understand the complex concept of risk. For instance, risk is considered much higher around Malibu, the location of the famous 2018 Woolsey fire, but low to the southeast of the county.
Check out how we’ve used a combination of widgets & interactive pop ups to help our user interpret the map - head over to the Data visualization section to learn more about how you can do this!
The basemap is the foundational component of a map. It provides context, geographic features, and brand identity for your creations. Every organization is unique, and CARTO allows you to bring your own basemaps to fit your specific needs.
In this tutorial, you'll learn to customize your visualizations in Builder by using tailor-made basemaps. Don't have a custom basemap already? We'll start with the creation of a custom basemap using Maputnik, a free and open-source visual editor.
Prerequisites: You need to be an Admin user to add custom basemaps to your CARTO organization.
In this guide, we'll walk you through:
Access the online version of Maputnik at https://maplibre.org/maputnik/. Then, click "Open" and select "Zoomstack Night." Zoomstack Night is an open vector basemap provided by Ordnance Survey's OS Open Zoomstack, showcasing coverage of Great Britain.
You might get overwhelmed by all the options available in the UI, but using it is simpler than it seems. To make it easier to recognize the different items you can update in the style, simply click on the map, and Maputnik will display the layers you can customize.
Now that you're more familiar with this tool, let's start customizing the look and feel of this map.
Set the "buildings" layer to blue using this hex color code #4887BD
.
For the green spaces, set the "greenspaces" layer to #09927A
and "woodland" to #145C42
.
To highlight the visualization of both "greenspace names" and "woodland names" labels, increase the size using the below JSON code and set the fill color to white.
Once you're done, export the Style JSON and save it. You'll need this for the next section. Note that depending on which style you have used as a template, you may need to include an access token at this point, such as one from MapTiler.
In this section, we'll showcase how you can host Style JSON files using GitHub to consume them in your CARTO organization. We'll be using a feature called gist, which allows you to host files. Here’s how to do it:
Ensure you have access to GitHub and your own repository and create a new gist. To do so:
Go to GitHub and create a new gist.
Drag your exported Style JSON into the gist.
Make sure the gist is public.
Create the public gist.
Now we'll get the raw URL of the hosted Style JSON. To do so:
Access the raw version of the gist.
Copy the URL of the raw file. This URL will be used to consume the custom basemap in CARTO.
Note: You need to be the Admin of your organization to have the rights to add custom basemaps to your CARTO organization.
Go to Organization > Settings > Customizations > Basemaps
Click on "New basemap" to add your custom basemap, completing the following parameters:
URL: Enter the gist raw URL of the hosted Style JSON.
Name: The name you'd like to provide to your basemap
Attribution: Automatically filled but you can edit this if required.
Once the basemap URL has been validated, you can use the interactive map to navigate to the desired basemap extent.
Activate the custom basemap type in the "Enabled basemaps in Builder" section. Doing so, you'll enable all Editors of the organization to access all added custom basemaps.
Navigate to the Maps section and click on "New map".
Provide the map with the title "Using custom basemaps" and load the Bristol traffic accidents source. To do so:
Click on "Add sources from..."
Navigate to CARTO Data Warehouse > demo data > demo_tables.
Select "bristol_traffic_accidents" table.
Click "Add source".
The source and related layer is added to the map.
Rename the newly added layer "Traffic Accidents".
Go to the Basemap tab and choose your recently uploaded custom basemap.
Style the "Traffic Accidents" layer:
In the Fill Color section, set the color to light yellow.
Configure the Size to 4.
Now, you're done with your map creation and ready to share it with others!
In this tutorial, we’ll create a workflow to improve portfolio management for real estate insurers by identifying vacant buildings in areas experiencing anomalously high rates of violent crime.
By the end of this tutorial, you will have:
✅ Built a workflow to detect spatio-temporal emerging anomalous regions
✅ Prepared the results for interactive map visualization to monitor at-risk properties
Let's get started!
This is data that you'll need to run the analysis:
Crime counts: the cartobq.docs.CHI_crime_counts_w_baselines public table reports the observed and expected counts for violent crimes in Chicago from 2001 to present. The individual crime data, extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and available in the Google BigQuery public marketplace, were aggregated by week and by H3 cell at resolution 8. The expected counts were obtained using a statistical model that accounts for external covariates as well as endogenous variables, including spatial lag variables to account for the influence of neighboring regions, counts at previous time lags to model the impact of past values on current or future outcomes, and seasonal terms to account for repeating seasonal behaviours.
Vacant buildings: the cartobq.docs.CHI_311_vacant_buildings_2010_2018 public table reports the 311 calls for open and vacant buildings reported to the City of Chicago since January 1, 2010.
That's all you need for now - let's get going!
Sign in to CARTO at app.carto.com
Head to the Workflows tab and select the Import Workflow icon and import this template.
Choose the CARTO Data Warehouse connection or any connection to your Google BigQuery project.
For this method to work, we first need to ensure that the data is complete, i.e. that there are no weeks and/or H3 cells without data or with missing data. This can be easily verified by ensuring that each H3 cell has the same number of timesteps (and vice versa), as done in the first nodes, where the Group By component is used to count the number of timesteps per H3 cell (and the number of H3 cells per timestep). This check allows us to verify that there are no gaps in the data. If gaps are detected, filling them is relatively straightforward for count data: it simply involves inserting zeros for the missing data points. However, for non-count variables, the process can be more complex. While simple techniques, like those available in Google BigQuery's GAP_FILL function, might be a good initial approximation, more advanced modelling strategies are generally required.
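For count data like the crime counts used here, zero-filling any gaps directly in SQL is straightforward; a sketch (note that the public table used in this tutorial is already complete, so this is only illustrative):
WITH cells AS (SELECT DISTINCT h3 FROM `cartobq.docs.CHI_crime_counts_w_baselines`),
     weeks AS (SELECT DISTINCT week FROM `cartobq.docs.CHI_crime_counts_w_baselines`)
SELECT c.h3, w.week, IFNULL(t.counts, 0) AS counts
FROM cells c
CROSS JOIN weeks w
LEFT JOIN `cartobq.docs.CHI_crime_counts_w_baselines` t
  ON t.h3 = c.h3 AND t.week = w.week;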
Next, we add the Detect Space-time Anomalies component, which uses a multi-resolution method to search over a large and overlapping set of space-time regions, each containing some subset of the data, and find the most significant clusters of anomalous data. For a complete tutorial on how this method works, you can take a look at this guide.
We run this component with the following settings:
The index, date, and variable columns (h3, week, counts).
The time frequency of the data: WEEK for weekly data.
That the analysis is prospective, meaning that we are interested in emerging anomalies, i.e. anomalies in the final part of the time series.
The POISSON distributional model, which is appropriate for count data.
The EXPECTATION estimation method, which assumes that the observed values should be equal to the baseline for non-anomalous space-time regions.
The spatial extent of the regions, with a k-ring between 2 and 3.
The temporal extent of the regions, with a window between 4 and 12 weeks.
That we are looking for high-mean anomalies, i.e. we search for regions where the observed crimes are higher than expected.
The number of permutations to compute the statistical significance of the score.
The maximum number of results returned, which we set to 1 to select the most anomalous region only.
The output of the component is a table indexed by a unique identifier called index_scan. Each identifier corresponds to a specific anomalous space-time region. For each region, the following information is provided: the anomalous score (score, the higher the more anomalous), its statistical significance (gumbel_pvalue), the relative risk (rel_risk, which represents the ratio of the sum of the observed counts to the sum of the baseline counts), and the H3 cells (locations) and weeks (times), which are both stored as arrays.
To join the output from the component to the input table, which is indexed by the cell id and time, we need to first unnest the arrays. We then pivot the resulting table in order to obtain a table indexed by the H3 cell id and the week, with a 'key' column indicating either counts or counts_baseline and a 'value' column storing the corresponding count.
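A sketch of the unnesting step, where anomalies_output stands for the (hypothetical) output table of the Detect Space-time Anomalies component:
-- Expand each anomalous region back to one row per (h3, week) combination
SELECT a.index_scan, h3, week
FROM anomalies_output AS a,
     UNNEST(a.locations) AS h3,
     UNNEST(a.times) AS week;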
Finally, we join the results with a table containing 311 calls for open and vacant buildings reported to the City of Chicago between January 1, 2010 and December 2018: we first extract the distinct H3 cells in the space-time region using the Select Distinct component, then create a geometry column with the H3 Boundary component and finally use a Spatial Join component to intersect the tables based on their geometries.
Now let's turn this into something a bit more visual!
Select the Transpose / Unpivot as Table component. Open the Map preview at the bottom of the screen and select Create Map. This will take you to a fresh CARTO Builder map with the H3 cells of the anomalous regions and their counts pre-loaded.
To also add the vacant building geometries, go back to the workflow and select the last Spatial Join component. Open the Map preview at the bottom of the screen and select Create Map. This will take you to a fresh CARTO Builder map with your data pre-loaded. Click on the three dots in the Sources panel, select the Query this table option and copy the code. Then go back to the first map and, again in the Sources panel, click on Add source from..., select the Add Custom Query (SQL) option and paste the SQL code. This will add to the map a layer with the vacant buildings within the anomalous region.
In the Layer panel, click on Layer 1 to rename the layer "Anomalous region" and style your data.
In the Layer panel, click on Layer 2 to rename the layer "Vacant buildings" and style your data.
To the right of the Layer panel, switch to the Widgets panel, to add a couple of dashboard elements to help your users understand your map. We’d recommend:
Time series widget: SUM of value, split by key, to show the total number of observed and expected counts by week.
For each of the widgets, scroll to the bottom of the Widget panel and change the behaviour from global to viewport, and watch as the values change as you pan and zoom.
Head to the Legend panel (to the right of Layers) to ensure the names used in the legend are clear (for instance we've changed the title of the legend from "Anomalous Region" to "Space-time region exhibiting an anomalous number of violent crimes").
Now, Share your map (top right of the screen) with your Organization or the public. Grab the shareable link from the share window.
Here's what our final version looks like:
Looking for tips? Head to the Data Visualization section of the Academy!
In this example, we will geocode a table with some Starbucks address data that we have available in BigQuery. The geocoding process will add a new column to your input table called “geom” with a Point geometry based on the geographic coordinates of the location; which are derived from the location information in your table (e.g. street address, postal code, country, etc.).
WARNING
This function consumes geocoding quota. Each call consumes as many units of quota as the number of rows in your input table or query. Before running, we recommend checking the size of the data to be geocoded and your available quota using the LDS_QUOTA_INFO() function.
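For reference, checking your remaining quota is a single query (shown here for the carto-un deployment; use carto-un-eu or your own installation prefix as appropriate):
SELECT `carto-un`.carto.LDS_QUOTA_INFO();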
As a module within CARTO’s Analytics Toolbox, the location data services (lds) capabilities are available as SQL procedures that can be executed directly from your BigQuery console or client of choice after connecting your BigQuery project with your CARTO account. To check whether your Google account or Service Account has access to the LDS module, please execute this query:
SELECT `carto-un`.carto.VERSION_ADVANCED();
SELECT `carto-un-eu`.carto.VERSION_ADVANCED();
SELECT carto.VERSION_ADVANCED();
The lds module is generally available in the Analytics Toolbox since the “July 26, 2022” version. Please check the Getting Access section if you run into any errors when running the query above.
For this example we will use a table with the Starbucks addresses that can be found in the publicly available table bqcartodemos.sample_tables.starbucks_ny_geocode. The table contains a column called “full_address” that we will use as input for the geocoding process.
Once you are all set getting access to the lds module, geocoding your data is as easy as opening your BigQuery console or SQL client and running the GEOCODE_TABLE() procedure as detailed in the following query:
CALL `carto-un`.carto.GEOCODE_TABLE('<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'full_address','geom', 'US', NULL);
-- The table 'bqcartodemos.sample_tables.starbucks_ny_geocode' will be updated
-- adding the columns: geom, carto_geocode_metadata.
CALL `carto-un-eu`.carto.GEOCODE_TABLE('<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'full_address','geom', 'US', NULL);
-- The table 'bqcartodemos.sample_tables.starbucks_ny_geocode' will be updated
-- adding the columns: geom, carto_geocode_metadata.
CALL carto.GEOCODE_TABLE('<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'full_address','geom', 'US', NULL);
-- The table 'bqcartodemos.sample_tables.starbucks_ny_geocode' will be updated
-- adding the columns: geom, carto_geocode_metadata.
In this case, we select ‘bqcartodemos.sample_tables.starbucks_ny_geocode’ as the input table and “full_address” as the address column. We choose “geom” as the name for the geometry column (as it is by default), and we also specify the country based on its ISO 3166-1 alpha-2 code. Last but not least, you need to add to the query your API Base URL and your LDS Token, which can be obtained in the Developers section of the CARTO Workspace. You can refer to the SQL reference if you need more details about this procedure and its parameters.
As a result of the query, we obtain the input table modified with a new column called “geom” with the geographic coordinates (latitude and longitude) and the “carto_geocode_metadata” column with additional information of the geocoding result in JSON format.
The Data Explorer offers you a graphical interface that you can use to geocode your data. Let’s use it here to reproduce the same use case that we have done from the BigQuery console but from the CARTO Workspace.
You will find the option Geocode table available from the Data Explorer in tables that do not contain any geometry column. To find your table please select the corresponding connection, pick the right dataset/folder and find the table you want to geocode from the collapsible tree.
Clicking on the “Geocode table” button will trigger a wizard that you can follow along to configure the different parameters to geocode your data.
In this case, to reproduce the geocoding example that we have done before from a SQL console, we will select geocode by address and we will choose the ‘full_address’ column as input parameter. You can also provide extra location information choosing “United States of America” in the country selector.
Click on “Continue” to proceed to the next step where you can review the summary of the operation that will be performed on your data and confirm it by clicking on “Geocode”.
The geocoding process could take a few minutes; remember that you may be geocoding a large amount of data and that the operation calls an external geocoding service. You can minimize the process window, continue working with CARTO in the meantime, and check the progress of the geocoding process at any time.
Once the process finishes, you will be able to access your geocoded table, which will have a new column called “geom” including the geographic coordinates of your input data.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.
In this example, we will create isolines around some Starbucks locations in order to estimate their trade areas based on drive-time areas around them.
This process will generate a new table with the columns of the input table (except the column with the point geometry) plus a new column with the isoline polygon (__iso_geom column).
This function consumes isolines quota. Each call consumes as many units of quota as the number of rows in your input table or query. Before running, we recommend checking the size of the data to be processed and your available quota using the LDS_QUOTA_INFO() function.
As a module within CARTO’s Analytics Toolbox, the location data services (lds) capabilities are available as SQL procedures that can be executed directly from your BigQuery console or client of choice after connecting your BigQuery project with your CARTO account. To check whether your Google account or Service Account has access to the LDS module, please execute this query:
SELECT `carto-un`.carto.VERSION_ADVANCED();
SELECT `carto-un-eu`.carto.VERSION_ADVANCED();
SELECT carto.VERSION_ADVANCED();
The lds module is generally available in the Analytics Toolbox since the “July 26, 2022” version. Please check the Getting Access section if you run into any errors when running the query above.
For this example we will use a table with the geocoded Starbucks addresses that can be found in the publicly available table bqcartodemos.sample_tables.starbucks_ny_geocode. The table contains information about Starbucks stores and a column called ‘geom’ with the geographic coordinates (latitude and longitude) of each location. Around these locations we will create isolines based on 15 minutes walking.
In order to create the isolines, we will execute the CREATE_ISOLINES() procedure with the following SQL query:
CALL `carto-un`.carto.CREATE_ISOLINES(
'<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'bqcartodemos.sample_tables.starbucks_ny_geocoded_iso_walk_time900',
'geom',
'walk', 900, 'time',NULL);
CALL `carto-un-eu`.carto.CREATE_ISOLINES(
'<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'bqcartodemos.sample_tables.starbucks_ny_geocoded_iso_walk_time900',
'geom',
'walk', 900, 'time',NULL);
CALL carto.CREATE_ISOLINES(
'<api_base_url>', '<lds_token>',
'bqcartodemos.sample_tables.starbucks_ny_geocode',
'bqcartodemos.sample_tables.starbucks_ny_geocoded_iso_walk_time900',
'geom',
'walk', 900, 'time',NULL);
In the query we specify (in this order) the input table, the output table and “geom” as the name of the origin geometry column. We indicate that we want to calculate the isolines based on 15 minutes walking by setting the “mode” parameter to “walk” and the “range_value” parameter to 900 seconds (15 min). You also need to provide the API base URL where your account is hosted in the “api_base_url” parameter and your token for accessing the different API services in the “lds_token” parameter.
As a result of the query we obtain a new table with the name that we chose in the second parameter of the procedure. This output table has the same schema as the input one, plus the “__iso_geom” column with the polygon geometry of the isoline that we have calculated.
If you prefer, you can create isolines without writing a single line of SQL thanks to our map-making tool CARTO Builder, which offers a user interface that you can use to calculate trade areas based on walk/drive times or distances. Let’s use it here to reproduce the same use case as we have previously done from the SQL console, but from the Builder interface.
First of all, you should create a new map and add a source with the table including the locations around which you want to calculate isolines. You can find more details on how to create maps in Builder in the Maps section of the User Manual.
Then, on that data source, click on “Add SQL Analysis”.
Select “Trade areas” in the list of available SQL Analysis.
Choose the parameters of your isolines, in this example “walk” mode and 900 seconds (15 minutes). Then, click on the “Save results in a new table” button.
You should choose the location and the name of the output table and click on “create table” to run the process. As simple as that, directly from CARTO Builder and running natively in BigQuery.
As a result of the analysis, we obtain a new table (also added as a data source in our map) with the name that we have chosen in the last step which contains the geometry of the polygons of the isoline that we have calculated. Now we have two layers in our map, the original data with the Starbucks locations and a second layer with the isolines that we have created around each store.
In this tutorial, we will learn to identify areas with a deficit of cell network antennas. We will identify busy areas, i.e., areas with a lot of human activity, and then verify whether the number of antennas in these locations is enough to satisfy demand while providing a high-quality service.
This analysis will be based on three main sources:
: contains topographic data standardized across global administrative boundaries. We will use their Buildings dataset, made up of over 2.3 billion features.
: provides derived variables across a wide range of themes including demographics, points of interest, and climatology data with global coverage. We will focus on the derived human activity index, a proxy for busy areas.
: it is an open database of cell towers located worldwide.
We will be running the analysis for the city of Madrid, but if you'd like to replicate it for other study areas, make sure to subscribe to the and datasets, which are available globally in our , and to update your cell towers data accordingly (OpenCelliD data can be downloaded from ).
Sign in to CARTO at
Head to the Workflows tab and click on Create new workflow
Choose the CARTO Data Warehouse connection or any connection to your Google BigQuery project.
Now, let’s dive into the step-by-step process of creating a workflow to pinpoint high-traffic areas that are lacking mobile phone antennas, and discover which buildings are the best candidates for antenna installation.
Let's import the data into the canvas. First, we will load the Spatial Features dataset from the Sources left-side menu by selecting Data Observatory > CARTO > Spatial Features - Spain (H3 Resolution 8) [v3] and dragging and dropping it onto the canvas. Make sure you are subscribed to this dataset (you can follow this tutorial to learn how).
Now, from the Components left-side menu, we will use the component to load some data we've made publicly available in BigQuery.
First, we will load a sample of Overture Maps' buildings data, which contains all the building geometries in Madrid, by typing cartobq.docs.buildings_mad as the source table FQN.
Secondly, we will import the geometry of our Area of Interest (AOI), which will help focus our analysis only within Madrid. The FQN of this data is cartobq.docs.madrid_districts.
Now, we will import the cell towers data using the component. We have filtered the OpenCelliD data to keep only the 4G mobile phone antennas we are interested in, and made the sample publicly accessible through a Google Cloud Storage bucket. Copy the following URL to import the source:
Before we begin with the analysis, we need to standardize all our data to a common geographical reference. This way, we can seamlessly integrate the data, allowing for consistent spatial analysis and ensuring that the results are reliable. We will use Spatial Indexes as our reference system: since the Spatial Features dataset is already in H3, we will convert the other sources to match this format. If you want to learn more about Spatial Indexes, take a look at our !
To transform the telco data into H3, we will count the number of cell towers within each H3 cell:
Extract the H3 index associated with each cell tower's location by connecting the cell tower data source to the component. Select geom as the points column and 8 as the resolution.
Use the component to group by h3 and aggregate the cell tower id's using COUNT.
Rename the resulting id_count column as cell_towers using the component (a SQL sketch of these three steps follows below).
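Behind the scenes, Workflows compiles these steps into SQL. Roughly, the three steps above are equivalent to the following sketch, where the table name is hypothetical and the H3 function name is assumed from the CARTO Analytics Toolbox:
SELECT
    `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3, -- resolution 8 H3 cell containing each tower
    COUNT(id) AS cell_towers                          -- number of towers per cell
FROM cell_towers_madrid                               -- hypothetical name for the imported OpenCelliD sample
GROUP BY h3;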
Next, we will enrich the Area of Interest with all the necessary data:
Connect the AOI source to the component to generate a table with the indices of all H3 cells of resolution 8 included within the AOI geo-column geom. Use the Intersects mode.
Then, join the polyfilled AOI with the Spatial Features data using the h3 column as the key for both sources. Select Inner as the join type to keep only those H3 cells that are common to both tables. Then, eliminate the h3_joined column using the component.
Now, use another Join component to combine the resulting table with the aggregated cell tower counts. Again, use the h3 columns as keys, but make sure to select the appropriate join type, as we want to fill in the H3 cells in Madrid with cell tower information. In this case, we have connected the AOI as the main table, so we will perform a Left join.
The aim of the analysis is to identify busy areas, i.e., areas with a lot of human activity, and then verify whether the number of antennas in these locations is enough to satisfy demand while providing a high-quality service. To do this, we will:
Select the variables of interest. Since we are looking for areas with high human activity and a low number of cell towers, we need to reverse the cell tower counts so that high values mean low counts. To do this, use the component to compute cell_towers_inv, a proxy for the lack of antennas, by typing the query below, then use the component to select the variables h3, cell_towers_inv and human_activity_index:
Create a spatial score that combines high human mobility and lack of antennas information. Use the component with the CUSTOM_WEIGHTS scoring method to combine both variables using the same weights through a weighted average. Select STANDARD_SCALER as the scaling method and a LINEAR aggregation. For more details about Composite Scores, take a look at our !
Compute the Getis-Ord statistic to identify statistically significant spatial clusters of high values (hot spots, lack of coverage) and low values (cold spots, sufficient coverage). Use the component with a uniform kernel of size 1.
Identify potential buildings in which to install new antennas using the component. Notice that we need to work with geometries here, so we will first get the boundaries of the Getis-Ord H3 cells using the component. Enrich the data by aggregating the gi value with AVG and the p_value, which represents the significance of the statistic, with MAX.
To visualize the results correctly, we will use the component to create a tileset, which allows you to process and visualize very large spatial datasets stored in BigQuery. Use 10 and 16 as the minimum and maximum zoom levels, respectively.
The following map allows you to identify busy areas with a shortage of mobile phone antennas and determine the most suitable buildings for antenna placement.
We can see that the busy city center of Madrid is packed with cell towers, enough to satisfy demand. Locations with little human activity (like El Pardo park) also have enough network capacity to provide service. However, the outskirts of the city seem to be lacking antennas, based on the overall human activity and cell tower presence patterns in Madrid.
Running site feasibility analysis at scale
In this tutorial, you'll learn how to conduct wind farm site feasibility analysis at scale. This will include assessing terrain, demographics and infrastructure to understand which locations in West Virginia & Virginia are best suited for a wind farm.
While this tutorial focuses on wind farm sites, you can adapt this methodology to conduct site feasibility analysis for... just about anything!
Check out this webinar for an overview of this tutorial:
USA H3 data, which can be accessed via the CARTO Data Warehouse.
Powerline data, sourced from the and loaded into your data warehouse (you can also use the CARTO Data Warehouse).
US state boundaries, which you can access directly via the CARTO Data Warehouse or subscribe to in the Spatial Data Catalog.
We'll also be leveraging OpenStreetMap data for major highways and protected areas which you can subscribe to from the Google Data Marketplace with a Google BigQuery account. More information on accessing this data can be found in step 1.
For this analysis, we first need to access highway and protected area (see definition ) data, which we will source from - a fantastic global free database often dubbed “Wikipedia for maps.” While the crowdsourced nature of this dataset means quality and consistency can vary, major highways and protected areas are typically accurate due to their significance.
You can access this data for free from the Google BigQuery OpenStreetMap public dataset by modifying the code below, either in your BigQuery console, the CARTO Builder SQL console or a Custom SQL Select component in Workflows. This code extracts protected areas which intersect our study area (the states named in the first CTE) and are >=0.73km² in size. Why? This is the average area of an H3 cell at resolution 8, which is the resolution we'll be using for this analysis (keep reading for more information).
To access major highways, you can modify this code by replacing the boundary key with "highway" and changing the final WHERE statement to WHERE ST_CONTAINS(aoi.geom, geoms.geom) AND geoms.highway IN ('motorway', 'motorway_link', 'trunk', 'trunk_link'). A sketch of the full modified query is shown below.
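Putting those two changes together, the highway extraction looks roughly like this sketch, adapted from the protected-area query listed at the end of this page (the `carto-data.ac_xxxxxxxx` path is a placeholder for your own Data Observatory subscription):
WITH
aoi AS (
    SELECT ST_UNION_AGG(geom) AS geom
    FROM `carto-data.ac_xxxxxxxx.sub_carto_geography_usa_state_2019`
    WHERE do_label IN ('West Virginia', 'Virginia')),
geoms AS (
    SELECT
        (SELECT osm_id) AS osmid,
        (SELECT value FROM UNNEST(all_tags) WHERE KEY = "highway") AS highway,
        (SELECT value FROM UNNEST(all_tags) WHERE KEY = "name") AS name,
        (SELECT geometry) AS geom
    FROM `bigquery-public-data.geo_openstreetmap.planet_features`)
SELECT geoms.*
FROM geoms, aoi
WHERE ST_CONTAINS(aoi.geom, geoms.geom)
    AND geoms.highway IN ('motorway', 'motorway_link', 'trunk', 'trunk_link')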
You can read our full guide to working with the BigQuery OpenStreetMap dataset .
With all of our data collated, we first should filter our support geography (H3 Spatial Features) to only suitable locations. For the purposes of this tutorial, that is:
Must be within 25 miles of a >=400KV powerline.
Must be within 15 miles of a motorway or trunk level highway.
Must not intersect a large protected area (please note Native American lands are not included as many Native American communities are pro wind farm developments).
To achieve this, follow these steps:
In the CARTO Workspace, create a new workflow and select the connection where you have the relevant tables saved.
Drag all four tables (H3 Spatial Features, power lines, major highways and protected areas) onto the canvas. We've created a copy of the Spatial Features dataset limited to our study area, but this step is optional.
Connect the Spatial Features table to a H3 Center component which will transform each cell into a point geometry.
Connect the power lines and major highways each to an ST Buffer component, and set the buffer distance to 15 miles for both components.
Next, use two consecutive Spatial Filter components to filter the H3 Centers to those which intersect each buffer (see below).
At this stage, you are likely to have many duplicates where multiple buffers overlap. Remove these by using a Group by component: set the group column to H3, and select H3_geo as an aggregation column with the type "any" to retain the geometry data.
In the final step for this section, add a final Spatial Filter, selecting the results of the Group by as the top (source) input and the protected areas as the bottom (filter) input.
The bottom output of this is all of the features which do not match this criterion: every H3 cell which is within 15 miles of a major highway or a major power line but is not within a large protected area. Add another Group by component here (Group by: H3, Aggregate: H3 (any)) to remove duplicates.
These are our areas where a wind farm is feasible - now let's see where it's optimal!
In this section, we'll be ranking the feasible locations based on which have the optimal conditions for a wind farm. For this example, we are looking for locations with high wind speed and a small local population. We'll be extending the above workflow as follows:
First, we want to connect the wind speed and population data to the H3 grid we just created. Connect the output of the final Group by component from step 2 to the bottom input of a Join component. Connect the original Spatial Features source to the top input of the Join. Ensure the join columns are set to the H3 index column, and set the join type to right.
Now, add a Create Column component and connect this to the output of the previous step. Call this field avg_wind and use AVG(wind_jan, wind_feb... wind_dec) to calculate the average annual wind speed.
Now we'll use the Normalize component so we can use these two different measures together. Connect the first Normalize component to the output of Create Column and select avg_wind as the variable, then repeat this for the Population variable.
Add a final Create Column component. Call the column Index, and set the formula to avg_wind_norm + (1-population_norm).
And that's it! The result of this Index calculation will be a score out of 2; 2 being the ideal wind farm location, with the highest wind speed but smallest population. Check this out below!
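For reference, this is roughly what the Normalize and final Create Column steps compute, assuming the Normalize component applies min-max scaling to bring each variable into a 0-1 range (the table name is hypothetical):
WITH norm AS (
    SELECT
        h3,
        (avg_wind - MIN(avg_wind) OVER ()) / (MAX(avg_wind) OVER () - MIN(avg_wind) OVER ()) AS avg_wind_norm,
        (population - MIN(population) OVER ()) / (MAX(population) OVER () - MIN(population) OVER ()) AS population_norm
    FROM feasible_cells -- hypothetical name for the joined table from the previous steps
)
SELECT
    h3,
    avg_wind_norm + (1 - population_norm) AS index_score -- the column called "Index" in the Workflow
FROM norm;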
You can also learn more about this example by following our publication on the CARTO blog:
CARTO Workflows is a visual modeling tool that allows you to create multi-step analyses without writing any code. With Workflows, you can orchestrate complex spatial analyses with as many steps as needed which can be edited, updated, duplicated, and run as many times as needed.
Workflows is completely cloud-native, which means that behind the scenes Workflows compiles native SQL to your data warehouse or database and runs the Workflow directly within the database or data warehouse. What does this mean for you?
Speed: Since Workflows uses native SQL in the data warehouse, the speed of your analysis is comparable to running the analysis directly on the data warehouse itself. For example, a spatial point in polygon count of US Counties (3k+ polygons) to 26 million+ points can take ~6 seconds to run (depending on your infrastructure)
No data transfer: Your data never leaves your data source. This means that compared to other tools that take data out of the source, the performance boosts are massive, and you ensure your data remains in the same place
Faster analysis: You can assemble and modify analyses much faster than writing SQL and you can automate repetitive tasks
Lower costs: In nearly all cases, Workflows is a lower-cost analysis method compared to other desktop-based tools
Our goal with Workflows is to bring the power of spatial SQL to a much larger audience, including GIS Analysts, Data Analysts, and Business Users, who can now create complex spatial analyses without writing code. It reduces the need for specialist knowledge, and those specialists can now automate repetitive tasks and focus on more complex and valuable analytical work.
Before we jump into the Workflows tutorials and templates, let's take a quick look at the Workflows interface so you know your way around before getting started.
First is the Canvas where you will design your Workflow. This is a free-form Canvas meaning you can drag nodes onto any part of the canvas. You can zoom in and out to see different parts of your workflow and see the layout of the workflow in the mini viewer in the lower right corner. As you add nodes to the canvas they will snap to a grid to align.
On the left side, you will find a menu where you can add data sources from the connection you used to create the Workflow. You can add any data source that exists in your connection. You also have all the components, or nodes, that you can add to the canvas. You can search for components or scroll to find the one you want.
The bottom panel is the results space where you will see four different tabs:
Messages: Messages about the status of your Workflow including success and error messages.
Data: After clicking on a node, you can see the tabular data outputs of that specific workflow step.
Map: After clicking on a node, if that step returns a valid geometry, it will show up in the map. If there is more than one geometry you will have an option to pick which one to show.
SQL: The compiled SQL of the workflow. This includes different steps and procedural language.
In order to accelerate your onboarding to CARTO and get you ready to get the most out of Workflows for your day-to-day data pipeline and analytics, we have carefully curated a set of detailed step-by-step tutorials and workflow templates, from basic introductory ones to others covering more advanced and industry-specific use-cases.
Deep dive in some of the advanced features available in CARTO Workflows to ensure you get the most out of this tool by reading the specifications in our .
Embedding Builder maps into BI platforms like Looker Studio, Tableau, or Power BI is a straightforward way to add interactive maps to your reports and dashboards. This guide shows you how to do just that, making your data visualizations more engaging and informative.
In this tutorial, we're providing you with an existing Builder map as a hands-on example to guide you through the process. This example map highlights historic weather events. If you're interested in creating a similar map, this is for you.
Public map URL:
Embed code:
In this guide, we'll walk you through:
To access your map's URL and/or embed code, first ensure that your map has been shared — either within your organization, with specific groups, or publicly. After sharing the map, you can proceed with the following steps:
Map Link: This direct URL to your map can be quickly obtained in two ways:
Through a quick action from the 'Share' button.
Within the sharing modal in the left bottom corner.
Embed code: This is specifically available within the sharing modal:
Navigate to the sharing settings of your map.
Look for the "Developers and embedding" section. Here, the embed code is provided, allowing you to copy and paste it into the HTML of your site or application for seamless embedding.
Embedding Builder maps into BI platforms, which often lack geospatial support, can significantly enhance data visualization and analysis capabilities. By incorporating interactive maps, users can unlock spatial insights that are often hidden in traditional BI data views, allowing for a more intuitive understanding of geospatial patterns, trends, and relationships.
We'll explore how to embed Builder maps into the following platforms:
Embedding a Builder map in Looker Studio is seamless with the URL embed functionality. Here's how you can do it:
In the toolbar, click URL embed.
On the right, in the properties panel, select the SETUP tab.
Enter the Builder map URL in the External Content URL field.
Once embedded, you have the freedom to further refine your Looker Studio report. This can include adding charts, implementing filters, organizing content with containers, and enhancing the overall aesthetics of your report.
To see an example of a Looker Studio report featuring an embedded public Builder map, explore this . And for a visual walkthrough, check out the GIF below displaying the example report in action.
In Power BI, you can embed a Builder map on your dashboard following these steps:
Start by setting up a new dashboard within Power BI.
Make sure you are in editing mode to make changes to your dashboard.
Look for the option to add a Web Content Tile to your dashboard.
Configure the Web Content Tile:
Set a title for your tile to indicate what the map represents.
Include the embed code for your Builder map in the tile configuration:
Click "Apply" to finalize the tile's setup.
After these steps, your Builder map will be displayed as a Web Content Tile within your Power BI dashboard.
Embedding a URL, such as a web map or any other web content, in Tableau is straightforward using the Web Page object in a Tableau Dashboard. Here's how you can do it:
Open Tableau and go to the dashboard where you want to embed the URL.
Select Web Page from the objects list at the bottom of the screen.
Drag the Web Page object to your dashboard workspace.
Enter the URL in the dialog box that appears. This is where you would paste the URL you wish to embed, such as your Builder map link:
Click OK. Tableau will load the web content specified by the URL directly within the dashboard area you’ve selected.
Embedding a URL in a Google Site allows you to integrate external web content directly into your site. To do so, follow these steps:
Navigate to the Google Site where you want to embed the URL.
Make sure you are in edit mode. You can enter edit mode by clicking on the pencil icon or the Edit button, depending on your version of Google Sites.
Look for the Insert menu on the right side of the screen. Under this menu, you will find various elements you can add to your page. Click on Embed.
In the Embed prompt, enter the embed code of your map:
You can enhance your Google Site by adding further components such as new pages, text, logos, etc., as in the example below:
In this example we will create drive-time isolines for selected retail locations and then enrich them with population data, leveraging the power of the H3 spatial index. This tutorial includes some examples of simple data manipulation, including filtering, ordering and limiting datasets, plus some more advanced concepts such as polyfilling areas with H3 cells and joining data using a spatial index.
As input data we will leverage a point-based dataset representing retail locations that is available in the demo data accessible from the CARTO Data Warehouse connection (i.e. retail_stores), and a table with data from CARTO's Spatial Features dataset for the USA aggregated at H3 resolution 8 (i.e. derived_spatialfeatures_usa_h3res8_v1_yearly_v2).
Let's get to it!
Creating a workflow and loading your point data
In your CARTO Workspace under the Workflows tab, create a new workflow.
Select the data warehouse where you have the table with the point data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
Navigate the data sources panel to locate your table, and drag it onto the canvas. In this example we will be using the retail_stores table available in the demo data. You should be able to preview the data both in tabular and map format.
Selecting relevant stores
In this example, we want to select the 100 stores with the highest revenue, our top performing locations.
First, we want to eliminate irrelevant store types. Drag the Select Distinct component from the Data Preparation toolbox onto the canvas. Connect the stores source to the input side of this component (the left side) and set the column to storetype.
Click run.
Once run, click on the Select Distinct component and switch to the data preview at the bottom of the window. You will see a list of all distinct store type values. In this example, let’s say we’re only interested in supermarkets.
To select supermarkets, add a Simple Filter component from the Data Preparation toolbox.
Connect the retail stores to the filter, and specify the column as storetype, the operator as equal to, and the value as Supermarket (it's case sensitive).
Run!
This leaves us with 10,202 stores. The next step is to select the top 100 stores in terms of revenue.
Add an Order By component from the Data Preparation toolbox and connect it to the top output from Simple Filter. Note that the top output is all features which match the filter, and the bottom is all of those which don't.
Change the column to revenue and the order to descending.
Next add a Limit component - again from Data Preparation - and change the limit to 100, connecting this to the output of Order By.
Click run, to select only the top 100 stores in terms of generated revenue.
Creating drive-time isolines around the stores
Next, add a Create Isolines component from the Spatial Constructors toolbox. Join the output of Limit to this.
Change the mode to car, the range type to time and range limit to 600 (10 minutes).
Click run to create 10-minute drive-time isolines. Note this is quite an intensive process compared to many other functions in Workflows (it's calling to an external location data services provider), and so may take a little longer to run.
We now add a second input table to the canvas: drag and drop the table derived_spatialfeatures_usa_h3res8_v1_yearly_v2 from demo_tables. This table includes different spatial features (e.g. population, POIs, climatology, urbanity level, etc.) aggregated on an H3 grid at resolution 8.
In order to be able to join the population data with the areas around each retail store, we will use the H3 Polyfill component to compute the H3 grid cells at resolution 8 that cover each of the isolines around the stores. Configure the node by selecting "geom" as the Geo column, setting the Resolution value to 8 and enabling the option to keep the input table columns.
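Under the hood, this step is roughly equivalent to unnesting the set of H3 cells that cover each isoline polygon. A sketch in SQL, where the H3_POLYFILL function name and signature are assumed from the CARTO Analytics Toolbox and the input table name is hypothetical:
SELECT
    i.* EXCEPT (geom), -- keep the input table columns
    h3
FROM isolines AS i,    -- hypothetical name for the output of the Create Isolines step
    UNNEST(`carto-un`.carto.H3_POLYFILL(i.geom, 8)) AS h3;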
The next step is to join both tables based on their H3 indices. For that, we will use the Join component, selecting the columns named h3 present in both tables to perform an inner join operation.
Check in the results tab that now you have joined data coming from the retail_stores table with data from CARTO's spatial features dataset.
As we now have multiple H3 grid cells for each retail store, we want to aggregate the population associated with the area around each store (the H3-polyfilled isoline). To do that, we are going to use the Group By component, aggregating the population_joined column with SUM as the aggregation operation and grouping the table by the store_id column.
Now, check in the results that we again have one row per retail store (i.e. 100 rows), each with the store_id and the sum of the population_joined values for the different H3 cells associated with the isoline around each store.
We are now going to re-join, with another Join component, the data about the retail stores (including the point geometry) with the aggregated population. We take the output of the previous Limit component and add it to a new Join component together with the data we generated in the previous step to perform an inner join. We will use the store_id column to join both tables.
Finally we use the Save as table component to save the results as a new table in our data warehouse. We can then use the "Create map" option to build an interactive map to explore this data further.
This example demonstrates how to use Workflows to generate an H3 grid from a set of polygons.
This example demonstrates how to use Workflows to aggregate data from a set of points into a grid using the Quadbin spatial index.
This example demonstrates how to use Workflows to aggregate data from a set of points into a grid using the H3 spatial index.
This example demonstrates how to use Workflows to generate areas of influence from a set of points, using the KRing functionality in the Spatial Index category of components. In this case using H3.
Using Spatial Indexes to pinpoint areas for expansion
In this tutorial, you will learn how to optimize the site selection process for EV charging stations at scale. While this guide focuses on EV charging stations, you can adapt this process to optimize site selection for any service or facility.
A subscription to the (free) from our Spatial Data Catalog. You can replace this with any dataset that includes a population field.
A subscription to the places dataset, available for all countries in the Spatial Data Catalog. If you're following our example, you'll want to use the USA version.
Electric Vehicle charging location data. Our example uses data downloaded from the National Renewable Energy Laboratory . Prior to following the tutorial, you'll need to load this data into your cloud data warehouse.
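If the table you load only contains latitude and longitude columns, you can derive a point geometry with a simple query like this sketch (the table and column names are hypothetical; note that ST_GEOGPOINT takes longitude first):
SELECT
    *,
    ST_GEOGPOINT(longitude, latitude) AS geom -- BigQuery builds points as (longitude, latitude)
FROM ev_charging_stations;                    -- hypothetical name for your imported table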
In this first step of the tutorial, we'll be building the workflow below to understand which areas have the highest likely demand for EV charging locations in the USA. We will do this by identifying which H3 cells are furthest from an existing charging location but also have a high population.
First, log into the CARTO Workspace and Create a new workflow and Select a Connection (this should be to wherever you have loaded the EV charging location data).
Drag the EV charging location data onto the map. 💡 If the table doesn't include a geometry field, use to create point geometries from the latitude and longitude columns.
Next, drag the H3 population data onto the canvas. The H3 Spatial Features table from the Spatial Data Catalog contains a vast number of fields. To make your life easier, you may wish to instead use the component to only select the fields of interest, using the SQL "SELECT geoid AS h3, population FROM..."
We first need to calculate the distance from each H3 cell to its closest charging location:
Use to convert each H3 cell to a point geometry.
Use to calculate the distance from each H3 cell to the closest EV charging station.
Next, use a to filter out any H3 cells which are closer than 4 km to an EV charging station, assuming that these locations are already well served for vehicle charging.
Next, join the results of this filter to the input H3 selection to access its population data.
Finally, the function is used in a to select only areas with a high population (>97th percentile). You can see the SQL used to perform this below - note that we can use placeholders like $a to reference other components in the workflow.
The result of this workflow should be an H3 grid covering all areas further than 4 km from a charging station and above the 97th population percentile. Select the final Custom SQL Select component, open the map preview at the bottom of the screen, then select Create Map to explore your results.
Now that we know the areas of likely high demand for EV charging locations, we can identify existing infrastructure which could accommodate future charging locations, such as gas stations, hotels or parking lots.
To do this, we'll extend the workflow we created above.
First, drag the “OSM places” layer onto your canvas.
As your workflow is starting to become more complex, consider adding annotations to keep it organized.
First, convert the OSM Places to an H3 index using .
Secondly, an inner join is used to join the H3 cells to the result of the Custom SQL Select from earlier; this will retain only "places" within these high-demand areas. This process acts a lot like a , but as we are using Spatial Indexes there is no geometry processing required, making the process much faster and more efficient.
The result of this is a query containing only infrastructure in areas of high demand for EV charging - perfect locations for future charging infrastructure!
Learn more about how this analysis can be used in the blog post below.
Spatio-temporal analysis is crucial in extracting meaningful insights from data that possess both spatial and temporal components. By incorporating spatial information, such as geographic coordinates, with temporal data, such as timestamps, spatio-temporal analysis unveils dynamic behaviors and dependencies across various domains. This applies to different industries and use cases like car sharing and micromobility planning, urban planning, transportation optimization, and more.
In this example, we will perform spatio-temporal analysis to identify areas with similar traffic accident patterns over time using the location and time of accidents in London in 2021 and 2022, provided by . This tutorial builds upon where we explained how to use to identify traffic accident hotspots.
The source data contains two years of weekly collision counts aggregated into an H3 grid. The data is available at cartobq.docs.spacetime_collisions_weekly_h3 and can be explored in the map below.
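If you want a quick look at the table before running the analysis, a simple preview query is enough (assuming you have access to this public dataset):
SELECT *
FROM `cartobq.docs.spacetime_collisions_weekly_h3`
LIMIT 10;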
We start by performing a spacetime hotspot analysis to better understand our data. We can use the following call to the Analytics Toolbox to run the procedure:
For further detail on the spacetime Getis-Ord, take a look at and .
By performing this analysis, we can check how different parts of the city become “hotter” or “colder” as time progresses.
Once we have an initial understanding of the spacetime patterns in our data, we proceed to cluster the H3 cells based on their temporal patterns. To do this, we use the TIME_SERIES_CLUSTERING procedure, which takes as input:
input: The query or fully qualified name of the table with the data
output_table: The fully qualified name of the output table
partitioning_column: Time series unique IDs, which in this case are the H3 indexes
ts_column: Name of the column with the timestamp of each observation
value_column: Name of the column with the value per ID and timestep
options: A JSON containing the advanced options for the procedure
One of the advanced options is the time series clustering method. Currently, it features two basic approaches:
Value characteristic, which clusters the series based on the step-by-step distance between their values. One way to think of it is that the closer the signals, the closer the series are understood to be and the higher the chance of them being clustered together.
Profile characteristic, which clusters the series based on their dynamics along the time span. This time, the closer the correlation between two series, the higher the chance of them being clustered together.
Clustering the series as-is can be tricky, since these methods are sensitive to noise in the series. However, since we smoothed the signal using the spacetime Getis-Ord before, we can cluster the cells based on the resulting temperature instead. We will only consider those cells where at least 60% of the observations have reasonable significance.
Even if it can feel like an extra layer of indirection, this provides several advantages:
Since it has been temporally smoothed, noise has been reduced in the dynamics of the series;
and since it has been geographically smoothed, nearby cells are more likely to be clustered together.
This map shows the different clusters that are returned as a result:
We can immediately see the different dynamics in the widget:
Apart from cluster #3, which clearly clumps the “colder” areas, the rest start 2021 with very similar accident counts.
However, from July 2021 onwards, cluster #2 accumulates clearly more collisions than the other two.
Even though #1 and #4 have similar levels, certain points differ, like September 2021 or January 2022.
This information is incredibly useful to kickstart further analysis into the possible causes of these behaviors, and we were able to extract these insights with a single glance at the map. This method "collapsed" the results of the space-time Getis-Ord into a space-only result, which makes the data easier to explore and understand.
MAX(cell_towers) OVER() - COALESCE(cell_towers,0)
WITH
aoi AS ( SELECT ST_UNION_AGG(geom) AS geom
FROM `carto-data.ac_xxxxxxxx.sub_carto_geography_usa_state_2019`
WHERE do_label IN ('West Virginia', 'Virginia')),
geoms AS (
SELECT
(SELECT osm_id) osmid,
(SELECT value FROM UNNEST(all_tags) WHERE KEY = "boundary") AS boundary,
(SELECT value FROM UNNEST(all_tags) WHERE KEY = "name") AS name,
(SELECT geometry) AS geom
FROM `bigquery-public-data.geo_openstreetmap.planet_features`)
SELECT geoms.*
FROM geoms, aoi
WHERE ST_CONTAINS(aoi.geom, geoms.geom) AND geoms.boundary = 'protected_area' AND ST_AREA(geoms.geom) >= 737327.598
<iframe width="640px" height="360px" src="https://clausa.app.carto.com/map/5d942679-411f-4ab7-afb7-0f6061c9af63"></iframe>
WITH
stats AS (
SELECT
h3,
nearest_distance AS percentile,
population_joined,
PERCENTILE_CONT(population_joined, 0.97) OVER() AS percentile_97
FROM
$a)
SELECT
*
FROM
stats
WHERE
population_joined >= percentile_97
CALL `carto-un`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL `carto-un-eu`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL carto.GETIS_ORD_SPACETIME_H3_TABLE(
'cartobq.docs.spacetime_collisions_weekly_h3',
'cartobq.docs.spacetime_collisions_weekly_h3_gi',
'h3',
'week',
'n_collisions',
3,
'WEEK',
1,
'gaussian',
'gaussian'
);
CALL `carto-un`.carto.TIME_SERIES_CLUSTERING(
'''
SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
''',
'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
'index',
'date',
'gi',
JSON '{ "method": "profile", "n_clusters": 4 }'
);
CALL `carto-un-eu`.carto.TIME_SERIES_CLUSTERING(
'''
SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
''',
'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
'index',
'date',
'gi',
JSON '{ "method": "profile", "n_clusters": 4 }'
);
CALL carto.TIME_SERIES_CLUSTERING(
'''
SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
''',
'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
'index',
'date',
'gi',
JSON '{ "method": "profile", "n_clusters": 4 }'
);
Using crime data & spatial analysis to assess home insurance risk
In this tutorial, we'll be using individual crime location data to create a crime risk index. This analysis is really helpful for insurers looking to make more intelligent policy decisions - from customized pricing of premiums to tailored marketing.
A no-code approach to optimizing OOH advertising locations [ Video 🎥 ]
Leveraging Spatial Indexes along with human mobility and spend data to optimize locations for OOH billboards in a low-code environment. While this example focuses on OOH, the approach could be utilized in other sectors such as CPG, retail and telecoms.
Identifying customers potentially affected by an active fire in California
Use CARTO Workflows to import and filter a public dataset that contains all active fires worldwide; apply a spatial filter to select only those happening in California. Create buffers around the fires and intersect with the location of customers to find those potentially affected by an active fire.
Builder enhances data interaction and analysis through two key features: Widgets and SQL Parameters. Widgets, linked to individual data sources, provide insights from map-rendered data and offer data filtering capabilities. This functionality not only showcases important information but also enhances user interactivity, allowing for deeper exploration into specific features.
Meanwhile, SQL Parameters act as flexible query placeholders. They enable users to modify underlying data, which is crucial for updated analysis or filtering specific subsets of data.
Add a widget to Builder by clicking "New Widget" and select your data source.
Then, select a widget type from the menu: Formula, Category, Histogram, Range, Time Series or Table.
Once you have selected the widget type of your preference, you are ready to configure your Widget.
In the Data section of the Widget configuration, choose an aggregation operation (COUNT, AVG, MAX, MIN or SUM) and, if relevant, specify the column on which to perform the aggregation.
Using the Formatting option, you can auto-format data, ensuring enhanced clarity. For instance, you can apply automatic rounding, comma-separations, or percentage displays.
You can use Notes to supplement your Widgets with descriptive annotations, which support Markdown syntax, allowing you to add text formatting, ordered lists, links, etc.
Widgets in Builder automatically operate in viewport mode, updating data with changes in the viewport. You can also configure them for global mode to display data for the entire source.
Furthermore, Widgets can be set as collapsible for convenient hiding. Some widgets have the capability to filter not only themselves but also related widgets and connected layers. This filtering capability can be easily enabled or disabled for each widget using the cross-filtering icon.
SQL Parameters serve as placeholders in your SQL Query data sources, allowing viewer users to input specific values that dynamically replace these placeholders. This allows users to interactively customize and analyze the data displayed on their maps.
SQL Parameters are categorized based on the data format of the values they expect to receive, ensuring flexibility and ease of use. Below are the current types of SQL Parameters:
Date Parameter: Ideal for handling date values, date parameters allow users to input a specific date range, enabling data analysis over precise time periods. For example, analyzing sales data for a specific month or quarter.
Text Parameter: Tailored for text values, users can input or select a specific category to obtain precise insights. For instance, filtering Points of Interest (POI) types like "Supermarket" or "Restaurant".
Numeric Parameter: Designed for numeric values, users can input specific numerical criteria to filter data or perform analysis based on their preferences. For example, updating the radius size of a geofence to update an analysis result.
SQL Parameters can be used in many different ways. One of the most common is allowing viewers to interact with the data in a controlled manner. Let's cover a simple use case step by step:
The option to create a new SQL Parameter will be available once there is at least one data source of type Query:
So, let's create a SQL Query data source with a table that contains information about fires all over the world:
On a new map, click on 'Add source from...' and select 'Custom query (SQL)' .
Select CARTO Data Warehouse as connection.
Use the following query
SELECT * FROM `carto-demo-data.demo_tables.fires_worldwide`
Create and configure a text parameter
Once we have the data rendered in the map, we'll add a text parameter that helps us select between fires that happened during the day or the night.
Click on 'Create a SQL Parameter'
Select 'Text Parameter'
In the 'Values' section, click on 'Add from source'. Select your data source and pick the daynight column
In the 'Naming' section, pick a display name, like 'Day/Night'. The SQL name gets automatically generated as {{day_night}}
After the parameter has been created, open the SQL panel and add it to your query:
SELECT * FROM `carto-demo-data`.demo_tables.fires_worldwide
WHERE daynight IN {{day_night}}
You can now use the control UI to add/remove values and check how the map changes.
Now, let's add a date parameter to filter fires by its date:
Click on 'Create a SQL parameter'
Select 'Date parameter'
Type or select from a calendar the range of dates that are going to be available from the control UI.
Give it a display name, like 'Date'. The SQL names get automatically generated as {{date_from}} and {{date_to}}
Open the SQL Panel and add the parameters to your query, like:
SELECT * FROM `carto-demo-data`.demo_tables.fires_worldwide
WHERE daynight IN {{day_night}}
AND acq_date > {{date_from}} AND acq_date < {{date_to}}
The parameters {{date_from}} and {{date_to}} will be replaced by the dates selected in the calendar.
Next, we'll incorporate a range slider to introduce a numeric parameter. It will allow users to focus on fires based on their brightness temperature to identify the most intense fires.
Click on 'Create a SQL parameter'
Select 'Numeric parameter'
In the 'Values' section, select Range Slider and enter the 'Min Value' and 'Max Value' within the range a user will be able to select.
Give it a display name, like 'Bright Temp'. The SQL names get automatically generated as {{bright_temp_from}} and {{bright_temp_to}}
Open the SQL Panel and add the parameters to your query, like:
SELECT * FROM `carto-demo-data`.demo_tables.fires_worldwide
WHERE daynight IN {{day_night}}
AND acq_date > {{date_from}} AND acq_date < {{date_to}}
AND bright_ti4 >= {{bright_temp_from}} AND bright_ti4 <= {{bright_temp_to}}
As we grow more attuned to the delicate balance of our natural world, understanding the movements of its inhabitants becomes crucial, not just for conservation but for enhancing our data visualization skills. The migration routes of blue whales offer a wealth of data that, when visualized, can inform and inspire protective measures.
This tutorial takes you through a general approach to building animated visualizations using Builder Time Series Widget. While we focus on the majestic blue whales of the Eastern Pacific from 1993 to 2003, the techniques you'll learn here can be applied broadly to animate and analyze any kind of temporal geospatial data whose position moves over time.
Join us in this tutorial, as we transform raw data into a dynamic map that tells a compelling story over time.
Access the Maps from your CARTO Workspace using the Navigation menu and create a "New map".
Let's add the blue whales point location as the first data source.
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the Add Source button.
The SQL Editor panel will be opened.
To add blue whales source, run the query below:
SELECT * FROM carto-demo-data.demo_tables.blue_whales_eastern_pacific_point
Change the layer name to "Blue Whales". Click over the layer card to start styling the layer.
In the Fill Color settings, choose a shade of medium blue. For the Stroke Color, opt for a slightly lighter blue.
Set the Stroke Width to 1 and the Radius Size to 1.5.
Your map should look something similar to the below:
Before we progress to adding the Time Series Widget, let's name the map "Blue Whales in Eastern Pacific" and change the Basemap to CARTO Dark Matter.
Now, let's add a Time Series Widget. To do so, open the Widgets tab and select the Time Series Widget type. In the Data configuration, add the unique identifier column named event_id.
In the Display section, set the Interval to 1 week and enable Animation controls to allow users to animate the features over time. Additionally, add a Note to provide further context to the end users accessing this map.
You can now use the animation controls to animate the map, adjusting its speed, range and so on, so you can easily follow whale movement across your desired temporal range.
To enhance the storytelling of our animated visualization, we'll give users more background and details. For that, we'll use the Map Description which supports markdown syntax.
You can copy the below example or use your own description.
### Blue Whales in Eastern Pacific

This map animates the migration of blue whales through the Eastern Pacific from 1993 to 2009.
----
#### How to Use this Map
To discover the migration patterns of blue whales:
- **Explore Timeline**: Hover over the Time Series graph to obtain insights about the number of whales seen in each aggregated period.
- **See Patterns**: Click 'Play' to animate the whale movements and observe emerging patterns.
- **Filter Data Range**: Drag across a timeline to focus on specific intervals.
- **Navigate**: Pan and zoom to explore different areas of the map.
*Click the top-right button to access the Widget panel*
Now we'll make the map public and share it online with our colleagues. For more details, see Publishing and sharing maps.
The end result should look something similar to the below.
If you're ready to take your map to the next level, dive into our bonus track. Add a layer for individual whale tracks and harness SQL parameters to filter by specific identifiers, enriching your research with targeted insights.
Add a new data source to display the whales tracks by executing the following query using Builder SQL Editor:
SELECT * FROM carto-demo-data.demo_tables.blue_whales_eastern_pacific_line
A new layer is added to the map displaying the different blue whales tracks.
Rename the layer to "Whale tracks" and move the layer to the 2nd position, just below Blue Whales.
In the layer style configuration of this new layer, set the Stroke Color to a darker blue.
Now we will add a Text SQL Parameter to filter both "Blue Whales" and "Whales tracks" by the same identifier.
We will start adding the parameter and using it on "Whales tracks" layer linked to SQL Query 2. To do so:
Click on "Add SQL Parameter" and select "Text Parameter" type.
Use Add from source, selecting the name column from SQL Query 2, which is linked to the "Whales tracks" layer.
Add a Display name and SQL name to the configuration
Click on "Create Parameter". Now the parameter control has been added to the map.
Copy the SQL name from the parameter control
Add it to your SQL Query 2 by adding a WHERE statement, and execute your query.
WHERE name IN {{whale_identifier}}
Now let's add it to the "Blue Whales" data source. To do so, we need to modify SQL Query 1 to generate the identifier column by concatenating two string columns, as well as adding the WHERE statement using the parameter.
WITH data_ AS (
SELECT
*,
CONCAT(individual_local_identifier,'-', tag_local_identifier) as identifier
FROM carto-demo-data.demo_tables.blue_whales_eastern_pacific_point)
SELECT
*
FROM data_
WHERE identifier IN {{whale_identifier}}
Now if you use the parameter control to filter for a specific identifier, both the "Blue Whales" and the "Whales tracks" are filtered simultaneously.
Now, let's publish the recent changes and add the SQL Parameters to the Map Sharing Settings to allow users to explore and filter specific identifiers.
The final map from the Bonus Track should look something similar to the below:
Scale your analysis with Spatial Indexes
Spatial Indexes - sometimes referred to as Data Cubes or Discrete Global Grid Systems (DGGs) - are global grid systems which tessellate the world into regular, evenly-shaped grid cells to encode location. They are available at multiple resolutions and are hierarchical, with resolutions ranging from feet to miles, and with direct relationships between “parent”, “child” and “neighbor” cells.
They are gaining in popularity as a support geography as they are designed for extremely fast and performant analysis of big data. This is because they are geolocated by a short reference string, rather than a long geometry description which is much larger to store and slower to analyze.
To learn more about Spatial Indexes you can get a copy of our free ebook Spatial Indexes 101.
Skip ahead to the tutorials and boost your Spatial Index expertise to the next level!
So far, we've spoken about Spatial Indexes as a general term. However, within this there are a number of index types. In this section, we will cover three main types of Spatial Indexes:
H3 is a hexagonal Spatial Index, available at 16 different resolutions, with the smallest cells covering an average area of 0.9m² and the largest reaching 4.3 million km². Unlike standard hexagonal grids, H3 maps the spherical earth rather than being limited to a flat plane covering a smaller area.
H3 has a number of advantages for spatial analysis over other Spatial Indexes, primarily due to its hexagonal shape - which is the closest of the three to a circle:
The distance between the centroid of a hexagon to all neighboring centroids is the same in all directions.
The lack of acute angles in a regular hexagon means that no areas of the shape are outliers in any direction.
All neighboring hexagons have the same spatial relationship with the central hexagon, making spatial querying and joining a more straightforward process.
Unlike square-based grids, the geometry of hexagons is well-structured to represent curves of geographic features which are rarely perpendicular in shape, such as rivers and roads.
The “softer” shape of a hexagon compared to a square means it performs better at representing gradual spatial changes and movement in particular.
Moreover, the widespread adoption of H3 is making it a great choice for collaboration.
However, there may be some cases where an alternative approach is optimal.
Quadbin is an encoding format for Quadkey, and is a square-based hierarchy with 26 resolutions.
At the coarsest level, the world is split into four quadkey cells, each with an index reference such as "48a2d06affffffff." At each level down, every cell is subdivided into four further cells, continuing until the most detailed resolution, which measures less than 1m² at the equator. This system is known as a quadtree key. The rectangular nature of the Quadbin system makes it particularly suited to modeling perpendicular geographies, such as gridded street systems.
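As a quick illustration of the hierarchy, you can generate a Quadbin cell for a point and then retrieve its parent at a coarser resolution. This is a sketch assuming the quadbin module of the CARTO Analytics Toolbox (`carto-un` deployment); check the SQL reference for the exact function names in your version:
SELECT
    `carto-un`.carto.QUADBIN_FROMGEOGPOINT(ST_GEOGPOINT(-3.70, 40.42), 12) AS quadbin_res12,
    `carto-un`.carto.QUADBIN_TOPARENT(
        `carto-un`.carto.QUADBIN_FROMGEOGPOINT(ST_GEOGPOINT(-3.70, 40.42), 12),
        4) AS quadbin_res4;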
Finally, we have S2; a hierarchy of quadrilaterals ranging from 0 to 30, the smallest of which has a resolution of just 1cm2. The key differentiator of S2 is that it represents data on a three-dimensional sphere. In contrast, both H3 and Quadbin represent data using the Mercator coordinate system which is a cylindrical coordinate system. The cylindrical technique is a way of representing the bumpy and spherical (ish!) world on a 2D computer screen as if a sheet of paper were wrapped around the earth in a cylinder. This means that there is less distortion in S2 (compared to H3 and Quadbin) around the extreme latitudes. S2 is also not affected by the “break” at 180° longitude.
As we mentioned earlier, H3 has a number of advantages over the other index types and because of this, it is fairly ubiquitous. However, before you decide to move ahead with H3, it’s important to ask yourself the following questions which may affect your decision.
What is the geography of what I’m modeling? This is particularly pertinent if you’re modeling networks. In some cases, the geometry of hexagons is less appropriate for modeling perpendicular grids, particularly where lines are perpendicular with longitude as there is no “flat” horizontal line. If this sounds like your use case, consider using Quadbin or S2.
Where are you modeling? As mentioned earlier, due to being based on a cylindrical coordinate system, both H3 and Quadbin cells experience greater area distortion at more extreme latitudes. However, H3 does have the lowest shape-based distortion at different latitudes. If you are undertaking analytics near the poles, consider instead working with the S2 index which does not suffer from this. Similarly, if your analysis needs to cross the International Date Line (180° longitude) then you should also consider working with S2, as both H3 and Quadbin “break” here.
What index type are your collaborators using? It’s worth researching which index your data providers, partners, and clients are using to ensure smooth data sharing, transparency and alignment of results.
The resolution that you work with should be linked to the spatial problems that you’re trying to solve. You can’t answer neighborhood-level questions with cells a few feet wide, and you can’t deal with hyperlocal issues if your cells are a mile across.
For example, if you are investigating what might be causing food delivery delays, you probably need a resolution with cells of around 100-200 yards/meters wide in order to identify problem infrastructure or services.
It’s also important to consider the scale of your source data when making this decision. For example, if you want to know the total population within each index cell but you only have this data available at county level, then transforming this to a grid with a resolution 100 yards wide isn’t going to be very illuminating or representative.
Just remember - the whole point of Spatial Indexes is that it’s easy to convert between resolutions. If in doubt, go for a more detailed resolution than you think you need. It’s easier to move “up” a resolution level and take away detail than it is to move “down” and add detail in.
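As an illustration of how cheap this conversion is, here is a hedged sketch assuming the CARTO Analytics Toolbox for BigQuery: a detailed H3 cell is derived for an arbitrary point and then moved "up" to its resolution-7 parent with a single string operation (function names per the Toolbox's H3 module; verify them in your installation).
-- Illustrative only: derive a resolution-9 cell and move "up" to its resolution-7 parent.
SELECT `carto-un`.carto.H3_TOPARENT(
         `carto-un`.carto.H3_FROMGEOGPOINT(ST_GEOGPOINT(-3.70, 40.42), 9),  -- detailed cell
         7) AS parent_cell                                                  -- coarser parent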
Learn more about working with Spatial Index "parent" and "children" resolutions in these tutorials.
Continue your Spatial Indexes journey with the resources below 👇
Understanding population distribution has important implications in a wide range of geospatial analysis such as human exposure to hazards and climate change or improving geomarketing and site selection strategies.
In this tutorial we are going to represent the distribution of the most populated places by applying colours to each type of place and a point size based on the maximum population. This way, we can easily understand how human settlement areas are distributed with a simple visualization that we can use in further analysis.
Access the Maps section from your CARTO Workspace using the Navigation menu and create a new Map using the button at the top right of the page. This will open the Builder in a new tab.
Let's add the populated places source. To do so, follow the next steps:
Select the Add source from button at the bottom left on the page.
Select Custom Query (SQL) and then Type your own query under the CARTO Data Warehouse connection.
Click on the Add Source button.
The SQL Editor panel will be opened.
To add populated places source, run the query below:
SELECT * FROM `carto-demo-data.demo_tables.populated_places`
Change the layer name to "Populated Places". Click over the layer card to start styling the layer.
In the Fill Color, we will use the 'Color based on' functionality to color by featurecla
. It has information about what kind of places there are, so we will pick a palette for a categorical variable (versus a gradient). Additionally, we will remove the Stroke Color so we are able to differentiate the different categories.
Now click on the options for the Radius configuration and in the section “Radius Based On” pick the column pop_max
. Play with the minimum/maximum size to style the layer as you like.
Go to Widget tab and click on 'New widget' to add a new Widget for "populated_places" source.
Select the Category widget, choose COUNT
as the operation method and select the column admin0name
. Then, rename your widget to 'Populated places by country'.
Using the Category widget on the right panel, select “United States of America” to filter out the rest of countries. You can also lock your selection to ensure the selection is not removed by mistake.
Let's now add another widget, this time a Pie widget based on featurecla
. We will add a Markdown note for this widget to provide users with further information about each category type. We will also set the behaviour mode of this widget to global, so the represented data covers the whole dataset without being affected by the viewport intersection.
**Percentage of Populated Places by Type**
This chart shows the distribution of various types of populated places, each representing a unique category:
- **Populated Place**: General areas with a concentration of inhabitants, such as towns or cities.
- **Admin-0 Capital**: Primary capital cities of countries, serving as political and administrative centers.
- **Admin-1 Capital**: Capitals of first-level administrative divisions, like states or provinces.
- **Admin-0 Region Capital**: Important cities that are the administrative centers of specific regions within a country.
- **Admin-1 Region Capital**: Major cities that serve as the administrative centers of smaller regions within first-level divisions.
- **Admin-0 Capital Alt**: Alternative or secondary capitals in countries with more than one significant administrative center.
- **Scientific Station**: Locations established for scientific research, often in remote areas.
- **Historical Place**: Sites of historical significance, often tourist attractions or areas of cultural importance.
- **Meteorological Station**: Facilities focused on weather observation and data collection.
*Each category in this chart gives insight into the diversity and function of populated areas, providing a deeper understanding of the region's composition.*
Finally, we will rename this widget to 'Places by type' and move it to the top of the Widgets panel by dragging the card on the left panel.
The third and final widget we will add to our dashboard is a Histogram widget using pop_max
column. This will allow users to select the cities based on the population. Finalise the widget configuration by setting the buckets limit to 10
and formatting the data to be displayed. Finally, rename the widget to 'Max population distribution'.
Interactions allow users to gather information about specific features; you can configure this functionality in the Interaction panel. First, select the type of interaction to Click
and Info Panel
. Then, add the attributes you are interested in, renaming and changing the formatting as needed.
Finally we can change our basemap. Go to Basemaps tab and select “Dark matter” from CARTO.
Rename the map to “Populated Places”.
Add a map description that will allow users to understand the nature of your map.
### Populated Places

Explore a world map that categorizes populated places by type, each color-coded for quick reference. It highlights the link between population density and administrative roles.
**Data Insights**
Notice the dense capitals signifying political and economic hubs, contrasted with isolated scientific stations. Each point's size indicates the maximum population, adding a layer of demographic understanding.
**How to Use It**
📊 Examine the charts for a country-wise breakdown and population details.
📌 Click on points for specifics like population peaks and elevation.
🌎 Dive in and engage with the map for a closer look at each location.
We can make the map public and share it online with our colleagues. For more details, see Publishing and sharing maps.
Finally, let's export our map into a portable, easy-to-share PDF.
In the Share drop-down menu, select Download PDF Report. In the window that appears, select Include map legend. You can also include comments here (such as the version number or any details about your approval process).
Select Preview, and when you're happy, Download PDF Report.
In this tutorial we are going to select the best billboards and retail stores in order to create a targeted product launch marketing campaign across multiple channels: out-of-home advertising and in-store promotions.
In this example we are going to leverage the H3 spatial index to combine data from multiple tables and perform our analysis. For illustrative purposes, we are going to consider our target audience for the product launch to be the high-income female population between 18 and 40 years old.
In this tutorial we are going to use the following tables available in the “demo data” dataset of your CARTO Data Warehouse connection:
newyork_newjersey_ooh_panels
newyork_ooh_sample_audience_h3
retail_stores
Let's get to it!
In your CARTO Workspace under the Workflows tab, create a new workflow.
Select the data warehouse where you have the data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
Navigate the data sources panel to locate your table, and drag it onto the canvas. In this example we will be using the newyork_newjersey_ooh_panels
table available in demo data. You should be able to preview the data both in tabular and map format.
We are going to create 300 meters buffers around each billboard. To do that, we add the ST Buffer component into the canvas and we connect the data source to its input. Then, in the node configuration panel we select 'geom' as the Geo column, '300' as the distance and 'meters' as the units. We click on "Run".
What we are going to do next is get the H3 resolution 9 cells that fall within the buffers that we have just created. We are then going to use the H3 indices to enrich the areas with more data that we have available at that spatial support. Hence, the next step is to add the H3 Polyfill component, connect it to the output of the ST Buffer node and configure it to calculate the H3 cells at resolution 9.
Now we are going to add a new data source to our workflow, specifically the table newyork_ooh_sample_audience_h3
that includes some features that we will use to define our target audience for the product launch. Take some time to analyze the structure of the table; as you can see we have socio-demographic and socio-economic features aggregated at h3 level in the NYC area.
We are now going to use a Join component to combine the output of the polyfill around each billboard with the audience data. We are going to select 'h3' as the column in both main and secondary tables. We click "Run".
Next step is to remove potentially duplicated H3 cells as the result of the Join operator (e.g. due to surrounding billboards). In order to do that we are going to add a Group by component and we are going to configure aggregations on MAX(female_18_40_pop_joined) and MAX(median_income_6eb619a2_avg_joined) and we are going to group by the column h3.
In order to select the best billboards for our product, we will first normalize each of the columns so as to get a ratio for each of the H3 cells. For that, we are going to add the Normalize component twice, once for each of the columns.
Now, if you check our last output table we have 2 new columns with the result of the normalization: female_18_40_pop_joined_max_norm and median_income_6eb619a2_avg_joined_max_norm.
In the next step we are going to create a New Column in the table with a custom expression; we are going to add the result of the normalization in the 2 columns in order to have a single indicator of the relevance of each H3 cell for our target audience. We are going to call the new column 'index' and the expression will be: female_18_40_pop_joined_max_norm + median_income_6eb619a2_avg_joined_max_norm.
In order to keep the best areas for our out-of-home advertising campaign we are going to add an Order by component and connect the table with the new column. We are going to order our table based on the 'index' column and in 'Descending' order.
To keep the 100 best areas, we are going to add a Limit component and select '100' as the number of rows.
At this point, we have identified the best areas (represented as H3 cells) surrounding existing OOH billboards in which to push our advertising campaign. The next step is to complete this analysis by also identifying the best retail stores in those same areas, so that we can complement our OOH campaign with in-store activities such as promotions, samples, etc.
Next, we are going to add a new data source to our workflow. We select the retail_stores
table from 'demo data' and we drop it into the canvas.
We are going to add a Select Distinct component to find out what are the different categories of stores that we have in the table. We select the column "storetype" in the Select Distinct configuration. After clicking "Run" we check that we have the following types: Supermarket, Convenience Store, Drugstore, Department Store, Speciality Store, Hypermarket and Discount Store.
As all store types are relevant to push our new product except of "Discount Store", we are going to add a new Simple Filter component and we will connect the retail_stores
table as the input. We are going to configure column as 'storetype', operator as 'equal to' and value as 'Discount Store'. We click "Run".
From the previous step we are interested in the stores that have not matched our filter, therefore we need to continue our workflow by selecting the second output (the one identified with a cross). We also want to create the H3 cells where the relevant stores are located, using the same resolution as in the other part of our workflow. To do that we will add the H3 from GeoPoint component to our workflow. We are going to connect the "Unmatch" output from the Simple Filter and we are going to select 'geom' as the Points column and '9' as the resolution. After running the workflow we now have the H3 resolution 9 cells where our target stores are located.
Finally, we want to join the H3 cells of the areas surrounding billboards scoring high for our target audience with those containing available stores to push our product. To do that we are going to add another Join component and connect the outputs of both branches of our workflow. We select h3 for both main and secondary tables. We click "Run".
Now we have the result that we wanted: areas at 300m from a billboard, scoring high for our target audience (high income female population with age between 18 and 40 years old) and with presence of relevant stores for doing promotional activities.
With this output we can now add components Save as table and Send by email to ensure our colleagues know about this insight and we keep the result saved in our data warehouse. From there, we can click on "Create map" to open a map in CARTO Builder with the result of our workflow as a layer.
From disease surveillance systems, to detecting spikes in network usage, to environmental monitoring systems, many applications require the monitoring of time series data in order to detect anomalous data points. In these event detection scenarios, the goal is to either uncover anomalous patterns in historical space-time data or swiftly and accurately detect emerging patterns, thereby enabling a timely and effective response to the detected events.
As a concrete example, in this guide we will focus on the task of detecting spikes in violent crimes in the city of Chicago in order to improve portfolio management of real estate insurers.
This guide shows how to use CARTO space-time anomaly detection functionality in the Analytics Toolbox for BigQuery. Specifically, we will cover:
A brief introduction to the method and to the formulations of the definition of anomalous, unexpected, or otherwise interesting regions
How to identify anomalous space-time regions using the DETECT_SPACETIME_ANOMALIES
function
By the end of this guide, you will have detected anomalous space-time regions in time series data of violent crimes in the city of Chicago. A more comprehensive version of this guide can be found here.
Crime data is often an overlooked component in property risk assessments and rarely integrated into underwriting guidelines, despite the FBI's latest estimates indicating over $16 billion in losses annually from property crimes alone. In this example, we will use the locations of violent crimes in Chicago available in the BigQuery public marketplace, extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. The data are available daily from 2001 to present, minus the most recent seven days, which also allows us to showcase how to use this method to detect space-time anomalies in near-real-time.
For the purpose of this guide, the data were first aggregated weekly (by assigning each daily data point to the previous Monday) and by H3 cell at resolution 7, as shown in this map, where we can visualize the total counts for the whole period by H3 cell and the time series of the H3 cells with the most counts.
Each H3 cell has been further enriched using demographic data from the American Community Survey (ACS) at the census block resolution.* Finally, each time series has been gap filled to remove any gap by assigning a zero value to the crime counts variable. The final data can be accessed using this query:
SELECT date, h3, counts, total_pop_sum AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
*Please note that this data was retired in June 2025. You can find similar data products from providers like the ACS by searching for publicly available demographics data in the Data Observatory.
To detect anomalies that affect multiple time series simultaneously, we can either combine the outputs of multiple univariate time series or treat the multiple time series as a single multivariate quantity to be monitored. However, for time series that are also localised in space, we expect that if a given location is affected by an anomalous event, then nearby locations are more likely to be affected than locations that are spatially distant.
A typical approach to the monitoring of spatial time series data uses fixed partitions, which requires defining an a priori spatial neighbourhood and temporal window to search for anomalous data. However, in general, we do not have a priori knowledge of how many locations will be affected by an event, and we wish to maintain high detection power whether the event affects a single location (and time), all locations (and times), or anything in between.
A solution to this problem is a multi-resolution approach in which we search over a large and overlapping set of space-time regions, each containing some subset of the data, and find the most significant clusters of anomalous data. This approach, which is known as the generalized space-time scan statistics framework, consists of computing a score function that compares the probability that a space-time region is anomalous compared to some baseline to the probability of no anomalous regions. The region(s) with the highest value of the score for which the result is significant for some significance level are identified as the (most) anomalous.
Depending on the type of anomalies that we are interested in detecting, different baselines can be chosen:
Population-based baselines ('estimation_method':'POPULATION'
). In this case we only have relative (rather than absolute) information about what we expect to see and we expect the observed value to be proportional to the baseline values. These typically represent the population corresponding to each space-time location and can be either given (e.g. from census data) or inferred (e.g. from sales data), and can be adjusted for any known covariates (such as age of population, risk factors, seasonality, weather effects, etc.)
Expectation-based baselines ('estimation_method':'EXPECTATION'
). Another way of interpreting the baselines is to assume that the observed values should be equal (and not just proportional, as in the population-based approach) to the baseline under the null hypothesis of no anomalous space-time regions. This approach requires an estimate of the baseline values, which is inferred from the historical time series, potentially adjusting for any relevant external effects such as day-of-week and seasonality. Such an estimate can be derived from a moving window average or a counterfactual forecast obtained from time series analysis of the historical data, for example by fitting an ARIMA model to the historical data using the ARIMA_PLUS or ARIMA_PLUS_XREG model classes in Google BigQuery (a minimal sketch is shown below).
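For illustration, here is a minimal, hypothetical sketch of how such a forecast-based baseline could be produced with BigQuery ML's ARIMA_PLUS; the model name is a placeholder and this query is not part of the original guide.
-- Hypothetical sketch: fit one ARIMA_PLUS model per H3 cell on the historical weekly counts.
CREATE OR REPLACE MODEL `<my-project>.<my-dataset>.crime_counts_arima`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'counts',
  time_series_id_col = 'h3'
) AS
SELECT date, h3, counts
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01';

-- Forecast expected counts per cell; forecast_value can then serve as counts_baseline.
SELECT h3, DATE(forecast_timestamp) AS date, forecast_value AS counts_baseline
FROM ML.FORECAST(MODEL `<my-project>.<my-dataset>.crime_counts_arima`,
                 STRUCT(12 AS horizon, 0.9 AS confidence_level));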
A simple way of estimating the expected crime counts is to compute a moving average of the weekly counts for each H3 cell. For example, we could average each weekly value over the span between the previous and next three weeks
-- input_query
SELECT date, h3,
counts,
AVG(counts) OVER(PARTITION BY h3 ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
Assuming that the counts are Poisson distributed (which is the typical assumption for count data, 'distributional_model':'POISSON'
), we can obtain the space-time anomalies using the following query
CALL `carto-un`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[4,16],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'EXPECTATION'
}'''
)
CALL `carto-un-eu`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[4,16],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'EXPECTATION'
}'''
)
As we can see from the query above, in this case we are looking retrospectively for past anomalous space-time regions ('is_prospective: false'
, i.e. the space-time anomalies can happen at any point in time over all the past data as opposed to emerging anomalies for which the search focuses only on the final part of the time series) with spatial extent with a k-ring ('kring_size'
) between 1 (first order neighbours) and 3 (third order neighbours) and a temporal extent ('time_bw'
) between 4 and 16 weeks. Finally, the 'permutations'
parameter is set to define the number of permutations used to compute the statistical significance of the detected anomalies.
The map below shows the spatial and temporal extent of the ten most anomalous regions (the region with rank 1 being the most anomalous), together with the time series of the sum of the counts and baselines (i.e. the moving average values) for the time span of the selected region.
To explore the effect of choosing different baselines and parameters check the extended version of this guide, where the method is described in more detail and we offer step-by-step instructions to implement various configurations of the procedure.
In this tutorial, we’ll dive into telecom customer churn data to uncover the key reasons behind customer departures and develop targeted strategies to boost retention and satisfaction. Specifically, we will learn how to predict customer churn for a telecom provider offering telephony and internet services using CARTO Workflows. You can access the full template here.
For this use case, we will be using IBM’s Telco Customer Churn Dataset, which contains information about a fictional telco company that provides home phone and Internet services to 7043 customers in California. This dataset provides essential insights into each customer's profile, covering everything from subscribed services and tenure to socio-demographic information and sentiment data.
Before starting, let's take a look at the data. From the map's widgets section, we can see that 26.54% of customers churned this quarter, resulting in a $3.68M revenue loss. Regions like Los Angeles and San Diego are characterized by having both a large number of customers and a higher number of lost customers, positioning them as high-priority areas for improving customer retention.
For this tutorial, we will be using CARTO's BigQuery ML Extension Package, a powerful tool that allows users to exploit BigQuery’s ML capabilities directly from Workflows, enabling seamless integration of machine learning models into automated pipelines.
To install the Extension Package from the Workflows gallery, follow the next steps:
Log into the CARTO Workspace, then head to Workflows and Create a new workflow; use the CARTO Data Warehouse
connection.
Go to the Components tab, on the left-side menu, then click on Manage Extension Packages.
In the Explore tab, you will see a set of Extension Packages that CARTO has developed. Click on the BigQuery ML for Workflows box, then on Install extension.
You have successfully installed the Extension Package! Now you can click on it to navigate through the components. You can also go to the Components section and see the components from there, ready to be dragged and dropped onto the canvas.
Please refer to the documentation for more details about managing Extension Packages.
Now, let's add components to our Workflow to predict customer churn. We will load the telco dataset, from which we’ve pre-selected some interesting features (e.g. those correlated with churn), and we will train a classification model to estimate which customers are prone to churn and which are not.
Drag the Get Table by Name component to the canvas and import the cartobq.docs.telco_churn_ca_template
dataset. This data is publicly available in BigQuery (remember that we are using a connection to the CARTO DW, a fully-managed, default Google BigQuery project for the organization).
Use the Where component to select only those rows for which the churn_label
is available (churn_label IS NOT NULL
). This will be the data we will split for training (70%) and evaluating (30%) our model through random sampling (RAND() < 0.7
) using another Where component. Once our model is ready, we will predict the churn_label
for those customers which we do not know whether they will churn or not.
Now, we will use the training data to create a classification model whose output will be the probability of churn for a customer (where a label of 0 means no churn and 1 means churn), given specific socio-demographic, contract type and sentiment characteristics.
Use the Drop Columns component to remove unnecessary columns that won't be used for training: geom
(GEOMETRY
type columns are not valid).
Connect the Create Classification Model component to the input data and set up the model’s parameters: we will train a Logistic Regression model and we will not further split the data (we have done so in step 2).
Note: You will need to give the model a Fully Qualified Name (FQN), which is where the model will be stored. In this way, it would also be possible to call the model from a different workflow using the Get Model by Name component. To find the FQN of your CARTO DW, go to the SQL tab in the lower menu and copy the project name as seen in the image below. Your FQN should look something like: carto-dw-ac-<id>.shared.telco_churn_ca_predicted
.
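Under the hood, the Create Classification Model component corresponds roughly to a BigQuery ML CREATE MODEL statement. The following is an illustrative sketch only, not the component's exact generated SQL; the model FQN and the training table name are placeholders for the model location and the 70% sample produced in step 2.
-- Illustrative sketch of the equivalent BigQuery ML statement.
CREATE OR REPLACE MODEL `carto-dw-ac-<id>.shared.telco_churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churn_label'],
  data_split_method = 'NO_SPLIT'        -- the 70/30 split was already done in step 2
) AS
SELECT * EXCEPT (geom)                  -- GEOMETRY columns are not valid model inputs
FROM `<my-project>.<my-dataset>.telco_churn_training_sample`;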
Next, we will Evaluate the performance of our model using the test data.
Based on the classification metrics, the results seem very promising. The high accuracy indicates that the model correctly predicts the majority of instances, and the low log loss suggests that our model's probability estimates are close to the actual values. With precision and recall both performing well, we can be confident that the model is making correct positive predictions, and the F1 score further reassures us that the balance between precision and recall is optimal. Additionally, the ROC AUC score shows that our model has a strong ability to distinguish between clients churning and not churning. Overall, these metrics highlight that our model is well-tuned and capable of handling the classification task effectively.
Now that we have a model that performs well, we can run predictions and obtain estimates to check which customers are prone to churn. To do so, connect the Create Classification Model component and the data with no churn_label
to the Predict component.
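As a point of reference, the Predict component is roughly equivalent to calling ML.PREDICT in BigQuery. The sketch below is illustrative only; the model FQN is a placeholder, and the subquery stands in for the output of the earlier Where component that selected customers with no churn label.
-- Illustrative sketch of the prediction step.
SELECT *
FROM ML.PREDICT(
  MODEL `carto-dw-ac-<id>.shared.telco_churn_model`,
  (SELECT * EXCEPT (geom)
   FROM `cartobq.docs.telco_churn_ca_template`
   WHERE churn_label IS NULL));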
As we can see, two new columns appear on our data:
predicted_churn_label_probs
: indicates the probability that a customer will churn.
predicted_churn_label
: indicates whether the customer will or won't potentially churn based on the probability of churning using a threshold of 0.5.
Lastly, to better understand our model, we can take a look at the model’s explainability. This gives an estimate of each feature’s importance when it comes to churn.
Connect the Create Classification Model component to the Global Explain component. The latter provides the feature importance of the model predictors to each class (churn vs no churn). If the Class level explain option is not clicked, the overall feature importances are given, rather than per class.
For further details, we can also use the Explain Predict component, that provides feature attributions that indicate how much each feature in your model contributed to the final prediction for each given customer. You can select how many features you want to use to retrieve their attributions.
From the results for the overall feature importances, we can see that the most important features when it comes to estimating churn are the customer’s overall satisfaction rating of the company (satisfaction_score
), the customer’s current contract type (contract
), the number of referrals the customer has made (number_of_referrals
), and whether or not the customer has subscribed to an additional online security service (online_security
).
We can visualize the results in the following map, where we can see which customers are prone to churn, and with which probability this will happen.
This example demonstrates how to use Workflows to find clusters of points using the K-Means algorithm.
This example demonstrates how to use Workflows to perform a geospatial intersection, finding points within polygons and adding properties.
This example demonstrates how to use Workflows to perform a spatial intersection between points and polygons, adding aggregated data from the points into the polygons.
This example demonstrates how to use Workflows to generate Voronoi polygons from a set of points. Voronoi polygons are often used to find service areas for market analysis.
This example demonstrates how to use the "Custom SQL Select" component, using placeholders to reference two different inputs.
In this tutorial, we’ll create a real-time analysis workflow to monitor flood-impacted properties in England. We'll integrate live data from an API, filter property boundaries within flood alert zones, and visualize the results on a map.
By the end of this tutorial, you will have:
✅ Accessed real-time flood data from an API
✅ Built and scheduled a workflow to analyze at-risk properties
✅ Scheduled a daily email and map update about at-risk properties
Let's get started!
To access the data that you need:
Asset locations: you can follow our example by downloading Properties_england.csv from our github, which is a list of all properties sold in England in 2023. Alternatively, why not use your own asset data or use some POI data from our Data Observatory?
Flood alert areas: These are the areas produced by England's Environment Agency which can be linked to live flood alerts. You can download a simplified version of this from our github (flood_alert_areas.geojson), or access the original Environment Agency data here.
That's all you need for now - let's get going!
Sign in to CARTO at app.carto.com
Head to the Data Explorer tab and click Import data. In turn, follow the instructions to import each of the above tables into CARTO Data Warehouse > Organization > Private. Alternatively, you can use your own data warehouse if you have one connected. When you get to the Schema Preview window, deselect "Let CARTO automatically define the schema" and ensure the variables have been defined correctly; any column called "geom" should be the type GEOGRAPHY and the "value" column in Properties_england.csv should be INT64.
Head to the Workflows tab and select + New Workflow. If you are connected to multiple data warehouses, you will be prompted to select a connection - please choose the one to which you have added your data. Give your workflow a name like "Real-time flood alerts."
In the Sources panel on the left of the window, expand connections and find where you loaded your data to (for us, that's CARTO Data Warehouse > Organization > Private). Drag the two tables onto the canvas. The flood alert areas in particular may take a couple of minutes to load as the geography is very complex.
First, let’s access real-time flood alerts from the Environment Agency.
Head to the left-hand side of your workflow, and switch to the Components panel. From here, find the HTTP request component and drag it onto the canvas. Copy the below URL into the URL box:
https://environment.data.gov.uk/flood-monitoring/id/floods
Now, add a Custom SQL Select component to the right side of the existing component (make sure you use the top node, node "a", which is referenced in the code below), and connect the output of the HTTP request to the input of the Custom SQL Select component. Copy and paste the below SQL into the SQL box - this will format the API response into a table with the fields severity_level, river_or_sea, flood_area_id, notation and description. You can reference the API documentation for a full list of fields available if you'd like to adapt this.
WITH json_data AS (SELECT response_data AS json_response FROM $a
),
formatted_data AS (
SELECT
cast(JSON_EXTRACT_SCALAR(item, '$.severityLevel') as int64) AS severity_level,
JSON_EXTRACT_SCALAR(item, '$.floodArea.riverOrSea') AS river_or_sea,
JSON_EXTRACT_SCALAR(item, '$.floodAreaID') AS flood_area_id,
JSON_EXTRACT_SCALAR(item, '$.floodArea.notation') AS notation,
JSON_EXTRACT_SCALAR(item, '$.description') AS description
FROM json_data,
UNNEST(JSON_EXTRACT_ARRAY(json_response, '$.items')) AS item
)
SELECT *
FROM formatted_data
Your workflow should look a little like the below - hit Run! Note we've added an annotation box to this section of our analysis to help keep our analysis organized - you can do this through the Aa button at the top of the screen.
Now, it's time to make this spatial!
Add a Join component to the right of the previous component. Connect the Custom SQL Select output to the top Join input, and the flood_alert_polygons source to the bottom. The respective join columns should be flood_area_id and fws_tacode. Use an inner join type, so we retain only fields which are present in each table. It should look a bit like the screenshot below.
If you open the Data preview at the bottom of the screen, you'll be able to see a table containing the live flood alert data. Note this number will likely be lower for the Join component than the Custom SQL Select component - this is because the API serves both flood alerts and flood warnings.
Depending on the day you're doing this analysis, you will see a different number - we're running this on the 6th December 2024 and have 131 alerts.
Optional: if you are running this on a flooding-free day...
If you're running this analysis on a day that happens to have zero flood alerts, you can download a snapshot of flood alerts for the 12th December 2023 from our github (flood_alerts_20231231.geojson). You can download, drag and drop this file directly into your workflow and use it in place of everything we've just done. However, please note you won't be able to benefit from any of the real-time related functionality we're about to go through.
Whether you’re using real time flood alerts - or historic floods and just pretending they’re real-time - you should now have a component which contains flood boundaries.
Now let’s work out which property boundaries fall in areas with flood alerts. Add a Spatial filter component, and connect the Properties_england.csv source to the top input, and either the Join component (real-time) or flood_alerts_20231231.geojson (historic) to the bottom.
Let’s run our workflow again!
Now, connect this to a Send by Email component. Make sure you use the top-most (positive) output of your filter! Enter your email address and a subject line, check the Include data checkbox at the bottom of the panel, and hit run - and you should receive the results by email!
Altogether, your workflow should look something like this:
If you aren’t using the real-time version of this data, now is the time to fully suspend disbelief and pretend you are… because, wouldn’t it be great to get a daily report of which assets may be impacted by floods? We can!
We just need to adjust a couple of settings in our Workflow.
In workflow Settings (two to the left of Run) uncheck Use cached results. This means that every time you run your workflow, the entire thing will be re-run.
To the right of Settings, open the Schedule Workflow window (the clock icon). Set this to run once a day.
And that’s it! You will now get daily alerts as to which properties may be impacted by floods (you may want to turn this off at some point to avoid spamming yourself!).
Now, for the final flourish...
Finally, let's turn these results into something a bit more engaging than a table. First, we’ll turn these results into a H3 frequency grid.
Before doing the below, you may want to briefly disconnect Send via email so you don’t end up with loads of emails from yourself every time you run the workflow!
Connect the top (positive) output of the Spatial Filter to a H3 from Geopoint component to create a column with a H3 hexagonal grid cell reference, and change the resolution to 9, which has an edge length of about 200 meters. Run the workflow.
Connect this to a Group by component. Group by the column H3 and set up the following three aggregations:
H3 (COUNT)
Value (SUM)
Value (AVG)
Finally, connect this to a Create Column component. Call the new column date and paste in the function CAST(CURRENT_DATE() AS STRING)
. This will be really helpful for your users to know exactly which data they are looking at.
Every component is saved as a temporary table. To commit this output, connect it to a Save as Table component, and save it back in CARTO Data Warehouse > Organization > Private, calling the table "flood_alerts_daily." This will overwrite the table every time your workflow is run - or you can check the option to append the results to the existing table to add results over time.
❗Now would be a good time to reconnect the Send via email component to the Spatial Filter.
Your final workflow should be looking something like this:
Now let's turn this into something a bit more visual!
Select the Save as Table component. Open the Map preview at the bottom of the screen and select Create Map. This will take you to a fresh CARTO Builder map with your data pre-loaded - select the map name (top left of the screen) to rename the map "Live flood alerts."
In the Layer panel, click on Layer 1 to rename the layer "Assets in flood alert areas" and style your data. We’d recommend removing the stroke and changing the fill colour to be determined by SUM(h3_count) variable to show the number of potentially impacted assets in each H3 cell. Expand the fill styling options and change the color scale to Quantize.
Head to the Legend panel (to the right of Layers) to ensure the names used in the legend are clear (for instance we've changed h3_count to "Number of assets").
To the right of the Layer panel, switch to the Widgets panel, to add a couple of dashboard elements to help your users understand your map. We’d recommend:
Formula widget: SUM, H3_Count - to show the total number of properties in flood alert areas.
Formula widget: SUM, Value_sum - to show the total value of properties in flood alert areas.
Category widget: SUM, H3_Count, aggregation column: date. This will allow your users to see the most recent date that the data was updated.
For each of these widgets, scroll to the bottom of the Widget panel and change the behaviour from global to viewport, and watch as the values change as you pan and zoom.
Finally, in the Sources panel (bottom left of your screen), set the Data freshness from Default to 24 hours. This will ensure your data is updated daily.
Now, Share your map (top right of the screen) with your Organization or the public. Grab the shareable link from the share window, and head back to your workflow. Change the email body to:
Explore the data [here](copy_your_link_here).
Now every day, your lucky email recipients will receive both a full report of the assets in flood alert areas, as well as an interactive dashboard to explore the results.
Want to take this further? Try changing the basemap, adding pop-ups and adding the filtered asset geometries in as an additional layer that appears as you zoom further in. Here's what our final version looks like (frozen on 06/12/2024):
Looking for tips? Head to the Data Visualization section of the Academy!
Many government agencies such as FEMA in the United States, provide flood zone data for long term flooding risk, but what about areas that may be prone to flash floods or wildfire? This analysis takes synthetic policy data in Florida and analyzes it for Flash Flood risk using historic storms and historic fires, along with current weather warnings.
This example demonstrates how an insurance company could use Workflows to assess the number of people and the value of the properties affected by a volcano eruption on the Spanish island of La Palma. It takes into account the actual lava flow, but also, separately, the surrounding area.
Underwriting or reinsuring a home or property insurance combines many different factors about the property but also the location where the property sits. While nationwide datasets exist for analysis like this such as the national FEMA risk index, other datasets like crime or emergency facilities are often shared by municipalities.
This workflow shows how you can combine many different data layers to make a spatial determination about a property.
While flooding is a major risk in many areas, coastal areas are particularly prone to flooding both in long-term and short-term time horizons. In addition, each location has different factors that can impact flooding on a local level such as proximity to a storm drain or elevation.
This workflow shows how you can combine many different data layers to make a spatial determination using hyper-local data in Savannah, GA.
This example demonstrates how to use Workflows to combine traffic data, such as road collisions and traffic counts, with car telemetry data to generate a risk score that can later be used to enrich a specific journey's path.
In this tutorial, you’ll learn how to use CARTO Builder to create an interactive dashboard for visualizing and analyzing retail store performance across the USA. We’ll create two types of layers; one displaying stores in their original geometry using bubbles and another using point geometry aggregated to Spatial Indexes, all easily managed through the CARTO UI.
Thanks to this interactive map, you’ll effortlessly identify performance trends and pinpoint the most successful stores where revenue is inversely correlated with surface area. Are you ready to transform your data visualization and analysis skills? Let's dive in!
Access the Maps from your CARTO Workspace using the Navigation menu and create a "New map".
Let's add retail stores as the first data source.
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the Add Source button.
The SQL Editor panel will be opened.
To add retail stores source, run the query below:
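The query itself is not shown on this page; assuming the same retail stores demo table referenced elsewhere in this course, it would look like this:
SELECT * FROM `carto-demo-data.demo_tables.retail_stores`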
Change the layer name to "Retail Stores". Click over the layer card to start styling the layer.
Access more Options in the Fill Color section and apply “Color based on" using Size_m2
column. Pick a gradient palette (versus one for a categorical variable), and set the gradient steps to 4.
Now click on the options for the Radius configuration and in the section “Radius Based On” pick the column Revenue
. Play with the minimum/maximum size to style the layer as you like.
Now that you have styled "Retail stores" layer, you should have a map similar to the below.
Go to Widget tab, click on New Widget button and select your SQL Query data source.
First, we create a Formula widget for the Total Revenue. Select the SUM
operation on the revenue
field, adjusting the output value format to currency
. Add a note to indicate we are calculating revenue shown in the viewport. Rename to “Total Revenue”:
Next, we will create a widget to filter by store type. Select the Category widget, choose COUNT
operation from the list and select the column storetype
. Make the widget collapsible
and rename it to “Type of store”.
Then, we create a third widget, a Histogram widget, to filter stores by revenue
. Set the buckets to 10
, formatting to currency
, and make widget collapsible
. Rename to “Stores by revenue”.
Now let’s configure the tooltip. Go to Interactions tab, activate the tooltip and select the field Storetype
, Address
, City
, State
, Revenue
and Size_m2
.
Let’s also change our basemap. Go to Basemaps tab and select “Voyager” from CARTO.
Now, we will upload the same data source using SQL Query type and this time we will dynamically aggregate it to Quadbin Spatial Indexes using the UI. To do so, run the following query:
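The query is again not shown on this page; since we are re-adding the same data source (with the aggregation handled in the UI), a query along these lines should work:
SELECT * FROM `carto-demo-data.demo_tables.retail_stores`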
The new layer will appear. Rename the layer to "Retail stores (H3)" and using the Layer panel, aggregate it to Quadbin.
Change the order of your layer by dragging it after your point "Retail store" layer. In the layer panel, set the Spatial Index resolution to 7
and style it based on Revenue
using SUM
as the aggregation.
Finally, set the height of the hexagons to be based on Size_m2
, multiplying by 20
using Linear
as the color scale. Then, set the map view to 3D
to analyze the results.
Enable map dual view. On the left map disable the "Retail stores (H3)" grid layer, on the right map disable the "Retail stores" layer.
As we can see, in metro areas on the west coast we have more stores with a lower surface area, yet their revenues are much higher than in rural areas, where stores have larger surface areas.
Switch back to single map
view mode. Hide the "Retail stores (H3)" layer. Rename the map to “Monitor retail store performance” and add a rich description using Markdown.
We can make the map public and share it online with our colleagues. For more details, see Publishing and sharing maps.
Finally, we can visualize the result.
Spatial scores provide a unified measure that combines diverse data sources into a single score. This allows businesses to comprehensively and holistically evaluate a merchant's potential in different locations. By consolidating diverse variables into a single indicator, data scientists can develop actionable strategies to optimize sales, reduce costs, and gain a competitive edge.
In this tutorial, we’ll be scoring potential merchants across Manhattan to determine the best locations for our product: canned iced coffee!
This tutorial has two main steps:
Data Collection & Preparation to collate all of the relevant variables into the necessary format for the next steps.
Calculating merchant attractiveness for selling our product. In this step, we’ll be combining data on footfall and proximity to transport hubs into a meaningful score to rank which potential points of sale would be best placed to stock our product.
An Area of Interest (AOI) layer. This is a polygon layer which we will use to filter USA-wide data to just the area we are analyzing. Subscribe to the layer via the Data Observatory tab of your CARTO Workspace. Note you can use any AOI that you like, but you will not be able to use the footfall sample data for other regions (see below).
Potential Points of Sale (POS) data. We will be using retail_stores from the CARTO Data Warehouse (demo data > demo tables).
Footfall data. Our data partner Unacast have kindly provided a sample of their data for this tutorial, which you can find again in the CARTO Data Warehouse called unacast_activity_sample_manhattan (demo data > demo tables). The assumption here is that the higher the footfall, the more potential sales of our iced coffee!
Proximity to public transport hubs. Let's imagine the marketing for our iced coffee cans directly targets professionals and commuters - where better to stock our products than close to stations? We'll be using osm_pois_usa as the source for this data, which again you can access via the CARTO Data Warehouse (demo data > demo tables).
The first step in any analysis is data collection and preparation - we need to calculate the footfall for each store location, as well as the proximity to a station.
To get started:
Log into the CARTO Workspace, then head to Workflows and Create a new workflow; use the CARTO Data Warehouse connection.
Drag the four data sources onto the canvas:
To do this for the Points of Sale, Footfall and Public transport hubs, go to Sources (on the left of the screen) > Connection > Demo data > demo_tables .
For the AOI counties layer, switch from Connection to Data Observatory then select CARTO and find County - United States of America (2019).
The full workflow for this analysis is below; let's look at this section-by-section.
Use a Simple Filter component with the condition do_label equal to New York to filter the polygon data to Manhattan.
Next, use a Spatial Filter component to filter the retail_stores table to those which intersect the AOI we have just created. There should be 66 stores remaining.
There are various methods for assigning grid data to points such as retail stores. You may have noticed that our sample footfall data has some missing values, so we will assign footfall based on the value of the closest Quadbin grid cell.
Use a Quadbin Center component to convert each grid cell to a central point geometry.
Now we have two geometries, we can run the Distance to nearest component. Use the output of Section 1 (Spatial Filter; all retail stores in Manhattan) as the top input, and the Quadbin Center as the bottom input.
The input geometry columns should both be "geom" and the ID columns should be "cartodb_id" and "quadbin" respectively.
Make sure to change the radius to 1000 meters; this is the maximum search distance for nearby features.
Finally, use a Join component to access the footfall value from unacast_activity... (this is the column called "staying"). Use a Left join and set the join columns to "nearest_id" and "quadbin."
We'll take a similar approach in this section to establish the distance to nearby stations.
Use the Drop Columns component to omit the nearest_id, nearest_distance and quadbin_joined columns; as we're about to run the Distance to nearest process again, we don't want to end up with confusing duplicate column names.
Let's turn our attention to osm_pois_usa. Run a Simple Filter component with the condition subgroup_name equal to Public transport station.
Now we can run another Distance to nearest component using these two inputs. Set the following parameters:
The geometry columns should both be "geom"
The ID columns should be "cartodb_id" and "osm_id" respectively
Set the search distance this time to 2000m
Now we need to do something a little different. For our spatial scoring, we want stores close to stations to score highly, so we need a variable where a short distance to a station is actually assigned a high value. This is really straightforward to do!
Connect the results of Distance to nearest to a Normalize component, using the column "nearest_distance." This will create a new column nearest_distance_norm, with normalized values from 0 to 1.
Next, use a Create Column component, calling the column station_distance_norm_inv and using the code 1-nearest_distance_norm
which will reverse the normalization.
Commit the results of this using a Save as Table component.
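If you prefer to see the logic in SQL, here is a hedged sketch of what the Normalize and Create Column steps compute together (a min-max normalization followed by an inversion); the table name is a placeholder, and this may differ in detail from the SQL the components actually generate.
-- Sketch: min-max normalize the station distance and invert it so that closer = higher.
SELECT
  *,
  1 - SAFE_DIVIDE(
        nearest_distance - MIN(nearest_distance) OVER (),
        MAX(nearest_distance) OVER () - MIN(nearest_distance) OVER ()
      ) AS station_distance_norm_inv
FROM `<my-project>.<my-dataset>.potential_pos_with_station_distance`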
The result of this is a table containing our retail_stores, for each of which we now have a value for footfall and proximity to a station - so now we can run our scoring!
In this next section, we’ll create our attractiveness scores! We’ll be using a dedicated scoring function to do this; you can read a full breakdown of this code in our documentation.
Sample code for this is below; you can run this code either in a component in Workflows, or directly in your data warehouse console. Note you will need to replace "yourproject.yourdataset.potential_POS_inputs" with the path where you saved the previous table (if you can't find it, it will be at the bottom of the SQL preview window at the bottom of your workflow). You can also adjust the weights (ensuring they always add up to 1) and number of buckets in the scoring parameters section.
Let's check out the results! First, you'll need to join the results of the scoring process back to the retail_stores table as the geometry column is not retained in the process. You can use a Join component in workflows or adapt the SQL below.
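The SQL itself is not reproduced on this page; a hedged sketch of such a join is shown below, with placeholder table names (the scoring output and the demo stores table) and assuming the score table kept the store's cartodb_id.
-- Hypothetical sketch: join the scoring output back to the store geometries.
SELECT r.cartodb_id, r.geom, s.* EXCEPT (cartodb_id)
FROM `yourproject.yourdataset.potential_POS_scores` AS s
JOIN `carto-demo-data.demo_tables.retail_stores` AS r
  ON s.cartodb_id = r.cartodb_id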
You can see in the map that the highest scoring locations can be found in extremely busy, accessible locations around Broadway and Times Square - perfect!
Want to take this one step further? Try calculating merchant performance, which assesses how well stores perform against the expected performance for that location - check out our guide on this topic to get started!
Take advantage of the unique properties of Spatial Indexes
On this page, you'll learn how to take advantage of some of the unique properties of Spatial Indexes.
Parent and child resolutions; seamlessly move data between index resolutions.
K-rings; define areas of interest without requiring the use of geometries.
Converting a Spatial Index to a geometry; when and how to do this.
Enrichment; how to aggregate data from a spatial index to a geometry.
Being able to seamlessly move data between resolutions is one of the reasons Spatial Indexes are so powerful. With geometries, this would involve a heavy spatial join operation, whereas Spatial Indexes enable an efficient string-based operation.
Resolutions are referred to as having "parent" and "child" relationships; less detailed hierarchies are the parents, and more detailed hierarchies are the children. In this tutorial, we'll share how you can easily move between these resolutions.
💡 You will need a Spatial Index table to follow this tutorial. You can use your own or follow the steps in the tutorial. We'll be using the USA Spatial Features dataset (H3, resolution 8), which you can access as a demo table from the CARTO Data Warehouse.
Our source dataset (USA Spatial Features H3 - resolution 8) has around 12 million cells in it - which is a huge amount! In this tutorial, we'll create the workflow below to move down a hierarchy to resolution 7 to make this slightly more manageable.
In the CARTO workspace, head to Workflows > Create a new workflow. Choose the relevant connection for where your data is stored; if you're following this tutorial you can also use the CARTO Data Warehouse.
Drag your Spatial Index table onto the canvas.
Next, drag a H3 to Parent component onto the canvas. Note you can also use a Quadbin to Parent component if you are using quadbins.
Set your index column (likely "H3") and a parent resolution - we'll use 7. Run! This process will have generated a new column in your table - "H3_Parent."
You can now use a Group by component - setting the Group by field to H3_Parent - to create a new table at the new resolution. At this point you can also aggregate any relevant numeric variables; for instance we will SUM the Population field.
At this point, it is good practice to use a Rename Column component to rename the H3_Parent column "H3" so it can be easily identified as the index column.
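For reference, the same parent/child aggregation can be expressed as a single query, assuming the CARTO Analytics Toolbox for BigQuery is installed; the source table name is a placeholder for the Spatial Features demo table and its population column.
-- Sketch: roll a resolution-8 H3 table up to resolution 7, summing the population.
SELECT
  `carto-un`.carto.H3_TOPARENT(h3, 7) AS h3_parent,
  SUM(population) AS population
FROM `<my-project>.<my-dataset>.usa_spatial_features_h3_res8`
GROUP BY h3_parent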
K-rings are a simple concept to understand, but can be a powerful tool in your analytics arsenal.
A K-ring is the set of adjacent cells surrounding an originating, central cell. The origin cell is referred to as ring “0,” and the adjacent cells are ring “1.” The cells adjacent to those are ring “2,” and so on - as highlighted in the image below.
What makes this so powerful is that it enables fast and inexpensive distance-based calculations; rather than having to make calculations based on - for example - buffers or isolines, you could instead stipulate 10 K-rings. This is a far quicker and cheaper calculation as it removes the requirement to use heavy geometries.
💡 You will need a Spatial Index table to follow this tutorial. We have used the Retail Stores dataset from demo tables in the CARTO Data Warehouse, and used a Simple Filter to filter this table to stores in Boston. We've then used H3 from GeoPoint to convert these to a H3 table. Please refer to the tutorial for more details on this process.
Connect your H3 table to a H3 KRing component. Note you can also use a Quadbin KRing component if you are using this type of index.
Set the K-ring size to 1. You can use the typical cell dimensions of your chosen resolution to work out how many K-rings you need to approximate specific distances. For instance, we are using a H3 resolution of 8 which has a long-diagonal "radius" of roughly 1km. This means our K-ring of 1 will cover an area approximately 1km away from the central cell.
Run your workflow! This will generate a new field called kring_index which contains the H3 reference for the K-ring cells, which can be linked to the central cell, referenced in the column H3.
So how can you use this? Well, you can see an example in the workflow above in the "Calculate the population" section, where we analyze the population within 1km of each store.
We run a Join (inner) on the results of the K-ring, joining it by the kring_index column to the H3 column in USA Spatial Features table (available for free to all CARTO users via the ). Next, with the Group by component we aggregate by summing the population, and grouping by H3_joined. This gives us the total population in the K-ring around each central cell, approximately the population within 1km of each store. Finally, we use a Join (left) to join this back to our original H3 index which contains the store information.
With this approach, we leverage string-based - rather than geometry-based - calculations, for lighter storage and faster results - ideal for working at scale!
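For reference, here is roughly how the K-ring population calculation above could be written in SQL, assuming a BigQuery connection with the `carto-un` Analytics Toolbox; the table names are placeholders for your Boston stores H3 table and the Spatial Features H3 table.
WITH krings AS (
  SELECT
    stores.h3 AS store_h3,
    kring_index
  FROM
    yourproject.yourdataset.boston_stores_h3 AS stores,
    -- H3_KRING returns the origin cell plus all cells within the given grid distance
    UNNEST(`carto-un`.carto.H3_KRING(stores.h3, 1)) AS kring_index
)
SELECT
  krings.store_h3 AS h3,
  SUM(sf.population) AS population_1km
FROM
  krings
JOIN
  yourproject.yourdataset.spatial_features_h3_res8 AS sf
ON
  sf.h3 = krings.kring_index
GROUP BY
  krings.store_h3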
There are some instances where you may want to convert Spatial Indexes back into a geometry. A common example of this is where you wish to calculate the distance from a Spatial Index cell to another feature, for instance to understand the distance from each cell to its closest 4G network tower.
There are two main ways you can achieve this - convert the index cell to a central point, or to a polygon.
💡 You will need a Spatial Index table to follow this tutorial. You can use your own or follow the steps in the tutorial. We have used the USA States dataset (available for free to all CARTO users via the ) and filtered it to California. We then used H3 Polyfill to create a H3 index (resolution 5) to cover this area. For more information on this process please refer to the tutorial.
Converting to a point geometry: connect any Spatial Index component or source to a H3 Center component. Note you can alternatively use Quadbin Center.
Converting to a polygon geometry: connect any Spatial Index component or source to a H3 Boundary component. Note you can alternatively use Quadbin Boundary.
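In SQL, both conversions are single function calls. This is a minimal sketch assuming the `carto-un` Analytics Toolbox on BigQuery and a placeholder table name for the California H3 grid.
SELECT
  h3,
  `carto-un`.carto.H3_CENTER(h3) AS center_geom,     -- point at the center of each cell
  `carto-un`.carto.H3_BOUNDARY(h3) AS boundary_geom  -- polygon outlining each cell
FROM
  yourproject.yourdataset.california_h3_res5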
So, which should you use? It depends completely on the outcome you're looking for.
Point geometries are much lighter than polygons, and so will enable faster analysis and lighter storage. They can also be more representative for analysis. Let's illustrate by returning to our example of finding the distance between each cell and nearby 4G towers. By calculating the distance from the central point, you are essentially calculating the average distance for the whole cell. If you were to use a polygon boundary, your results would be skewed towards the side of the cell which is closest to the tower. On the other hand, polygon boundaries enable "cleaner" visualizations and are more appropriate for any overlay analysis you may need to do.
But remember - because Spatial Index grids are geographically "fixed" it's easy to move to and from index and geometry, or different geometry types.
So, you've learned how to convert a geometry to a Spatial Index, and how to convert that index back to a geometry. Another really common task which is made more efficient with Spatial Indexes is to use them to enrich a geometry - for instance to calculate the population within a specified area.
In this tutorial, we'll calculate the total population within 25 miles of Grand Central Station NYC. You can adapt this for any example; all you need is a polygon to enrich, and a Spatial Index to do the enriching with.
For this specific example, you will need access to the USA Spatial Features H3 table (available for free to all CARTO users either in the CARTO Data Warehouse > demo data > demo tables, or via the ). In addition, the workflow below creates a polygon of 25 miles from Grand Central Station, which we've manually digitized using the component.
To run the enrichment, follow the below steps:
In addition to your polygon, drag your Spatial Index layer onto the canvas.
Connect the ST Buffer output to a H3 Polyfill component (note you can also use a Quadbin Polyfill if you are using this Spatial Index type).
Set the resolution of H3 Polyfill to the same resolution as your input Spatial Index; for us that is 8. If you have multiple polygon input features, we recommend enabling the Keep input table columns option. Optional: run the workflow to check out the interim results! You should have a H3 grid covering your polygon.
To attach population data to this grid, use a Join component with the type Left, and connect the results of H3 Polyfill to the top input. For the bottom input, connect the Spatial Index source layer (for us, that's the Spatial Features table).
Set the main and secondary table columns as H3 (or whichever field contains your index references), and the join type as Left, to retain only features from the Spatial Features table which can also be found in the H3 Polyfill component. Run!
Finally, we want to know the total population in this area, so add a Group by component. Set the aggregation column to population_joined and the type as SUM. If you had multiple input polygons and you wanted to know the total population per polygon, here you could set the Group by column to the unique polygon ID - but we just want to know the total for one polygon so we can leave this empty. Run!
And what's the result?
The benefit of this approach is that after you've run the H3 Polyfill component, all of the calculations are based on string fields, rather than geometries. This makes the analysis far less computationally expensive - and faster!
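As a rough SQL equivalent of this workflow (placeholder table names; `carto-un` Analytics Toolbox on BigQuery assumed), you would polyfill the buffer and then join by index:
WITH cells AS (
  SELECT h3
  FROM
    yourproject.yourdataset.grand_central_buffer,
    -- H3_POLYFILL returns the resolution 8 cells covering the buffer polygon
    UNNEST(`carto-un`.carto.H3_POLYFILL(geom, 8)) AS h3
)
SELECT
  SUM(sf.population) AS total_population
FROM
  cells
JOIN
  yourproject.yourdataset.spatial_features_h3_res8 AS sf
USING (h3)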
Check out more examples of data enrichment in the !
This example demonstrates how to use Workflows to generate buffers around a specific set of points; convert those buffers to a regular grid and then enrich the grid with socio-demographic data from CARTO Spatial Features.
This example demonstrates how to use Workflows to generate trade areas around certain points of interest and enrich them with socio-demographic data using an H3 grid and the dataset.
This example demonstrates how to use Workflows to enrich a set of point targets with a custom data source.
This example demonstrates how to use Workflows to enrich a set of polygon targets with a custom data source.
This example demonstrates how to use Workflows to enrich a spatial index grid with a custom data source.
In this tutorial, we'll be using individual crime location data to create a crime risk index. This analysis is really helpful for insurers looking to make more intelligent policy decisions - from customized pricing of premiums to tailored marketing.
Crime location data. We are using data for Los Angeles city (data available ). Most local governments provide this data as open data, so you should be able to easily adapt this tutorial for your area of interest.
Detailed Population data. We are using 100m gridded data, which you can subscribe to via our Spatial Data Catalog.
We’ll be basing our analysis on crime data for Los Angeles city (data available ).
First, let's load this data into your data warehouse. To do this, head to the Data Explorer tab of your CARTO Workspace:
Select Import data, then follow the steps to import the table.
For this dataset we are going to deselect the Let CARTO automatically define the schema option on Schema Preview so we can manually select the correct data types for each field. In this example, you want to be sure that latitude and longitude are defined as the type float64.
Now the data is loaded into our data warehouse, we'll be building the below workflow to convert the crime locations into a hexagonal Spatial Index called H3. This process can be used to convert any point dataset into a H3 index.
With the data downloaded, head to the Workflows tab of the CARTO Workspace and select + New Workflow. Use the connection relevant to the location you loaded your data to. Select Create Workflow.
At the top left of the screen, click on the word Untitled to rename your workflow Crime risk.
You should now be seeing a blank Workflows canvas. The first thing we need to do is load our crime data in. To the left of the window, open the Sources tab. Navigate through Connection data to the LA crime locations table you just imported, and drag it onto the canvas.
If you navigate through the Table preview (bottom of the window) you'll notice we don't have a geometry column. Let's change that! Switch from the Sources to Components window, and search for ST GeogPoint; we'll use this to create a point geometry for each crime. Drag this component onto the canvas to the right of the crimes source.
Connect the right-hand node of the crimes table to the input (left-hand) node of ST GeogPoint (this may happen automatically if they're placed close together). Set the latitude and longitude columns as lat and lon respectively - and run the workflow!
At the bottom of the window, select ST GeogPoint and open the Table preview again. Scroll right to the end and select Show Column Stats.
Notice anything weird? The minimum latitude and maximum longitude values are both 0 - which means we have a series of features which are incorrectly sitting in "null island" i.e. longitude, latitude = 0,0. These will skew our subsequent analysis, so let's remove them.
Back in Components, find Simple Filter. Drag this onto the canvas, connecting it to the output of ST Geogpoint. Set the filter condition to latitude does not equal 0, and run. Now let's get on with running our analysis.
Now, let's also filter the data to only crimes relevant to home insurance risk. Connect the Simple Filter to a Select Distinct component, looking at the column crm_cd_desc. You can see there are over 130 unique crime codes which we need to filter down.
For this filter, as we will have multiple criteria we will instead need to connect a Where component to the Simple Filter from step 5. In this Where component, copy and paste the following:
Connect your Simple Filter to a H3 from GeoPoint component, which we'll use to convert each crime to a hexagonal H3 grid cell. Change the resolution to 9 which is slightly more detailed than the default 8.
In the final step for this section, connect the H3 from GeoPoint component to a Group by component. Set the column as H3 and the aggregation as H3 again, with the type COUNT. This will count all duplicate cells, turning our H3 grid into a frequency grid.
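If you'd rather express this part of the workflow in SQL, the sketch below combines the null island filter, the crime type filter and the H3 count in one query. It assumes a BigQuery connection with the `carto-un` Analytics Toolbox and uses a placeholder name for your imported crimes table.
CREATE TABLE yourproject.yourdataset.la_crime_h3_counts AS
SELECT
  -- Assign each crime to the resolution 9 H3 cell it falls in
  `carto-un`.carto.H3_FROMGEOGPOINT(ST_GEOGPOINT(lon, lat), 9) AS h3,
  COUNT(*) AS h3_count
FROM
  yourproject.yourdataset.la_crime_locations
WHERE
  lat != 0 AND lon != 0  -- drop the "null island" records
  AND (crm_cd_desc LIKE '%BURGLARY%'
    OR crm_cd_desc LIKE '%THEFT%'
    OR crm_cd_desc LIKE '%VANDALISM%'
    OR crm_cd_desc LIKE '%STOLEN%'
    OR crm_cd_desc LIKE '%ARSON%')
GROUP BY
  1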
You can now select the Group by component, open the Map preview tab and select Create Map to start exploring your data - here's what ours looks like! Make sure you check out the section of the academy for tutorials on crafting the most impactful maps!
In this section, we will contextualize the crime counts by calculating the number of crimes per 1,000 residents. First, we need to convert our population data into a H3 Index so we can use it in the same calculation as the crime count.
You can follow the steps in the video below to do this (also outlined below).
If you haven't already, head to the Data Observatory and subscribe to Population Mosaics, 2020 - United States of America (Grid 100m).
In your workflow, head to Sources > Data Observatory > WorldPop and drag the gridded population data onto your canvas. You may need to refresh your workflow if you subscribed to the dataset since you started building.
Connect this to an ST Centroid component to convert each grid cell to a central point.
Now, we will use a similar approach to when we converted the crime points to a H3 index. Use H3 from GeoPoint to convert each point geometry to a H3 index; make sure you set the resolution to 9 (the same as the crime count layer).
Finally, use the Group by component to aggregate the index with the following parameters:
Group by column: H3
Aggregation column: Population, Aggregation type: Sum.
Altogether, this should look something like the below (note how we've used an annotation box to help organize our workflow - you can access these via the Aa button at the top of the window).
Now you should have two inputs ready; crime counts and population. Let's bring them together!
Add a Join component, with the Group by component from the previous step as the top (main) input, and the crime count Group by table as the bottom input. Use an Inner join type, with the join columns from both tables set to h3.
Finally, we can calculate the crime rate! Add a Create Column component to do this and input the below formula.
CASE WHEN population_sum_joined = 0 then 0 ELSE h3_count/(population_sum_joined/1000) END
Use a Save as Table component to commit the result.
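The equivalent SQL for this join and rate calculation is sketched below; table and column names are placeholders mirroring the workflow outputs.
SELECT
  crimes.h3,
  crimes.h3_count,
  pop.population_sum,
  CASE
    WHEN pop.population_sum = 0 THEN 0
    ELSE crimes.h3_count / (pop.population_sum / 1000)
  END AS crime_rate
FROM
  yourproject.yourdataset.la_crime_h3_counts AS crimes
JOIN
  yourproject.yourdataset.la_population_h3 AS pop
USING (h3)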
Altogether, your workflow should be looking something like...
Head back to the Builder map you created earlier. Under Sources (bottom left), select Add Source from > Data Explorer and navigate to where you saved your table. Add it to the map!
Rename the layer Crime rate.
Let's style both the Crime count and rate layers in turn by clicking on the layer name in the Layer panel:
Reduce the resolutions of both to 6 (as detailed as possible)
Disable the strokes
Change the fill colors to be determined by H3_Count (average) for the crime count layer, and crime_rate (average) for the crime rate layer. Pro tip - use different color schemes for both layers, so it's obvious to the user that they aren't directly comparable.
At the top of the Builder window in Map views, turn on Switch to a dual map view. Open the legends for each map respectively (at the bottom right of each window) and turn the Crime rates off for the left-hand map and Crime counts off for the right-hand map (or the other way around! You basically only want to see one grid in each map).
Check this out below! How do the two compare?
Want to take this analysis one step further? Here are some ideas for next steps:
Calculate crime rate hotspots and outliers with our tools
Assess property-level home risk by joining your results to property data, such as
Learn more about this process in our blog .
Builder is a web-based mapping tool designed for creating interactive maps and visualizations directly from your cloud data warehouse. It offers powerful map making capabilities, interactive data visualizations, collaboration and publication options - all seamlessly integrated with your data warehouse for a streamlined experience.
This diagram provides a quick look at the Builder interface so you know your way around before getting started.
The data sources section allows you to add new sources to Builder, access each source options as well as enabling features at source level.
This button allows you to add SQL Parameters to your map, as long as your map contains at least one SQL Query source.
You can access your data source options using the button located on the right side of the data source card. From here, you can access different options depending on the nature of your source:
Open SQL Editor, so you can view and edit the SQL query of your source.
Query this table, so you can transform this source from a table to a SQL Query.
Add layer linked to this source.
Rename your data source.
Refresh data source, to ensure your data is up-to-date.
Delete source, which will remove every component associated with it.
When adding new sources to Builder, its direct connection to your Data Warehouse ensures your data remains centralized, facilitating seamless geospatial visualization creation within your own data infrastructure. Learn more about .
The Builder SQL Editor gives you precise control and flexibility over the data you wish to display and analyze on your map. By defining your data source via SQL queries, you can fine-tune your map's performance, such as selecting only essential columns and conducting data aggregations. Refer to this for essential recommendations on using the SQL Editor effectively.
Once a data source is added to Builder, it instantly creates a map layer. From here, you can dive into the map layer options to choose your preferred visualization type and customize the layer's styling properties to your liking. Learn more about .
Widgets enable users to dynamically explore and interact with data, resulting in rich and engaging visualizations. These widgets not only facilitate data exploration but also allow for filtering based on the map viewport and interactions with other connected widgets.
Each widget is linked to a specific data source. After configuration, they are displayed in the right panel of the interface. As an Editor, you have the flexibility to define the behavior of these widgets: they can be set to update based on the current viewport, providing localized insights, or configured to reflect global statistics, representing the entire data source.
Enable interactions in Builder to reveal attribute values of your source, allowing users to gain insights by clicking or hovering over map features.
As a more advanced feature, you can customize tooltips using HTML, which lets you embed images, gifs, and more, enhancing the visual appeal and informativeness of your data presentation.
The legend in Builder helps viewers understand layer styles in your map. Configure it in the legend panel to apply properties to specific legends, customize labels, and access controls to set legend behavior.
In the basemap panel, you have the flexibility to choose a basemap that best fits the specific needs of your visualization or analysis.
For those utilizing CARTO Basemap, you can easily adjust the basemap's appearance to show or hide different map components, such as labels, roads, or landmarks, tailoring it to your project's requirements and enhancing the overall clarity and effectiveness of your map.
While working in Builder, you have the option to customize your map view according to your preferences. You can choose between a single view, which provides a focused perspective on one area of the map, or a split view, offering a comparative look at different regions or aspects simultaneously.
Additionally, there's a 3D view option, which is particularly useful if you're utilizing our height intrusion feature to represent polygon features in three dimensions. This 3D view can significantly enhance the visualization of spatial data, offering a more immersive and detailed perspective.
Builder contains different features that allow users to easily find locations. Users can leverage the Location Search Bar located at the top left corner of the map to find addresses or lat/long locations. Additionally, they can use the focus on your current location control, which centers the map on the device's IP-based location.
In Builder, the feature selection tool lets you highlight areas on the map and filter data at the same time. You can choose how to select areas: use a rectangle, polygon, circle, or the lasso for free-hand drawing.
Also available in the top bar, you can use the Builder measure tool to measure point-to-point distances. Once the measurement is finalized, the total distance will be displayed.
Builder allows you to add SQL Parameters as placeholders in your SQL Query sources. This allows end users to update these placeholders dynamically by entering input in the parameter controls. Learn more about SQL Parameters in this .
The data export feature in Builder, found in the top right corner, lets users export features from selected layers. It exports features within the current map view, including any filters applied through the feature selection tool, widgets, or parameters.
A rich map description is essential in Builder for giving users context and clarity, thereby improving their understanding and interaction with the map. To add a description, use the button at the top right corner.
This feature supports Markdown, offering options like headers, text formatting, links, images, and more to enrich your map's narrative.
Once you've finished your map in Builder, it's easy to share it with your organization or the public. While sharing, you can activate collaboration mode, permitting other organization members to edit the map. Additionally, you can grant map viewers specific functionalities, like searching for locations or measuring distances, to enhance their interactive experience.
To access map settings in Builder, click on the three dots in the top right corner. From here, you have the option to either delete or duplicate your map as needed.
Create or enrich a Spatial Index
Work with Spatial Index properties
Using Spatial Indexes for analysis
Read: Spatial Indexes 101 ebook
Read: 10 Powerful uses of H3
Watch: Are hexagons always the bestagons?
SELECT * FROM `carto-demo-data.demo_tables.retail_stores`
SELECT * FROM `carto-demo-data.demo_tables.retail_stores`
### Retail Store Performance Monitoring Dashboard

Unlock insights into the performance of retail stores across the USA with this interactive map, crafted using CARTO Builder.
#### Key Features
- **Diverse Layers:**
Discover two distinct layers offering individual store performance visualization and aggregated views using Spatial Indexes, offering a comprehensive perspective of retail dynamics.
- **Interactive Widgets:**
Engage with user-friendly widgets, allowing effortless data manipulation, trend identification, and in-depth analysis, transforming static data into actionable insights.
- **Revenue and Surface Area Analytics:**
Analyze the complex relationship between revenue and surface area, unveiling patterns, and opportunities to optimize store performance and maximize profits.
CALL `carto-un`.carto.CREATE_SPATIAL_SCORE(
-- Select the input table (created in step 1)
'SELECT geom, cartodb_id, staying_joined, station_distance_norm_inv FROM `yourproject.yourdataset.potential_POS_inputs`',
-- Merchant's unique identifier variable
'cartodb_id',
-- Output table name
'yourproject.yourdataset.scoring_attractiveness',
-- Scoring parameters
'''{
"weights":{"staying_joined":0.7, "station_distance_norm_inv":0.3 },
"nbuckets":5
}'''
);
WITH
scores AS (
SELECT
*
FROM
`yourproject.yourdataset.scoring_attractiveness`)
SELECT
scores.*,
input.geom
FROM
scores
LEFT JOIN
`carto-demo-data.demo_tables.retail_stores` input
ON
scores.cartodb_id = input.cartodb_id
crm_cd_desc LIKE '%BURGLARY%' OR
crm_cd_desc LIKE '%THEFT%' OR
crm_cd_desc LIKE '%VANDALISM%' OR
crm_cd_desc LIKE '%STOLEN%' OR
crm_cd_desc LIKE '%ARSON%'
In this tutorial, discover how to harness CARTO Builder for analyzing multiple drive time catchment areas at specific times of the day, tailored to various business needs. We'll demonstrate how to create five distinct catchments at 10, 15, 30, 45, and 60 minutes of driving time for a chosen time - 8:00 AM local time, using CARTO Workflows. You'll then learn to craft an interactive dashboard in Builder, employing SQL Parameters to enable users to select and focus on a catchment area that aligns with their specific interests or business objectives.
In this guide, we'll walk you through:
Access Workflows from your CARTO Workspace using the Navigation menu.
Select the data warehouse where you have the data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
In the Sources section located on the left panel, navigate to demo_data > demo tables within the CARTO Data Warehouse. Drag and drop the retail_stores source to the canvas.
We are going to focus our analysis on two states: Montana and Wyoming. Luckily, the retail_stores source contains a column named state with each state's abbreviation. First, add one Simple Filter component to extract stores whose state column is equal to MT. Then click on "Run".
To filter the stores in Wyoming, repeat Step 4 by adding another Simple Filter to the canvas and setting the node configuration to filter those equal to WY. Then click on "Run".
Then, add a Union All component to the canvas and connect both Simple Filter outputs to combine them into a single table again.
To do a quick verification, click the Union All component to activate it, expand the results panel at the bottom of the Workflows canvas, click the Data Preview tab, and then on the state field click the "Show column stats" button. The stats should now show counts only for the stores in MT and WY.
In the Components tab, search for the Create Isolines component and drag 5 of them into the canvas, connecting each to the Union component from the steps prior. You can edit the component description by double-clicking the text reading "Create Isolines" under each component's icon in the canvas and edit the component name to be more descriptive.
Now, set up the Create Isolines components, which will create the catchment areas. Using the example given below for a 10 minute drive time by car, add the proper settings to each respective component. We will be adding an Isoline Option for custom departure time, which will allow each component to mimic driving conditions at that date & time. For that, make sure to enter the following JSON structure in the Isoline Options: {"departure_time":"2023-12-27T08:00:00"}. Once the configuration is set, click on "Run".
Now, we will create a new column to store the drive time category, so we can later use it to filter the different catchment areas using a parameter control in Builder. To do so, drag 5 Create Column components into the canvas and connect each of them with a Create Isolines output. In the configuration, set the 'Name for new column' value as "drive_time" and set the expression to the appropriate drive time for each component, such as 10.
Add a Union all component and connect all 5 of the Create Column components to it to merge all of these into one single table.
Finally, let's save our output as a table by using the Save as Table component. Add the component to the canvas and connect it to the Union All component. Set the destination to CARTO Data Warehouse > organization > private and save the table as catchment_regions. Then, click "Run" to execute the last part of the workflow.
Before closing the workflow, give it a suitable name such as "Generating multiple drive time regions" and add Annotations to facilitate readability.
Before moving to Builder for the visualization part, we can review the output of the saved table either from the Map Preview in Workflows itself (by selecting the Save as Table component), or in the Data Explorer. To do the latter, navigate to the Data Explorer section using the Navigation panel.
In the Data Explorer section, navigate to CARTO Data Warehouse > organization data > private and look for the catchment_regions table. Click and inspect the source using the Data and Map Preview. Then, click on "Copy qualified name", as we will be using it in the next steps of our tutorial.
In the CARTO Workspace, access the "Maps" section from the navigation panel.
Click on "New map". A new Builder map is opened in a new tab.
Name your Builder map to "Analyzing multiple drive-time catchment areas"
Now, we will add our source as a SQL Query. To do so, follow these steps:
Click on "Add sources from..." and select "Custom Query (SQL)"
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the "Add Source button".
The SQL Editor panel appears.
Add the resulting table to your map. To do so, paste the following SQL query in the Editor, replacing the qualified table name with the one you copied in Step 13, and click on "Run".
SELECT * FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.catchment_regions
Once successfully executed, a map layer is added to the map.
Rename the layer to "Catchment regions". Then, access the layer panel and within Fill Color section, color based on travel_time
column. Just below, disable
the Stroke Color using the toggle button.
Now, let's add a SQL Text Parameter that will allow users to select their desired drive time to analyse the catchment areas around the store locations. To do so, access "Create a SQL Parameter" functionality located at the top right corner of the data sources panel.
Once the SQL Parameter modal is opened, select Text Parameter type and fill the configuration as per below. Please note you should enter the values manually to provide users with a friendly name to pick the drive time of their choice.
Once the parameter is configured, click on "Create parameter". After that, a parameter control is added to the right panel. Copy the SQL name so you can add it to the SQL query source.
Now, let's open the SQL Editor of our catchment_regions source. As the travel_time column is numeric, we will use a regex to extract the drive time value to filter by from the SQL parameter. Update your SQL Query using the below and click on "Run".
SELECT * FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.catchment_regions
WHERE travel_time IN (SELECT CAST(REGEXP_EXTRACT(t, r'\d*') AS NUMERIC) FROM {{drive_time}} AS t)
Once successfully executed, the layer will be reinstantiated and the parameter control will display the selectable values. Now, users can dynamically filter by the drive time they are interested in, according to their needs.
We are ready to publish and share our map. To do so, click on the Share button located at the top right corner and set the permission to Public. In the 'Shared Map Settings', enable SQL Parameter. Copy the URL link to seamlessly share this interactive web map app with others.
Finally, we can visualize the results!
In this tutorial we are going to estimate and analyze the population that is covered by LTE cells from the telecommunications infrastructure. In order to do that we are going to jointly analyze data with the location of the different LTE cells worldwide and a dataset of spatial features such as population, other demographic variables, urbanity level, etc. We will start by using CARTO Workflows to create a multi-step analysis to merge both sources of data, and we will then use CARTO Builder to create an interactive dashboard to further explore the data and generate insights.
In this tutorial we are going to use the following tables available in the “demo data” dataset of your CARTO Data Warehouse connection:
cell_towers_worldwide
usa_states_boundaries
derived_spatialfeatures_usa_h3res8_v1_yearly_v2
Let's get to it!
In your CARTO Workspace under the Workflows tab, create a new workflow.
Select the data warehouse where you have the data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
Navigate the data sources panel to locate your table, and drag it onto the canvas. In this example we will be using the cell_towers_worldwide table available in demo data. You should be able to preview the data both in tabular and map format.
We also add the usa_states_boundaries table into the workflows canvas; this table is also available in demo data.
First, we want to select only the boundary of the US state which we are interested in for this analysis; in this example we will be using Massachusetts. In order to filter the usa_states_boundaries table we will be using the “Simple Filter” component, which we should now drag and drop into the canvas, connecting the data source to the component node.
We configure the “Simple Filter” node in order to keep the column “name” when it is “equal to” Massachusetts. We click “Run”.
We will now filter the data in the cell_towers_worldwide table in order to keep only the cell towers that fall within the boundary of the state of Massachusetts. In order to do that, we will add a “Spatial Filter” component and we will connect as inputs the data source and the output of the previous “Simple Filter” with the result that has matched our filter (the boundary of Massachusetts).
We configure the “Spatial Filter” with the “intersects” predicate, and identify the “geom” column for both inputs. We click “Run”.
We can see now in the output of the “Spatial Filter” node that we have filtered the cell towers located within the state of Massachusetts.
We are now going to create a buffer around each of the cell towers. For that, we add the “ST Buffer” component into the canvas. We configure that node to generate buffers of 300 meters. We click “Run”.
You can preview the result of the analysis by clicking on the last node of the “ST Buffer” and preview the result on map.
Now, we are going to polyfill the different buffers with H3 cells. For that we add the component “H3 Polyfill” and we configure the node to be based on cells of resolution 8, we select the geom_buffer as the geometry data to be polyfilled and we cluster the output based on the H3 indices. We then click “Run” again.
Check how now the data has been converted into an H3 grid.
We now will add a “Select Distinct” component in order to keep in our table only one record per H3 cell, and to remove those resulting from overlaps between the different buffers. In the node configuration we select the column “h3” to filter the unique values of the H3 cells present in the table.
We now add a new data source to the canvas; we select the derived_spatialfeatures_usa_h3res8_v1_yearly_v2 table from demo data.
We add a “Join” component in order to perform an inner join between the data from the Spatial Features dataset and the output of our workflow so far, based on the h3 indices present in both tables. Click “Run”.
Please check now how the output of the workflow contains the data from the spatial features table only in those cells where we know there is LTE coverage.
Finally, we are going to save the result of our workflow as a new table in our data warehouse. For that, we are going to add the component “Save as table” into the canvas and connect the output of the previous step where we performed the “Join” operation. In this example we are going to save the table in our CARTO Data Warehouse, in the dataset “shared” within “organization data”. We click “Run”.
Workflows also allows us to create maps in Builder in order to make interactive dashboards with any of our tables (i.e. saved or temporary) at any step of the workflow. In this case, select the “Save as table” component and from the “Map” preview in the Results section click on “Create map”. This will open Builder on a different tab in your browser with a map including your table as a data source.
We can now style our layer based on one of the columns in the table, for example “Population”.
We can add “Interactions” to the map, so as to open pop-up windows when the user clicks or hovers over the H3 cells.
And we can add widgets in order to further explore and filter our data. For example, we are going to add a Histogram widget based on the population column.
We add a second widget in order to filter the cells based on the dominant urbanity level; for that we use a Category widget.
We can now start interacting with the map. Check how, for example, the area with more population covered by LTE cells is concentrated in the Boston area (which are mostly quite dense urban areas).
We add a final Formula widget to compute the total population covered (based on the data in the viewport of the map).
Finally we can share our map publicly or just with the rest of users within our CARTO organization account.
We are done! This is how our final map looks:
And here's a final view of how our analysis workflow looks:
We hope you enjoyed this tutorial and note that you can easily replicate this analysis for any other US state or even other parts of the world.
With CARTO Builder, you can effortlessly create AI Agents that empower end-users to explore and extract valuable insights from your maps. In this tutorial, you’ll learn how to enable AI Agents in your CARTO platform, configure them using best practices, and interact with them effectively. We’ll also provide example prompts and highlight the current capabilities of AI Agents to help you get the most out of this feature.
Steps:
Enable AI Agents in your organization
Create a map using PLUTO data in Builder
Set up an AI Agent in Builder
Accessing AI Agents as end-user
Login to your CARTO organization and navigate to Settings > Customizations section and choose AI Agents tab.
Use the toggle button to enable AI Agents in your CARTO platform. Once enabled, Editor users in your organization can add an AI Agent to any Builder map.
In this section, we will create a Builder map showcasing the PLUTO dataset for Manhattan and demonstrate how to create an AI Agent that allows end-users to extract information effortlessly. This AI Agent will enable users to explore land use, zoning details, building attributes, and other key insights from the map.
Access the Maps section from your CARTO Workspace using the navigation menu and create a new map using the button at the top right of the page. This will open the Builder in a new tab.
Name your Builder map "Exploring Manhattan buildings" and using Add Source button navigate to CARTO Data Warehouse > carto-demo-data > demo_tables and add manhattan_pluto_data
table.
Rename your layer "Buildings" and style the Fill Color using yearbuilt
property using sunset
palette. Set the Stroke Color to dark purple
and the Stroke Weight fixed to 0,5 pixels
.
Now, we will add Widgets to empower users and the AI Agent to dynamically extract insights from your source. They also serve to filter data based on the map viewport and interconnected widgets.
First, add a Formula Widget to display the total number of buildings in the entire dataset. To do so, navigate to the Widgets tab, select Formula Widget, and set the configuration as follows:
Operation: COUNT
Behaviour: Global
Add another Formula Widget, this time to display the total number of buildings in the map extent (known as viewport) and set the configuration as follows:
Operation: COUNT
Behaviour: Filter by viewport
To display the distribution of buildings' number of floors, add a Histogram Widget and set the configuration as follows:
Column: numfloors
Behaviour: Global
Add another Histogram Widget, this time to display the distribution of buildings' construction years. Set the configuration as follows:
Column: yearbuilt
Behaviour: Global
Finally, add a Category Widget to display the buildings grouped by land use type and configure this widget as follows:
Column: landuse
Behaviour: Global
Your map should look similar to the example below. When configuring widgets, make sure to set up the appropriate formatting to enhance readability and add notes or descriptions to provide context for end-users. This will help users and the AI Agent extract valuable insights and interact with the map effortlessly.
Learn how to configure an AI Agent in Builder to enhance your map’s interactivity. By linking it to your map, you enable end-users to ask questions, extract insights, and explore data effortlessly.
First, enable the AI Agent by toggling the switch located at the top of the AI panel.
Provide the AI Agent with additional context of the map using the Map Context section.
Using Map Context section, you have the flexibility to provide additional instructions to enhance the AI Agent's responses. While the AI Agent already has access to your map's configuration—such as layer styling, widget settings, and other components—it uses this information to deliver relevant answers to end-users.
This section is optional, but adding custom instructions allows you to tailor the AI Agent’s behavior to align more closely with your specific use case. These inputs will help the AI Agent offer more precise, insightful interactions when engaging with end-users.
For this example, we will include the following:
Styling guidelines to ensure a consistent and visually coherent map presentation.
A detailed description of the Land Use classification, based on the NYC Department of City Planning, as this information is not directly included in the dataset.
You can use the sample text provided below or customize it to suit your specific requirements, ensuring the AI Agent meets the unique needs of your map.
This map allows end-users to explore the PLUTO dataset for Manhattan and understand the distribution of buildings across the borough.
The Land Use in the dataset is specified by numerical codes. Use the following descriptions to provide answers and interact with the map effectively:
01 - One & Two Family Buildings
02 - Multi-Family Walk-Up Buildings
03 - Multi-Family Elevator Buildings
04 - Mixed Residential & Commercial Buildings
05 - Commercial & Office Buildings
06 - Industrial Buildings
07 - Transportation & Utility
08 - Public Facilities & Institutions
09 - Open Space & Outdoor Recreation
10 - Parking Facilities
11 - Vacant Land
The Conversation Starters provide end-users with common prompts that the AI Agent can respond to, making interactions more intuitive and engaging. In our case, we will include the following four questions as conversation starters:
What is this map?
Show open spaces on the map.
Highlight residential areas in Manhattan.
Display all commercial buildings in Times Square.
Finally, you have the option to include a User Guide to customize the explanation displayed when the Agent greets your end-users. In our case, we'll add the following explanation:
This agent can help you explore and analyze the map using the PLUTO dataset for Manhattan.
Before publishing the map, we'll define Map settings for viewers, enabling the following functionalities:
Feature selection tool
Export viewport data
Search location bar
Measure tool
Scroll wheel zoom (enabled by default)
Basemap selector
To publish the map, click on the Share button and share the map with your organization.
AI Agents are not yet supported in Public maps.
To access the AI Agent, copy the map link from the Share window or the Copy link option in the Share quick actions and open it in a new tab. Ensure the link contains /viewer/ to confirm you’re accessing the map in the correct mode.
Once the map loads, the AI Agent will appear at the bottom center of your screen. Click on it to initiate a conversation. The Agent will greet users by displaying the user guide and conversational starter prompts, making it easy to start exploring the map.
In addition to providing text-based answers, the AI Agent has access to several capabilities for interacting with the map and helping users extract insights:
Search and zoom to specific locations.
Extract insights from widgets.
Filter data through widget interactions.
Switch layers on and off.
Retrieve the latitude and longitude of the current map position.
For more information on the AI Agent's capabilities, please refer to this section of the documentation.
In the example below, adding the prompt “Display all commercial buildings near Times Square older than 1920” from the interface will instruct the AI Agent to:
Search for and zoom to Times Square.
Filter the map’s buildings to the commercial type, using the land use descriptions provided in the map context and applying the Category widget.
Filter the map’s buildings to those older than 1920, using the available slots in the Histogram widget.
This showcases how the AI Agent dynamically combines map context and widget functionality to provide targeted insights and interactions.
Note: AI Agent responses are generated in real time and may vary slightly depending on the context.
And that's it! You've successfully set up your map with an AI Agent, enabling powerful insights and seamless exploration for your end-users. With AI capabilities integrated into the CARTO platform, you can empower users to extract meaningful information effortlessly.
Stay tuned for upcoming iterations and enhancements to this feature—we're excited to bring even more possibilities to your mapping experience!
Spatio-temporal analysis plays a crucial role in extracting meaningful insights from data that possess both spatial and temporal components. By incorporating spatial information, such as geographic coordinates, with temporal data, such as timestamps, spatio-temporal analysis unveils dynamic behaviors and dependencies across various domains. This applies to different industries and use cases like car sharing and micromobility planning, urban planning, transportation optimization, and more.
In this example, we will perform a spatio-temporal analysis to identify traffic accident hotspots using the location and time of accidents in the city of Barcelona in 2018.
The dataset can be found in cartobq.docs.bcn_accidents_2018. For the purpose of this analysis, only the location and time of accidents are relevant. The table below shows an extraction of 10 of these accidents.
SELECT
ST_GEOGFROMTEXT(geometry) AS geolocation,
datetime
FROM
`cartobq.docs.bcn_accidents_2018`
LIMIT
10
In addition, the map below shows all accidents in the city of Barcelona in 2018.
On the left panel, the exact locations of the accidents are shown, while on the right one, the aggregated number of accidents per H3 cell at resolution 9 is displayed. At the bottom of the map, the number of accidents over time is shown, where a periodicity can be observed.
The next step is to bucketize the data in space bins and time intervals. For this example, a spatial index H3 at resolution 9 and weekly time intervals were chosen. The data is aggregated by H3 cell and week. This can be achieved with the following code:
CREATE TABLE project.dataset.bcn_accidents_count_grid AS
SELECT
`carto-un`.carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9) as h3,
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK) AS datetime,
COUNT(*) AS value
FROM
`cartobq.docs.bcn_accidents_2018`
GROUP BY
`carto-un`.carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9),
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK)
CREATE TABLE project.dataset.bcn_accidents_count_grid AS
SELECT
`carto-un-eu`.carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9) as h3,
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK) AS datetime,
COUNT(*) AS value
FROM
`cartobq.docs.bcn_accidents_2018`
GROUP BY
`carto-un-eu`.carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9),
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK)
CREATE TABLE project.dataset.bcn_accidents_count_grid AS
SELECT
carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9) as h3,
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK) AS datetime,
COUNT(*) AS value
FROM
`cartobq.docs.bcn_accidents_2018`
GROUP BY
carto.H3_FROMGEOGPOINT(ST_GEOGFROMTEXT(geometry), 9),
DATETIME_TRUNC(CAST(datetime AS DATETIME), WEEK)
Now let us use the space-time Getis-Ord Gi* function to calculate the z-score for each H3 cell and week. For that purpose, we will use the GETIS_ORD_SPACETIME_H3_TABLE function of the Analytics Toolbox.
This function needs the following inputs:
A table with the H3 cells and their corresponding date-time and number of accidents (input).
A table's fully qualified name to save the results (output_table).
The name of the column with the H3 indexes (index_col).
The name of the column with the date (date_col).
The name of the column with the values to use for the spacetime Getis-Ord computation (value_col).
The size of the k-ring (size). This is the spatial lag used for computing the corresponding Gi* statistic. In our case, we will take 1 ring around each H3 cell.
The time unit (time_freq). Equivalent to the H3 resolution for space aggregation, time_freq is the time aggregation we will use. We select week as our unit of time aggregation.
The size of the time bandwidth (time_bw). This determines the neighboring weeks to be considered for calculating the corresponding Gi* statistic. For this example, we will take 2 weeks, i.e., for every week, we consider the two prior and the two posterior weeks as neighbors.
The kernel functions to be used for spatial (kernel) and time weights (kernel_time). For this example, we use a uniform kernel for space and a quartic kernel for time.
And returns a table with the following schema:
index: H3 spatial index at the provided resolution, same as the input.
date: date-time at the provided resolution, same as the input.
gi: the z-score.
p_value: the two-tail p-value.
Running the following, the Getis Ord Gi* for each H3 cell and week is returned.
CALL `carto-un`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'project.dataset.bcn_accidents_count_grid',
'project.dataset.bcn_accidents_count_grid_stgi',
'h3',
'datetime',
'value',
1,
'WEEK',
2,
'uniform',
'quartic'
);
CALL `carto-un-eu`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
'project.dataset.bcn_accidents_count_grid',
'project.dataset.bcn_accidents_count_grid_stgi',
'h3',
'datetime',
'value',
1,
'WEEK',
2,
'uniform',
'quartic'
);
CALL carto.GETIS_ORD_SPACETIME_H3_TABLE(
'project.dataset.bcn_accidents_count_grid',
'project.dataset.bcn_accidents_count_grid_stgi',
'h3',
'datetime',
'value',
1,
'WEEK',
2,
'uniform',
'quartic'
);
We can now filter the previous table to keep only the rows with a p-value below 0.05 and a positive gi. This keeps only the cells and weeks which are considered hotspots. For coldspots, we instead filter for a p-value below 0.05 and a negative gi. Then we count, per H3 cell, the number of remaining weeks.
SELECT index AS h3, COUNT(*) AS n_weeks
FROM project.dataset.bcn_accidents_count_grid_stgi
WHERE p_value < 0.05 AND gi > 0
GROUP BY index
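For reference, the equivalent coldspot aggregation simply flips the sign of the gi filter:
SELECT index AS h3, COUNT(*) AS n_weeks
FROM project.dataset.bcn_accidents_count_grid_stgi
WHERE p_value < 0.05 AND gi < 0
GROUP BY index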
The output is shown in the following map, displaying the number of weeks per cell with a significantly high number of accidents.
Get started with Spatial Indexes
The tutorials on this page will teach you the fundamentals for working with Spatial Indexes; how to create them!
Convert points to a Spatial Index; convert a point geometry dataset to a Spatial Index grid, and then aggregate this information.
Enrich an index; take numeric data from a geometry input such as a census tract, and aggregate it to a Spatial Index.
Note that when you're running any of these conversions, you aren't replacing your geometry - you're just creating a new column with a Spatial Index ID in it. Your geometry column will still be available for you, and you can easily use either - or both - spatial formats depending on your use case.
In this tutorial, we will be building the below simple workflow to convert points to a Spatial Index and then generate a count for how many of those points fall within each Spatial Index cell.
💡 You will need access to a point dataset - we'll be using San Francisco Trees, which all CARTO users can access via the CARTO Data Warehouse - but you can substitute this for any point dataset.
Once logged into your CARTO account, head to the Workflows tab and Create a new workflow. Select a connection. If you're using the same input data as us, you can use the CARTO Data Warehouse - otherwise select the connection with your source data.
Switch to the Sources tab and navigate to your point table (for us, that's CARTO Data Warehouse > Organization > demo_tables > san_francisco_street_trees) then drag it onto the workflow canvas.
Next, switch to the Components tab and drag the H3 from GeoPoint onto the canvas, connecting it to the point dataset. This will convert each point input to the H3 cell which it falls inside. Alternatively, you could use the Quadbin from GeoPoint if you wanted to create a square grid instead. Learn more about which Spatial Index is right for you here.
Here we can change the resolution of the H3 output; the larger the number, the smaller the H3 cells and the more geographically detailed your analysis will be. If you're following our example, change the resolution to 10. Note if you're using a different point table, you may wish to experiment with different resolutions to find one which adequately represents your data and will generate the insights you're looking for.
Run your workflow and examine the results! Under the table preview, you should see a new variable has been added: H3. This index is what geolocates each H3 cell.
Next, add a Group by component; we will use this to count the number of trees which fall within each H3 cell. Draw a connection between this and the output (right) node of H3 from GeoPoint. Select H3 in both the Group by and Aggregation parameters, and set the aggregation type to Count. At this point, you can also add any numeric variables you wish to aggregate, using operators such as Sum and Average.
Run your workflow again!
If you've been following along with this example, you should now be able to create a tree count map like the below!
In this tutorial, we will build the below simple workflow to convert a polygon to a Spatial Index.
💡 You will need access to a polygon dataset. We will use US Counties (which you can subscribe to for free from the CARTO Data Observatory) but - again - you're welcome to use any polygon dataset for this.
Drag the polygon "area of interest" table onto the workflow canvas. You can do this again through the Sources tab, and if you - like us - are using a table that you've subscribed to from our Data Observatory, then switch to the Data Observatory tab (at the bottom of the screen). For our example, we need to navigate to CARTO > County - United States of America (2019).
If the table you've just added contains some superfluous features you can use a Simple Filter to omit these. For instance, we'll filter the counties table to the feature which has the "do_label" of San Francisco.
Next, drag a H3 Polyfill onto the canvas (or a Quadbin polyfill if you chose to work with that Index). Select the resolution you wish to use; we'll use 10. Please note if you are using multiple polygons as your "area of interest" then duplicate H3 cells may be generated along touching borders; you can use Group by to omit these duplicates in the same way that we did earlier (but with no need to include a count aggregation).
Run your workflow! If you're following our example, you should see that we have 7,779 H3 cells comprising the area of San Francisco.
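The same polyfill can be sketched in SQL (assuming the `carto-un` Analytics Toolbox on BigQuery; the table name and geom column are placeholders for your counties subscription):
SELECT DISTINCT h3  -- DISTINCT removes duplicate cells generated along shared borders
FROM
  yourproject.yourdataset.usa_counties_2019,
  UNNEST(`carto-un`.carto.H3_POLYFILL(geom, 10)) AS h3
WHERE
  do_label = 'San Francisco'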
If you have a line geometry that you wish to convert to a Spatial Index, the approach is slightly different. First, you need to convert the data to a polygon by buffering it - and then converting that polygon to a Spatial Index like in the tutorial above.
💡 Looking for a line-based table to practice on? In the CARTO Data Warehouse under demo data > demo tables, try the bristol_cycle_network table.
Let's build out the above workflow!
Drag your line source onto the Workflows canvas.
Connect this to an ST Buffer component. Set the buffer distance as 1 meter.
Connect this to a H3 Polyfill component. You'll likely want this fairly detailed - the larger the resolution number the more detailed the grid will be (we've used a resolution of 12). To ensure a continuous grid along your whole line, change the mode to Intersects.
And Run! ⚡ Your results should look something like the below:
In this tutorial, you will learn how to take numeric data from a geometry input, and aggregate it to a Spatial Index. This is really useful for understanding things like the average age or total population per cell.
💡 You will need access to a Spatial Index table for this. You can follow either of the above tutorials to create one - we'll be using the results from the Convert polygons to a Spatial Index tutorial. You will also need access to a source dataset which contains the numeric information you want to aggregate. In our example, we want to find out the total population and average income for each Spatial Index cell; we will use "Sociodemographics, 2018, 5yrs - United States of America (Census Block Group)" which you can subscribe to for free from the CARTO Spatial Data Catalog.
Drag both your source dataset and Spatial Index dataset onto a workflow canvas. If you're building on an existing workflow such as one of the above, you can just continue to edit.
Next drag an Enrich H3 Grid component onto the canvas. Note you can also use an Enrich Quadbin Grid if you are working with this type of index.
Connect your target H3 grid to the top input, and your source geometry (for us, that's Census block groups) to the bottom input.
Set the following parameters:
Target H3 column: H3 (or whichever field is holding your H3 index)
Source geo column: geom (or - again - whichever field is holding your source geometry data)
Variables: select the variables and aggregation types. For us, that's total_pop_3409f36f (SUM) and median_income_6eb619a2 (AVG). Be mindful of whether your variables are extensive or intensive when doing this.
You can also set a K-ring and decay function to incorporate neighborhood statistics in the enrichment.
Run! The result of this should be a table with three columns; a H3 index, total population and average income.
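Under the hood, the Enrich H3 Grid component handles the areal interpolation for you. Purely as an illustrative, simplified SQL sketch (it apportions each block group's population equally across its covering cells rather than by overlap area, and the table, geoid and geom names are placeholders):
WITH bg_cells AS (
  SELECT
    bg.geoid,  -- hypothetical unique block group identifier
    bg.total_pop_3409f36f AS total_pop,
    bg.median_income_6eb619a2 AS median_income,
    h3,
    COUNT(*) OVER (PARTITION BY bg.geoid) AS n_cells
  FROM
    yourproject.yourdataset.census_block_groups AS bg,
    UNNEST(`carto-un`.carto.H3_POLYFILL(bg.geom, 10)) AS h3
)
SELECT
  grid.h3,
  SUM(bg_cells.total_pop / bg_cells.n_cells) AS total_pop_sum,  -- extensive: apportion, then sum
  AVG(bg_cells.median_income) AS median_income_avg              -- intensive: simple average
FROM
  yourproject.yourdataset.san_francisco_h3_res10 AS grid
JOIN
  bg_cells USING (h3)
GROUP BY
  grid.h3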
In this tutorial, we're going to build a dynamic map dashboard that reveals the administrative layers of the United States, ranging from state-level down to zip codes. Ever curious about how to display different administrative regions at specific zoom levels? We're about to delve into that. Our journey will start by setting up a map in Builder that responds to zoom, transitioning smoothly from states to counties, and finally to zip codes, allowing users to access detailed statistics pertinent to each administrative area.
Using CARTO Builder, we'll craft a dashboard that not only informs but also engages users in exploring their regions of interest. Whether it's understanding demographic trends or pinpointing service locations, this guide will equip you with the knowledge to create an interactive map dashboard tailored to varying levels of administrative detail. Ready to unlock new levels of geographical insights? Let's dive in!
Access the Maps section from your CARTO Workspace using the Navigation menu and create a new Map using the button at the top right of the page. This will open the Builder in a new tab.
Now let's add USA States source to our Builder map. To add a source as a SQL query, follow the steps below:
Select the Add source from button at the bottom left on the page.
Select Custom Query (SQL) and then Type your own query under the CARTO Data Warehouse connection.
Click on the Add Source button.
The SQL Editor panel will be opened.
To add USA States, run the query below:
A map layer is automatically added from your SQL Query source. Rename it to 'USA States'.
Now let's add the remaining sources following Step 2 to add USA Counties, USA Zip Codes and USA Census Tracts.
Add USA Counties as a SQL query source using the below query. Once the layer is added, rename it to 'USA Counties'.
Add USA Zip Codes as a SQL query source using the below query. Once the layer is added, rename it to 'USA Zip Codes'.
Finally, let's add USA Census Tracts as a SQL query source using the below query. Once the layer is added, rename it to 'USA Census Tracts'.
Next in our tutorial, after adding the administrative layers of the USA to our map, we'll set specific zoom level ranges for each layer. This step will optimize our map's clarity and usability, allowing users to see States, Counties, Zip Codes, and Census Tracts at the most appropriate zoom levels. Set the zoom level visibility in the Layer panel as follows:
USA States: 0 - 3
USA Counties: 4 - 8
USA Zip Codes: 9 - 11
USA Census Tracts: 12 - 21
With the zoom level visualization configured for each layer, our next step is to customize the dashboard for enhanced user insights. Our focus will be on understanding the population distribution across each administrative region of the USA.
To achieve this, we will style our layers – USA States, Counties, Zip Codes, and Census Tracts – based on the 'Total_Pop' variable. This approach ensures users can easily grasp the spatial population distribution as they navigate and zoom in on the map. Let's set up the Fill Color for all four layers to effectively represent population data as follows:
Color based on: Total_Pop
Palette Steps: 4
Palette Name: ColorBrewer BuGn 4
Data Classification Method: Quantile
Now let's set the Stroke Color to Hex Code #344c3a for all four layers.
Set the map title to 'USA Population Distribution'.
Now, let's add some Widgets to provide users with insights from the data. First, let's add a Formula Widget linked to the USA Census Tracts source with the following configuration:
Operation method: SUM
Variable: Total_Pop
Formatting: 12.3k
Markdown note: Total population (2014) by Viewport for Census Tracts layer
You can check how the widget updates as you move around the map. You can also use our Feature Selection Tool to select a custom area and gather the population that intersects with that specific area.
We will add a second Formula Widget linked to USA Census Tracts source with the following configuration, to display the unemployment rate:
Operation method: AVG
Variable: Unemp_rate
Formatting: 12.35%
Markdown note: Unemployment rate (2014) by Viewport for Census Tracts layer
The last widget we will add to our dashboard is a Category widget linked to the USA States layer. It will be a global widget displaying the total population by state to provide users with stats; it won't interact with the viewport extent, and its cross-filtering capability will be disabled. To configure this widget, follow the steps below:
Operation method: SUM
Source variable: name
Aggregation column: Total_Pop
Markdown note: Total population by state (2014) for States layer. Please note this widget does not interact with the viewport extent and cannot be filtered.
Behaviour mode: Global
Cross-filtering: Disabled
Enable Interactions for the relevant layers. To do so, activate the Interactions feature for each layer and add the desired attributes. On this occasion, we will select the Click interaction mode using the Light type and add just the relevant information with renamed labels. Repeat this process for the rest of the layers.
In the Legend tab, under 'More legend options', set the legend to open when loading the map.
Before publishing our map, let's add a map description so users can have more information about it while reviewing the map for the first time.
We can make the map public and share it online with our colleagues. For more details, see Publishing and sharing maps.
The final map should look something similar to the below:
In addition to subscribing to data on the cloud via the Data Observatory, another easy way to access spatial data is via API.
Data is increasingly being published via API feeds rather than static download services. By accessing data this way, you can benefit from live feeds and reduce data storage costs.
In this tutorial, we will walk through how to import data from an external REST API into CARTO Workflows.
What are we aiming for? We're going to extract data from the API and map it. Then we'll keep doing that every hour (at least for a while), so we can monitor changes over time - and you won't have to lift a finger, once you've set up your workflow, that is! By the end, you'll have something that looks like this 👇
All the data we'll be using here is free and openly available - so all you need is your CARTO account.
We're going to be using CARTO Workflows to make this whole process as easy as possible.
Sign into the CARTO platform and head to the Workflows tab.
Create a new Workflow using any connection - you can also use the CARTO Data Warehouse here.
Open the Components tab (on the left of the window) and search for the Import from URL component. Drag it onto the canvas.
Open the API page on the ArcGIS hub. Scroll down until you see View API Resources on the right. Expand this section and copy the URL from the GeoJSON section (it should look like the below), pasting it into your Import from URL component.
Note that the Import from URL component requires you to run the import before proceeding to further workflow steps - so let's Run! Once complete, you should be able to select the component to view the data, just like with any other component.
This is pretty much the most straightforward API call you can make to access spatial data - things can obviously get much more complicated!
First, let's say we want to return only a handful of fields. We would do this by replacing the outFields=* portion of the URL with a list of comma-separated field names, like below.
Next, let's imagine we only want to return air quality results from a specified area. You can see how the URL below has been adapted to include a geometry bounding box.
Let's leave the URL editing there for now, but do make sure to check the documentation of the API you're using to explore all of the parameters supported. Many will also supply UI-based custom API builders to help you to create the URL you need without needing to code.
Before we move on to analyzing this data, there are a couple of extra considerations to be aware of:
This API is fully open, but many APIs require you to set up an API key and/or application key to access data. This can usually be easily appended to your URL, for example as a parameter like %APIKEY.
If an API service is private and requires further authentication or access tokens, you should first use a HTTP Request component to obtain an authentication token by sending your credentials to the service. From here, you can extract the token and use it in a subsequent HTTP Request component to access the data, including the token in the appropriate header as specified by the service.
Similarly, the Import from URL component currently supports CSV and GeoJSON formats; for other data formats, the HTTP Request component should be used.
Many APIs impose a limit to the number of features you can access, whether in a single call or within a time period. This limit is not imposed by CARTO, and if you require more features than the API allows you should contact the service provider.
Now, let's do something exciting with our data!
Before creating a map, let's add some contextual information to our data to make it even more useful for our end users.
We'll do this with the simple workflow below, building on the one we already started.
Create local timestamp: as we start to build up a picture of air quality changes over time, we'll need to know when each recording was taken. It's important to know this in local time, as it's likely changes will be affected by time-sensitive patterns like commutes. For this, connect a Create Column component to your Import from URL. Call the field "local_time" and use the below formula for this calculation:
USA_counties: let's make sure our users can find out which state and county each sensor can be found in. If you're working in the CARTO Data Warehouse, find the table usa_counties under Sources > Connection data > Organization data > demo tables. If not, you can locate and subscribe to this data via the Data Observatory and add this table through there.
Join to counties with the following components:
A Spatial Join to join counties to air quality sensors.
An Edit Schema, selecting only the relevant fields; aqsid, pm25_aqi, geom, local_time, name_joined, state_name_joined. The original field types can be retained.
Finally, use a Save as Table to commit your results.
We now have a snapshot of this data from the time we ran the workflow. Now let's make some tweaks so we can keep fetching the results every hour.
To prepare for our incoming hourly data, let's make the below tweaks to our workflow.
First, give your page a refresh.
Under Sources, navigate to wherever you saved your output in the previous step. Drag it onto the canvas, roughly below Edit schema.
Delete the connection between Edit schema and Save as Table, instead connecting both Edit schema and your existing table to a new Union All component. Now, every time you run this workflow your table will have the new values appended to it.
Connect this to a Remove Duplicates component. This will remove any duplicate rows, useful if the API isn't updated or if you need to do additional executions in between scheduled runs.
Connect the Remove Duplicates component to the Save as Table component, ensuring the name is the same as the original table that you saved; this will overwrite the table every time the workflow is run.
Run the workflow! Don't worry, the Remove Duplicates component will remove any duplicated values.
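Conceptually, each run now performs something like the statement below: append the newest snapshot to the history table, keep one row per sensor and timestamp, and overwrite the table. This is only a sketch; the table name is a placeholder and new_hourly_snapshot stands in for the output of the Edit schema step.
CREATE OR REPLACE TABLE `yourproject.yourdataset.air_quality_history` AS
SELECT * EXCEPT (rn)
FROM (
  -- Keep one row per sensor (aqsid) and local timestamp.
  SELECT *, ROW_NUMBER() OVER (PARTITION BY aqsid, local_time) AS rn
  FROM (
    SELECT * FROM new_hourly_snapshot
    UNION ALL
    SELECT * FROM `yourproject.yourdataset.air_quality_history`
  )
)
WHERE rn = 1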
Now we can set up our workflow to run hourly. Select the clock to the left of Run (top-right of the window). Set the repeat frequency to every 1 hour - and save your changes.
We also need to clear the workflow cache so that it generates fresh results each time; you can learn more about this in the documentation. This option can be found in Workflow settings, to the left of the clock icon we just used. Simply disable the cache here.
Now your workflow will be set up to run hourly until you come back here to select Delete schedule. You should also come here to sync your scheduled workflow whenever you make changes to it.
While we're waiting for our table to be populated by the next hour of results... shall we build a map ready for it?
In your workflow, select the Save as Table component, and open the Map preview on the bottom of the screen - from here you can select Create Map to open a new CARTO Builder map with your data ready to go!
Under Sources to the bottom-left of the screen, select Data freshness. Open the Data freshness window from here and set the data freshness to every 1 hour (see below).
Open the map legend (bottom right of the screen) and click the three dots next to your newly generated Layer 1, and select Zoom to to fly to the extent of your layer.
Now let's build out our map:
Rename the map (top left of the screen) "USA Air Quality"
Rename the layer (3 dots next to the layer name - likely Layer 1) "Air Quality Index - PM 2.5"
Style the layer:
Radius: fixed, 3px.
Fill color: pm25_aqi, using the color ramp Color Brewer Yellow-Orange-Red and the color scale Quantize. By choosing a pre-defined scale like Quantize or Quantile, your color ramp will auto-scale as new data is added.
Stroke: white, px.
Create a pop-up interaction for your layer by opening the Interactions panel (top left of the screen). Choose the style Light with highlighted 1st value, and then select which fields you'd like to appear in the pop-up (we're using AQSID, PM25_AQI, local_time, name_joined (i.e. county) and state_name_joined). You should also rename each field here so the names are easier to read.
Your map should be looking a little like this...
Now let's add some widgets to help our users understand the data.
In the Widgets panel (to the left of Interactions), create a New Widget using your sensor locations layer.
Change the widget type to Time Series, setting the below parameters:
Name: PM2.5 AQI hourly changes. You can change this in the same way you change layer names.
Time field: the widget builder should auto-detect Local time, but if your source has multiple time inputs, you would change it here.
Operation: average.
Aggregation column: PM25_AQI.
Display options: 1 hour (if you leave your workflow running for a long time, you may wish to change this to days).
Formatting: 2 decimal places
As we have data for multiple time zones on the map, you should already be able to see some temporal patterns and interact with the time series widget.
Let's add a couple more widgets to tell more of a story with this data:
Add a new Formula Widget called "Average PM2.5 AQI." This should use the average of the pm25_aqi column with 2 decimal place formatting.
Add a new Category Widget called "PM2.5 AQI - top 5 counties." Set the operation to average, the column to name_joined and the aggregation column to pm25_aqi. Again, make sure you set the formatting to 2 decimal places.
Can you spot a problem with this? There are multiple counties in the US with the same name, so we need to differentiate them or the widget will group them together.
In the Sources window (bottom left of the screen), click on the three dots next to your source and select Query this table/Open SQL console (the display will depend on whether you have opened the console before).
Between the * and FROM, type , CONCAT(name_joined, ', ', state_name_joined) AS county_label. So your entire console will look something like:
Run the code, then head back to your category widget. Switch the SQL Query field from name_joined to county_label. Much better!
Altogether, your map should be looking something like...
Finally, if you'd like to share the results of your hard work, head to the Share options at the top of the screen!
Why not explore some of our space-time statistics tools to help you draw more advanced conclusions from spatio-temporal data?
SELECT * FROM carto-demo-data.demo_tables.usa_states_boundaries
SELECT * FROM carto-demo-data.demo_tables.usa_counties
SELECT * FROM carto-demo-data.demo_tables.usa_zip_codes
SELECT * FROM carto-demo-data.demo_tables.usa_census_tracts
### Exploring USA's Administrative Layers

This interactive dashboard in Builder offers a journey through the administrative divisions of the United States, from states to census tracts. The map dynamically adjusts its focus as you zoom in, revealing finer details such as employment, population, etc. at each level.
___
#### Key Features of the Dashboard
- **Zoom-Dependent Visibility**: Each administrative layer is configured to appear within specific zoom ranges, ensuring a clear and informative view at every scale.
- **Insightful Widgets**: The dashboard includes formula widgets for total population and unemployment rates, linked to census tracts. A category widget, linked to the state layer, offers a broader overview of population by state, independent of the map's viewport.
- **Interactions**: Engage with the map through interactive layers, allowing you to click on regions for detailed information.
https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson
https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?where=1%3D1&outFields=AQSID,Latitude,Longitude,PM25_AQI,LocalTimeString&outSR=4326&f=geojson
https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?where=1%3D1&outFields=AQSID,Latitude,Longitude,PM25_AQI,LocalTimeString&geometry=-125.0,24.396308,-66.93457,49.384358&geometryType=esriGeometryEnvelope&inSR=4326&spatialRel=esriSpatialRelIntersects&outSR=4326&f=geojson
PARSE_TIMESTAMP('%m/%d/%Y %H:%M', (CONCAT(TRIM(SUBSTR(localtimestring, 5, 15)),'0')))
SELECT *,
CONCAT(name_joined, ', ', state_name_joined) AS county_label
FROM yourproject.yourdataset.yourtable
In this tutorial, you'll learn how to create an interactive dashboard to navigate through America's severe weather history, focusing on hail, tornadoes, and wind.
Our goal is to create an interactive map that transitions through different layers of data, from state boundaries to the specific paths of severe weather events, using NOAA's datasets.
Get ready to dive deep into visualizing the intensity and patterns of severe weather across the U.S., uncovering insights into historical events and their impacts on various regions.
Access the Maps section from your CARTO Workspace using the Navigation menu.
Click on "New map" button to create a new Builder map.
Let's add USA severe weather paths as your main data sources to the map. To do so:
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the Add Source button.
The SQL Editor panel will be opened.
Now, run the below query to add USA severe weather paths source:
SELECT * FROM `carto-demo-data.demo_tables.usa_severe_weather_paths`
Change the layer name to "Weather Events" and the map title to "USA - Historic Severe Weather Events".
Access the Layer Panel and configure the Stroke Color to "Light Blue". Then, go back to the main Layers section and set the Blending option to "Additive".
Now, let's modify the Basemap option to "Dark Matter" so the weather event paths are properly highlighted. Zoom in to inspect the weather paths.
Widgets empower users to dynamically explore data, leading to rich visualizations. They also serve to filter data based on the map viewport and interconnected widgets. Let's add some widgets to provide insights to our end-users.
Firstly, we will add a Formula Widget to display the estimated property loss. To do so, navigate to the Widgets tab, select Formula Widget and set the configuration as follows:
Operation: SUM
Source Category: Loss
Once the configuration is set, the widget is displayed in the right panel.
Then, add another Formula Widget, this time to display the estimated crop loss. To add it, navigate to the Widgets tab, select Formula Widget and set the configuration as follows:
Operation: SUM
Source Category: Closs
Once the configuration is set, the widget is displayed in the right panel.
Add two additional Formula Widgets, both using the COUNT operation: one using the fat property to indicate the total fatalities, and the other using the inj property to indicate the total injuries caused by severe weather events.
Time to include a different type of widget. We'll include a Pie Widget displaying the estimated property loss by weather event type. Navigate to the Widgets tab, select Pie Widget and set the configuration as follows:
Operation: SUM
Source Category: event_Type
Aggregation Column: Loss
Once the configuration is set, the widget is displayed in the right panel.
The Time Series Widget allows users to analyze weather events over time. Navigate to the Widgets tab, select Time Series Widget and set the configuration as follows:
Time: Date
Operation: COUNT
Split by: event_Type
Display Interval: 1 year
SQL parameters are placeholders that you can add in your SQL Query source and can be replaced by input values set by users. In this tutorial, we will learn how you can use them to dynamically update the weights of normalized variables.
The first step in this section is to create a SQL Text Parameter. You can access this by clicking on the top right icon in the Sources Panel.
Set the SQL Text Parameter configuration as follows and click on "Create parameter" once completed:
Values - Add data from a source:
Source: usa_severe_weather_paths
Property: event_type
Naming:
Display name: Event Type
SQL name: {{event_type}}
Once you create a parameter, a parameter control is added to the right panel. From there, you can copy the parameter SQL name to add it to your query as below:
SELECT * FROM `carto-demo-data.demo_tables.usa_severe_weather_paths`
WHERE event_Type in {{event_type}}
We will add another SQL Text Parameter, this time retrieving the state names using the name property, so we can filter the weather events by state.
Values - Add data from a source:
Source: usa_severe_weather_paths
Property: name
Naming:
Display name: State
SQL name: {{state}}
Once the parameter is created, a parameter control is added to Builder. Use the parameter in your query by adding an additional statement as per below query:
SELECT * FROM `carto-demo-data.demo_tables.usa_severe_weather_paths`
WHERE event_Type in {{event_type}}
AND name in {{state}}
Finally, we'll add a SQL Date Parameter to filter the severe weather events for the specified time frame.
Values
Start date: 1950-01-03
End date: 2022-01-03
Naming:
Display name: Event Date
Start date SQL name: event_date_from
End date SQL name: event_date_to
Once the parameter is created and the parameter control is added to the map, you can use it in your query as shown below:
SELECT * FROM `carto-demo-data.demo_tables.usa_severe_weather_paths`
WHERE event_Type in {{event_type}}
AND name in {{state}}
AND date >= {{event_date_from}} AND date <= {{event_date_to}}
Your map with the addition of the parameter controls should look similar to the below.
Let's add more sources to our map. First, we will add a custom query (SQL) source to display USA State boundaries including the state SQL parameter in your query as per below.
SELECT * FROM `carto-demo-data.demo_tables.usa_states_boundaries`
WHERE name in {{state}}
Once the layer is added to the map, rename it to "State Boundary", disable the Fill Color and set the Stroke Color to white.
Now, when you use the 'State' parameter control to filter, both the weather events and the state boundaries will be seamlessly filtered at the same time.
Add a pre-generated tileset source displaying OSM point location of buildings at a worldwide scale. To do so:
Select the Add source from button at the bottom left on the page.
Click on the Data Explorer.
Navigate to CARTO Data Warehouse > carto-demo-data > demo_tilesets.
Select the osm_buildings tileset.
Click "Add Source".
Name the recently added layer "OSM Buildings" and move it to the bottom of the layer order by dragging it down. Set the Fill Color to dark brown and its Opacity to 0.5.
Add a map description to provide further information to end-users consulting the map. You can use the below description using markdown syntax.
#### Historical Severe Weather
This map showcases the paths of hail, tornadoes, and wind across the United States, providing insight into historical severe weather events.
Data sourced from NOAA, accessible at:
[SPC NOAA Data](http://www.spc.noaa.gov/wcm/#data)
____
**Data Insights**
- **State Boundary**: Displays the boundaries of USA states.
- **Aggregated Severe Weather Events (H3)**: Employs an H3 spatial index for a comprehensive visualization of incident density.
- **Severe Weather Events Paths**: Visualizes the paths of severe weather events (wind, hail, tornadoes).
- **Building Locations**: Open Street Map building locations to display potentially affected regions.
For our bonus section, we're going to add something extra to our map. We'll create a new layer that includes a buffer zone extending 5 meters around the weather event paths. Then, we'll turn these areas into polygons and use H3 spatial indexing to group the weather event info together.
H3 spatial indexes help us get a clearer, aggregated view of the data, which makes it easier to see patterns, especially when you're zoomed out. Ready to dive in? Let's get started!
In Workflows page, use the "New workflow" button to start a new Workflow. Select CARTO Data warehouse as the connection you want to work with.
From the Sources panel located on the left side, navigate to CARTO Data Warehouse > demo_data > demo_tables and locate usa_severe_weather_paths. Drag and drop the source table into the canvas.
Rename the Workflow to Aggregating weather events to H3. In the Components tab, add ST Buffer and set the buffer radius to 5 meters.
Now we will perform a polyfill of the buffered weather paths. For that, we will use the H3 Polyfill component, setting the H3 resolution level to 8. In the configuration, ensure you keep the properties from your input table. ❗ This analysis may take some time to complete. Consider using a Limit or Simple Filter component to reduce the input data for shorter processing times.
To finish this Workflow, add a Save as Table component to save the results as a permanent table.
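For reference, the whole buffer-and-polyfill workflow boils down to something like the query below. It is a hedged sketch: it assumes the carto-un Analytics Toolbox, a geom column on the demo table, and the table name used in the next step (adjust the project and dataset to your own).
-- Buffer each weather path by 5 meters, polyfill at H3 resolution 8,
-- and keep the input properties for later filtering and aggregation.
CREATE OR REPLACE TABLE `yourproject.yourdataset.severe_weather_h3level8` AS
SELECT w.* EXCEPT (geom), h3
FROM `carto-demo-data.demo_tables.usa_severe_weather_paths` w,
     UNNEST(`carto-un`.carto.H3_POLYFILL(ST_BUFFER(w.geom, 5), 8)) AS h3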
Now let's go back to our Builder map and create a new source. Specifically, we'll add this layer using a custom SQL query source so we can leverage the existing parameters in the map. Type the following query, updating the fully qualified table name you used in Step 5, and execute the query:
SELECT h3, COUNT(*) as weather_path_count, SUM(inj) AS inj FROM `yourproject.yourdataset.severe_weather_h3level8`
WHERE name IN {{state}} AND
date >= {{event_date_from}} AND date <= {{event_date_to}}
AND event_type IN {{event_type}}
GROUP BY h3
Rename the newly added layer to "Aggregated Severe Weather Paths". Open the Layer panel and set the aggregated resolution size
of the H3 one level higher, to 5
.
We will now style the layer based on the number of severe weather paths within each H3 cell. For that, in the Fill Color section, set the "color based on" option to use the COUNT() aggregation over a numeric column such as inj. Set the Steps of the color palette to 3 and use the color scheme of your preference.
Aggregated data is better visualized at lower zoom levels, whereas raw data (in this case the weather path lines) is better displayed at higher zoom levels. You can control when layers are visualized using the Visibility by zoom level functionality. Set a specific visibility range for your layers:
Aggregated Severe Weather Paths: Zoom 0 - 5
State Boundaries: All zoom levels (0-21)
Severe Weather Paths: Zoom 6 - 21
Buildings: Zoom 7 - 21
Awesome job making it this far and smashing through the bonus track! Your map should now be looking similar to what's shown below.
In this tutorial, we’ll be exploring which parts of Paris’ cycle network could most benefit from improved safety measures through exploring accident rates.
This analysis will be based on two datasets; accident locations and the Paris cycle network.
To access the data:
In Snowflake, you can find PARIS_BIKE_ACCIDENTS and PARIS_CYCLING_NETWORK in the CARTO Academy Data listing on the Snowflake Marketplace.
Other clouds:
Accident locations can be downloaded from here, and dropped directly into your workflow (more on that later).
The cycling network can be sourced from OpenStreetMap; you can follow our guide for accessing data from this source here. Alternatively, you can find this in the CARTO Data Warehouse > demo data > demo tables > paris_cycling_network.
If you'd like to replicate this analysis for another study area, many local government data hubs will publish similar data on accident locations.
In the CARTO Workspace, head to Workflows and Create a Workflow, using the connection where your data is stored.
Under Sources (to the left of the screen), locate Paris bike accidents & Paris Cycling Network and drag them onto the canvas. If any of your source files are saved locally (for instance, if you downloaded the accident data from this link), you can drag and drop the files from your Downloads folder directly onto the canvas. This may take a few moments as this is a large dataset!
First, we'll create a study area. On the left of the screen, switch from the Sources to the Components panel, which is where you can find all of your processing and analytical tools. Locate the Draw Custom Features component and drag it onto the canvas. Select the component to open the component options on the right-hand side of the window. Click Draw Features and draw a custom area around Paris (see below). 💡 Alternatively, instead of drawing a custom polygon, you can use any polygon table to define your custom area.
Back in the Components panel, locate the H3 Polyfill component and connect the output of Draw Custom Features to it (see screenshot above). We will use this to create a hexagonal H3 Spatial Index grid across our custom study area. Change the resolution to 10, which is more detailed than the default of 8.
Run your workflow! Note you can do this at any time, and only components which you have edited will be re-run.
Now let's turn our attention to the bike accidents. Back in the Components panel, locate H3 from GeoPoint and drag it onto the canvas. Connect this to your bike accidents source, and set a resolution of 10.
Next, use a Join component to essentially filter the accidents. Set the H3 Polyfill (step 2) as the top input and the H3 from GeoPoint as the bottom input, set both join columns to H3, and set the join type to Inner. Check the screenshot above for guidance.
Now might be a good time to add an annotation note around this section of the Workflow to keep it organized. You can do this by clicking Add a note (Aa) at the top of the screen.
Now we can start analyzing our data!
Connect a Group by component to the output of the Join we just created. Set the Group by column to H3, the aggregation column to H3 and the aggregation type to COUNT. This will result in a hexagonal grid with a field H3_count which holds the number of accidents that have occurred in each cell.
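If you want to sanity-check this step outside Workflows, the count per cell can be reproduced with a query along these lines; it is a sketch assuming the carto-un Analytics Toolbox and placeholder table names.
-- Assign each accident to an H3 cell at resolution 10, keep only cells
-- inside the study-area grid, and count accidents per cell.
WITH accident_cells AS (
  SELECT `carto-un`.carto.H3_FROMGEOGPOINT(a.geom, 10) AS h3
  FROM `yourproject.yourdataset.paris_bike_accidents` a
)
SELECT h3, COUNT(*) AS h3_count
FROM `yourproject.yourdataset.study_area_h3`
INNER JOIN accident_cells USING (h3)
GROUP BY h3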
Next, connect this to a Getis Ord* component. This will be used to calculate spatial hotspots; statistically significant clusters of high data values. Set the following parameters:
Index column: H3
Value column: H3_Count
Kernel function: Triangular (this means cells closer to the central cell have a far higher weight).
Size: 3 (the neighborhood size).
For more information on these parameters, check out this blog.
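If you'd like to see what this looks like in SQL, the Analytics Toolbox exposes a GETIS_ORD_H3 function in its statistics module. The sketch below is assumption-heavy (check the function reference for the exact signature and output fields in your version) and also folds in the two filters described in the next step.
-- Compute Getis-Ord Gi* per H3 cell (neighborhood size 3, triangular kernel)
-- and keep only statistically significant hotspots of high values.
WITH gi AS (
  SELECT `carto-un`.carto.GETIS_ORD_H3(
    ARRAY_AGG(STRUCT(h3 AS index, CAST(h3_count AS FLOAT64) AS value)),
    3, 'triangular'
  ) AS output
  FROM accidents_per_cell  -- output of the Group by step above
)
SELECT cell.index AS h3, cell.gi, cell.p_value
FROM gi, UNNEST(gi.output) AS cell
WHERE cell.p_value <= 0.1  -- 90% confidence
  AND cell.gi > 0          -- clusters of high values only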
Finally, use two connected Simple Filter components with the following conditions:
p_value <= 0.1, meaning we can be 90% confident that the outputs are spatial hotspots.
GI > 0, meaning there is a cluster of high values (with negative values representing clusters of low values).
❗If you are using Google BigQuery, at this stage you will need to rename the Index column "H3" so that we can map it. Use a Create Column component to do this.
Now you have a column named H3, we're ready to map!
Expand the Results panel at the bottom of the window and switch to the Map tab. With your second Simple Filter selected (or the Create Column component, if in BigQuery), select Create Map.
Note that you can do this with any component in your workflow as long as it has either a geometry or Spatial Index reference column. However, the results of every component are only saved for 30 days, so if there is one you'd like to use beyond this period, make sure to use a Save as Table component to commit it.
Let's start to explore our data in CARTO Builder!
Rename your map "Paris accident hotspots" by clicking on the existing name (likely "Untitled") at the top-left of the window.
Change basemaps: still in the top-left of the window, switch from the Layers to the Basemaps tab. You can choose any you like; we'll go with Google Maps: Dark Matter.
Rename the layer: back in the Layers tab, click on your layer to expand the layer options. Click on the three dots to the right of the layer name (likely "Layer 1") to rename it "Accident hotspots."
Style the layer: still in the layer options...
Change the resolution to 6 so we can see a more detailed view of the data.
Disable the stroke color (it'll end up being "noisy" later on).
In the fill color options, set the color to be based on GI (AVG) and select a color palette; we're using sunset dark. For a more impactful map, reverse the color palette so that the lightest color represents the largest value. Change the color scale to quantile.
Set a blending mode: come out of the layer options so you're in the main Layers panel. To the top-right of the panel, set the Layer blending to additive. This means that layering lighter colors on top of each other will result in an even lighter color. At the moment, that just means that we can see our basemap a little clearer... but just you wait!
Right now, your map is probably looking a little something like...
Let's kick this up a gear! Head back to your workflow for the next step.
To transform these hotspots into actionable insights, we’ll now work out which parts of the cycle network infrastructure fall within accident hotspots - and so could benefit from some targeted improvements. Rather than using a slower spatial join to do this, we’ll leverage H3 again.
First, connect an ST Buffer component to the cycling network source, setting a distance of 25 meters.
Next connect this to a H3 Polyfill component (resolution 10) again to convert these to a H3 grid - at this stage, we’ll make sure to enable “Keep table input columns.”
Now we'll use another Join to join our cycle network H3 grid to the results of our hotspot analysis. Use the result of "#2 Aggregate & calculate hotspots" as the top input, and the result of H3 Polyfill as the bottom input. The join columns should both be H3, and the join type should be Inner.
Now we will calculate the average GI* score for each section of the cycle network to determine which part of the network is covered by the strongest hotspots. Use one final Group by with the following parameters:
Group by column: CARTODB_ID
Aggregation: GI (AVG), HIGHWAY (ANY), NOM_VOIE_JOINED (ANY) & GEOM_JOINED (ANY). You can also use an ANY aggregation to retain any contextual information from the cycle links, such as highway name.
Connect this final Group by to a Save as Table component to commit the results.
Now we have a table consisting of cycle links which are in an accident hotspot, as well as their respective average GI* score which indicates the strength of the hotspot. You can see the full workflow below.
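In SQL terms, this final aggregation is roughly the query below; the column names follow the workflow's joined output and are assumptions that may differ slightly in your table.
-- Average Gi* per cycle link, keeping one value of the contextual columns.
SELECT
  cartodb_id,
  AVG(gi)                    AS gi_avg,
  ANY_VALUE(highway)         AS highway,
  ANY_VALUE(nom_voie_joined) AS nom_voie,
  ANY_VALUE(geom_joined)     AS geom
FROM cycle_links_in_hotspots  -- output of the Join in the previous step
GROUP BY cartodb_id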
Let's bring everything together into one final map 👇
Head back to Paris accident hotspots map you created earlier.
First, let's add in the cycle links with GI* scores that we just created. In the bottom left of your map, navigate through Sources > Add Source from > Data Explorer > the cycle links table you just created. Add it to the map, and let's style it!
Rename the layer: GI* score by link
Stroke color based on: GI_AVG. We've used the same color palette as the hotspot grid earlier (Sunset Dark, inverted) with a Quantile scale.
Stroke width: 3.5
To help give more weight to our analysis, let's also add in the original accident locations. Navigate again through Sources > Add Source from > Data Explorer to where you originally accessed the data. If you imported the accidents as a local file through Workflows, you can use a Save as Table component here to commit them to a table on the cloud. Now let's style them:
Rename the layer: Accidents
Fill color: orange, opacity = 1.
Stroke: disabled
Radius: 1
Looking to replicate that "glowy" effect? This is what's known as a "firefly map" and is super easy to replicate:
In the layers panel, click on the three dots next to the Accidents layer and + Duplicate layer.
Drag this layer to beneath the original accidents layer.
Set the radius to 4 and opacity to 0.01.
So it isn't confusing for your users, head to the Legend tab (to the right of Layers) and disable the copied layer in the legend. You can also change the names of layers and classes here.
Now finally let's add some widgets to help our user explore the data. To the right of the Layers tab, open the Widgets tab. Add the following widgets:
Number of accidents:
Layer: Accidents
Widget type: formula
Name: Number of accidents
Formatting: Integer with format separator (12,345,678)
GI* by highway type:
Layer: GI* score by link
Widget type: category
Name: GI* score by highway type
Operation: average
Column: HIGHWAY_JOINED_ANY
Aggregation column: GI_AVG
Formatting: 2 decimal places (1.23)
GI* by street:
Layer: GI* score by link
Widget type: category
Name: GI* score by street
Operation: average
Column: NOM_VOIE_JOINED_ANY
Aggregation column: GI_AVG
Formatting: 2 decimal places (1.23)
Now your user should be able to use your map to pinpoint which streets could benefit from targeted safety improvements - such as Rue Malher with a GI* score of 11.98, and 81 accidents in close proximity.
The CARTO team has designed this collection of Workflows examples with a hands-on approach to empower users and ease the Workflows learning curve.
These examples showcase a wide range of scenarios and applications: from simple building blocks for your geospatial analysis to more complex, industry-specific workflows tailored to facilitate running specific geospatial use-cases.
Making use of these examples is very easy. Just click on "New Workflow" and then "From template" in your CARTO Workspace to access the collection of templates. Once the workflow is re-created, you will be able to modify it like any other workflow, replacing the data sources and re-configuring the different nodes so it fits your specific use case.
Detect Space-time anomalies
The following templates require that you have some extension packages installed in your connection. Read this documentation for more information.
For these templates, you will need to install the BigQuery ML extension package.
For these templates, you will need to install the SnowflakeML extension package.
Create a classification model
Create a forecasting model
This example demonstrates how to use Workflows to estimate the total population covered by a telecommunications cell network, by creating areas of coverage for each antenna, creating an H3 grid and enriching it with data from the CARTO Spatial Features dataset.
This example demonstrates how to use Workflows to find which mobile devices are close to a set of specific locations, in this case, supermarkets of competing brands.
This example demonstrates how to use Workflows to carry out a common analysis for telco providers: analyzing their coverage both by area (i.e. square kilometers) and by population covered.
In this analysis we will analyze the coverage for AT&T LTE Voice based on public data from the Federal Communications Commission (FCC).
This example demonstrates how to use Workflows to leverage Telco providers' advanced capabilities to respond to natural disasters. Providers can use geospatial data to better detect at-risk areas for specific storms. In this analysis we will analyze buildings and cell towers in New Orleans to find clusters of buildings at risk of flooding and potential outages.
Selecting a new location for a tower requires understanding where customers and coverage gaps are; however, we can also identify buildings that might be suitable for hosting a new tower, which is what we do in this analysis.
This example shows how a telco provider could use Workflows to identify areas where they don't have 5G coverage while their competitors do.
Later, adding some socio-demographic variables to these areas would help them prioritize and plan for network expansion.
For this template, you will need to install the extension package.
This template acts as a guide to perform path loss and path profile analysis for an area of interest. This template uses vector data of clutter for the analysis.
Read the documentation to learn more.
For this template, you will need to install the extension package.
This template acts as a guide to perform path loss and path profile analysis for an area of interest. This template uses raster data of clutter for the analysis.
Read the documentation to learn more.
The CARTO QGIS Plugin seamlessly integrates desktop GIS workflows with cloud-native spatial analytics, allowing users to connect, access, visualize, edit, and sync spatial data from data warehouses between QGIS and CARTO.
While CARTO excels in analytics and visualization of large-scale geospatial data running natively on cloud data warehouse platforms, certain data management tasks—such as geometry editing and digitization—are better suited for desktop GIS tools like QGIS.
In this tutorial, you will learn how to use the CARTO QGIS Plugin to enhance your geospatial processes. Using a telecom network planning example, you will connect QGIS to your data warehouses through CARTO, edit geometries based on an image provided by the urban department, and sync updates seamlessly with CARTO. Finally, you will create an interactive map to review potential sites alongside relevant datasets while keeping the information updated as new edits are made in QGIS.
By the end of this tutorial, you will have a fully integrated process between QGIS and CARTO, ensuring efficient spatial data management in a cloud-native environment.
In this guide, we'll walk you through:
To get started, install the CARTO QGIS Plugin in your QGIS Desktop application. If you don't have QGIS yet, it is an open-source GIS desktop application that can be downloaded here.
Open QGIS desktop and start a new project.
Navigate to Plugins > Manage and Install Plugins.
Click on the Not Installed tab and search for "CARTO".
Select CARTO QGIS Plugin and click Install.
Once installed, you should see the CARTO Plugin in the top toolbar and the Browser panel.
Now, you need to log in to your CARTO account to connect your cloud-hosted spatial data with QGIS.
Locate the CARTO Plugin in the QGIS interface (in the Plugins section or Browser panel).
Click on Log In to CARTO.
Enter your CARTO credentials to securely authenticate your account.
If you don't have a CARTO account yet, you can sign up for a free trial.
After successfully logging in, a confirmation screen will appear, indicating that the CARTO QGIS Plugin is now connected and ready for use.
With CARTO authorized, you can now browse and load datasets from your cloud data warehouse directly into QGIS.
In the QGIS Browser panel, locate CARTO Connections. If you don’t see your Browser, activate it via View → Panels → Browser.
You’ll see your available datasets and tables from your organization’s data warehouse.
Click on a dataset to preview its structure and metadata.
You can download the entire table or apply filters (WHERE statements, spatial filters, row limits)
In some cases, geospatial data is unavailable, and all you have is an image or a scanned document. This is where QGIS’s georeferencing capabilities become essential.
In this scenario, you’ve received a PDF containing a newly proposed redevelopment site, which needs to be added to the list of potential sites for review next year. Since this redevelopment area comes from the urban department, there is no existing geospatial dataset available—only a .png image of the site.
Take a screenshot of the above image and save it as .png.
Add your image as a Raster layer:
Click on Data Source Manager → Raster.
Upload the .png image.
Click Add to display it in QGIS.
Click Zoom to Layer(s) to confirm the image was added.
Use the Georeferencer tool:
Go to Layer → Georeferencer.
In the Georeferencer modal, add the raster image.
The image will now appear in the Georeferencer canvas.
Define control points:
Select Add Point and mark at least four control points.
Click Map Canvas to introduce coordinates.
Click on the correct location in the main map canvas.
Run the Georeferencing process:
Define an output file name and set transformation settings.
Click Run to execute.
The georeferenced raster file will now appear in the correct location.
Now, we will edit an existing CARTO table to include the newly digitized site for network expansion planning.
In the QGIS Browser, locate an editable table (e.g., planned_regions_2025) within your CARTO connection.
Click Add Layer or use Add Layer Using Filter if you want to download a subset of your data.
Once loaded, start an editing session by clicking the pencil icon in the Editing Toolbar.
Use the Add Polygon tool to digitize the new redevelopment site.
Once finished, right-click to complete the geometry.
Enter the feature attributes (e.g., site name, classification, priority).
Click Save to upload the changes back to your data warehouse through CARTO. If your table does not contain a geoid column storing a unique identifier, you'll be prompted with a modal to define your primary key. Please make sure this stores a unique identifier so your edits can be successfully and correctly uploaded.
Go to the CARTO platform and navigate to the Data Explorer to confirm the uploaded feature. The new Port Lands Redevelopment site should now appear.
Now that your data is synchronized and available in your data warehouse, you can leverage the powerful features of the CARTO platform to create interactive and insightful dashboards.
In CARTO Workspace, navigate to Data Explorer and locate your table. In here you should be able to have a preview of both the data and the map. From this interface, click on Create map. This will open a new tab with Builder displaying this data source.
Builder is CARTO's map-making tool, which allows you to create scalable web map applications leveraging the data warehouse capabilities. Let's create our interactive dashboard.
Let's give your map a name, "Toronto - Planned regions for 2025".
After that, we'll rename our layer to "Planned Regions" and style it so the regions stand out in the map visualization. In our case, we'll set the Fill Color and Stroke Color to light and dark orange, respectively. Then, set the Stroke Width to 2.
Let's add Toronto's census data source. To do so, follow the next steps:
Select the Add source from button at the bottom left on the page.
Select Custom Query (SQL) and then Type your own query under the CARTO Data Warehouse connection.
Click on the Add Source button.
The SQL Editor panel will be opened. To add Toronto's census data source, run the query below:
SELECT * FROM `cartobq.docs.toronto_census_population`
Rename the newly added layer to "Census population" and set the Fill Color based on Total_population
property using a light to dark blue
palette. Set the Opacity for the Fill Color to 20
and the Opacity for the Stroke Color to 10
.
In the main layer panel, move this layer to the bottom so that the Planned Regions layer stays on top of the visualization.
Now, we'll add a final dataset, the road network for Toronto, to gain visibility of the major roads that are likely to be impacted by this project. To do so, add a custom SQL query and run the following query, as with the previous source. This query contains a WHERE rank < 5 clause that will dynamically return just the major roads in this location.
SELECT * FROM `cartobq.docs.toronto_road_network` WHERE rank < 5
Name this layer "Road network" and style the Stroke Color based on its Rank
property, from light to dark pink
. Also, set the Opacity to 40
.Then, set the Stroke Width to 2
.
We have now finished adding our sources. Now let's add some functionality to our dashboard that will allow users to dynamically extract information by leveraging pop-up interactions and charts.
Navigate to the Interactions section, and set the properties for each layer as below:
Road Network:
name
type
rank
Planned regions: All
Census Population:
Total_population
Now let's include some Widgets to extract insights and allow users to filter data. To do so, navigate to the Widgets tab and include the following widgets:
Formula Widget:
Source: Census population
Widget name: Total Population
Operation: SUM(Total_population)
Formatting: 12.3k
Behaviour: Filter by viewport
Pie Widget:
Source: Planned regions
Widget name: Region Status
Operation: status
Behaviour: Filter by viewport
Category Widget 1:
Source: Planned regions
Widget name: Region Name
Operation: Region_name
Behaviour: Filter by viewport
Category Widget 2:
Source: Road network
Widget name: Road Network Type
Operation: Type
Behaviour: Filter by viewport
Before publishing our map, we'll configure our Data Sources Freshness. The data source freshness determines how up-to-date the data sources in the map are when it first loads, ensuring users always extract insights as fresh as you configure them to be. In our case, we'll set Data Freshness to 5 minutes, so if further changes are made, for example more sites digitized in QGIS using the CARTO QGIS Plugin, they will reach our map automatically once it loads.
Finally, we're ready to share the map with others. Let's go to the Preview mode, to ensure the map is looking as expected. To do so, click on Preview next to the Share button. A different layout appears that displays the application as if you were the end-user accessing it.
Once you are happy with the dashboard, click on Share and share it with specific users, SSO groups, your entire organization, or publicly.
Congrats, you're done! The final results should look similar to the below:
Learn more about crafting impactful visualizations in the Building Interactive Maps section of the Academy.
Since 11 September 2021, a swarm of seismic activity had been ongoing in the southern part of the Spanish Canary Island of La Palma (Cumbre Vieja region). The increasing frequency, magnitude, and shallowness of the seismic events were an indication of a pending volcanic eruption, which occurred on 19 September, leading to the evacuation of people living in the vicinity.
In this tutorial we are going to assess the number of buildings and population that may get affected by the lava flow and its deposits. We’ll also estimate the value of damaged residential properties affected by the volcano eruption.
Access the Data Explorer section from your CARTO Workspace using the navigation menu.
In the Data Explorer page, navigate to CARTO Data Warehouse > demo_data > demo_tables.
In this tutorial, we are going to use the following 3 tables:
lapalma_buildings: it contains the buildings in La Palma as obtained from the Spanish cadaster website;
lapalma_sociodemo_parcels: it contains a sample from Unica360’s dataset in the Data Observatory “Cadaster and Sociodemographics (Parcel)”;
lapalma_volcano_lavaflow: it includes the lava flow from the volcano eruption in La Palma, Spain, as measured by the Copernicus satellite on 10/04/2021.
Spend some time exploring the three tables in the Data Explorer.
Select lapalma_buildings and click on the "Create map" button at the top.
This will open CARTO Builder with this table added as a layer to a map.
Rename the layer to “La Palma Buildings” and the map title to "Assessing the damages of La Palma Volcano"
Click on the layer to access the layer panel. In this section, you can style the layer according to your preferences. We have set the Fill Color to purple, reduced the opacity to 0.1, and set the Stroke Color to dark blue.
Let's add the lapalma_sociodemo_parcels source. To do so, follow the below steps:
Select the Add source from button at the bottom left on the page.
Click on the Data Explorer option.
Navigate to CARTO Data Warehouse > demo_data > demo_tables. Search for lapalma_sociodemo_parcels. Once you find it, select it and click on "Add Source".
Once added, a new layer appears on the map. Rename it to "La Palma demographics".
We'll now change the style of the La Palma demographics layer. Access the layer panel and set the Fill Color to green and the Outline color to black. Also reduce the Stroke width to 1. Then, style the size of the points based on the population living in the parcel. To do so, select the p_t column in the Radius section and set the range from 2 to 25.
Now, we are looking to analyze the number of buildings, the estimated value of residential properties, and the total population affected by the volcano lava extent. To perform this analysis, we'll use Workflows.
Go back to the Workspace tab in your browser and access Workflows.
In Workflows page, use the "New workflow" button to start a new Workflow. Select CARTO Data warehouse as the connection you want to work with.
From the Sources panel located on the left side, navigate to CARTO Data Warehouse > demo_data > demo_tables and locate lapalma_volcano_lavaflow. Drag and drop the source table into the canvas.
Repeat Step 13 to add lapalma_buildings into the canvas.
Now, use the Enrich Polygons component to obtain the total estimated property value of the residential properties affected by the lava flow, as well as the total number of buildings affected. Connect lapalma_volcano_lavaflow as the target polygon and lapalma_buildings as the source. In the Variables section of the node, add a SUM aggregation for the estimated_prop_value column and a COUNT aggregation for the numberOfBuildingUnits column. The output is the lava flow source with the two new properties added.
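As a rough illustration of what this enrichment computes, a simplified SQL version (which ignores partial overlaps, unlike the actual Enrich Polygons component) could look like the query below; the column names follow the demo tables and are assumptions.
-- Sum property values and count building units for buildings that intersect
-- the lava flow polygon(s).
SELECT
  l.geoid,
  SUM(b.estimated_prop_value)    AS estimated_prop_value_sum,
  COUNT(b.numberOfBuildingUnits) AS numberofbuildingunits_count
FROM `carto-demo-data.demo_tables.lapalma_volcano_lavaflow` l
JOIN `carto-demo-data.demo_tables.lapalma_buildings` b
  ON ST_INTERSECTS(l.geom, b.geom)
GROUP BY l.geoid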
Add the lapalma_sociodemo_parcels source to the canvas.
To obtain the total population affected by the lava flow extent, we will add the Enrich Polygons component again. This time, we'll connect lapalma_volcano_lavaflow as the target and lapalma_sociodemo_parcels as the source. Then, in the Variables section, add a SUM of the p_t column.
Using the Join component, we'll join both Enrich Polygons outputs into a single table using geoid as the common column. To achieve that, add the Join component to the canvas, use geoid as the common column for both sources and select Inner as the join type.
Save the output as a new table using the Save as Table component. Set the destination to Organization > Private in your CARTO Data Warehouse and name the output table lapalma_volcano_lavaflow_enriched. Then, click on "Run".
Now, in the same Workflow, let's perform another analysis. This time, we are going to create a 500 meter buffer around the lava flow and perform the same aggregations as in Steps 14 and 15 to compute the total number of buildings and the estimated value of damaged residential properties within this larger region. To do so, add the Buffer component and link it to the lapalma_volcano_lavaflow source. Set the distance to 500 meters. Then, click on "Run".
Afterwards, we'll add another Enrich Polygons component, this time connecting the Buffer output as the target. As the source input, connect the lapalma_buildings source. Add the same aggregated variables: SUM for estimated_prop_value and COUNT for numberOfBuildingUnits. You can review the output in the Data Preview.
Let's add the Enrich Polygons component again, this time to enrich the buffered output of the La Palma lava flow with the La Palma sociodemographics. In the Variables section of the Enrich Polygons component, add SUM for p_t to obtain the population affected by this buffered extent.
We'll add the Join component to join the output from both Enrich Polygons components. In the Join node, select geoid as the common column from both inputs and set the Join type to Inner.
Use the Select component to keep just the necessary columns using the below statement:
geoid,
geom_buffer as geom,
estimated_prop_value_sum,
numberOfBuildingUnits_count,
p_t_sum_joined
Finally, save the results as a table using the Save as Table component. Navigate to CARTO Data Warehouse > organization > private and save your table as lapalma_volcano_lavaflow_enriched_buffer.
Now let's go back to Builder. We'll first add lapalma_volcano_lavaflow_enriched as a table data source following the below steps:
Access Add source from..
Click on the Data Explorer option.
Navigate to CARTO Data Warehouse > organization > private. Search for lapalma_volcano_lavaflow_enriched. Once you find it, select it and click on "Add Source".
A new layer is added to the map. Rename it to "Lava flow" and move it to the bottom, just below the La Palma buildings layer.
Access the Lava flow layer panel and set the Fill Color in the layer styling to light red.
Now let's add the enriched lava flow which was buffered by 500 meters. To do so, follow these steps:
Access Add source from..
Click on the Data Explorer option.
Navigate to CARTO Data Warehouse > organization > private. Search for lapalma_volcano_lavaflow_enriched_buffer. Once you find it, select it and click on "Add Source".
Rename the recently added layer to 'Lava flow buffer' and move it to the bottom, just below the Lava flow layer.
Set the layer style for Lava flow buffer to very light red. To do so, access the Layer panel and pick the color in the Fill Color section. Also, set the opacity in this section to 0.3 and disable the Stroke Color using the toggle button.
In the Interactions tab, enable interactions for both Lava flow and Lava flow buffer layers. For each column, set the right formatting and rename it to a user-friendly label.
Change the basemap to Google Terrain by navigating to the Basemap tab and selecting the Terrain type.
Now, we can add a map description to provide further context about this map to our viewer users. You can use the below markdown description or add your own.
### La Palma Volcano Eruption Impact Analysis 🌋

This interactive map provides an in-depth visualization of the impact caused by the La Palma volcano eruption which took place in 2021. It helps in understanding the extent of the eruption's effects on the local community and environment.
---
🔍 **Explore the Map to Uncover**:
- **🌋 Volcano Lava Flow Visualization**: Trace the path of the lava flow, providing a stark visualization of the affected zones.
- **🔴 Buffered Lava Flow Zone**: View the 500-meter buffer zone around the lava flow, marking the wider area influenced by the eruption.
- **🏠 Building and Parcel Analysis**: Investigate how buildings and sociodemographic parcels in La Palma were impacted, revealing the eruption's reach on properties and people.
- **💡 Interactive Insights on Impact**: Engage with the lava flow areas to discover key data, such as the estimated value of affected properties, the number of properties impacted, and detailed population statistics.
---
📚 **Interested in Replicating This Map?**
Access our tutorial in the CARTO Academy for step-by-step guidance.
Finally, we can make the map public and share the link with anybody in the organization. To do so, go to "Share" in the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
And with that, we can visualize the result!
In this guide, we're going to uncover how to use hex color codes in Builder to bring qualitative crime data from Chicago to life. Ever wondered how to give each crime category its own unique color? We'll show you how to do that with randomized hex color codes. We'll also dive into setting specific colors based on conditions, tapping into the power of CARTO Workflows and SQL. Once we have our colors ready, we'll use Builder's HexColor feature to effortlessly style our layers. By the end of our journey, you'll be ready to create a vibrant and clear map showcasing the intricacies of crime in Chicago. Excited to transform your data visualization? Let's jump right in!
In this guide, we'll walk you through:
Access Workflows from your CARTO Workspace using the Navigation menu.
Select the data warehouse where you have the data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
Navigate the data sources panel to locate your table, and drag it onto the canvas. In this example we will be using the chicago_crime_sample table available in demo data. You should be able to preview the data both in tabular and map format.
We are going to generate random hex color codes based on distinct category values. For that, add the Hex color generator component into the canvas. This component will first select the distinct values of an input column and then generate a unique hex color code for each of them. In our case, we'll select primary_type as the column input, which defines the type of crime committed. Then, we click on "Run".
We can preview the data to confirm a new column named primary_type_hex has been added to the output table. It contains a distinct hex color value for each distinct input category.
Now let's save our output as a table using the Save as Table component. We will use this output later to generate our Builder map.
Add annotations so you can provide further context to other users accessing the Workflows.
Now that we're done with the first step of having a table ready to visualize the specific locations of crimes, we'll move to the generation of a new separate table for an extra layer in the visualization. In this case, we'll leverage Spatial Indexes to gain insight into the ratio of arrested to non-arrested crimes. By doing so, we can better grasp the geographical distribution and patterns of resolved versus unresolved crimes.
First, transform the crime point locations to H3. To do so, use the H3 from Geopoint component with 12 as the resolution level. Once it has run successfully, you can preview the data and the map results.
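If you prefer SQL, the same transformation can be sketched with the H3 module of the CARTO Analytics Toolbox (assuming the `carto-un` deployment of the toolbox and the geom column of the demo table):
-- Sketch: index each crime point into an H3 cell at resolution 12.
SELECT
  *,
  `carto-un`.carto.H3_FROMGEOGPOINT(geom, 12) AS h3
FROM `carto-demo-data.demo_tables.chicago_crime_sample`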
To discern if a crime resulted in an arrest, we need to convert the arrest column from a Boolean type to a String type. We'll accomplish this transformation using the CAST component.
Now we can use the Simple Filter component to identify the crimes that resulted in an arrest (True) vs. those that did not (False).
For each Simple Filter output, we will add a Create Column component where we will define a specific hex color code value in the same column, named arrest_hex, as per the below screenshot. Let's also add some annotations so it is clear what we are doing in these steps.
Now that we have generated the arrest_hex column, we will use the UNION component to bring our dataset back together.
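The filter, create column and union steps above can also be sketched as a single SQL statement; here crimes_h3 is a hypothetical name for the output of the CAST step, and the two hex values are the same ones used in the CASE WHEN example further below:
-- Sketch: assign a hex color per arrest outcome and union both branches back together.
SELECT *, '#8cbcac' AS arrest_hex FROM crimes_h3 WHERE arrest = 'true'
UNION ALL
SELECT *, '#ec9c9d' AS arrest_hex FROM crimes_h3 WHERE arrest = 'false'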
Finally, let's save our results in a new table using Save as Table component.
We can generate Hex color codes directly using SQL, both in the DW Console and the CARTO Platform. Within CARTO, you have the flexibility to use either the Workflows with the Custom SQL Query component or the Builder SQL Editor.
Below you can find two different examples on how you can use SQL to generate hex color codes:
Define hex color codes using CASE WHEN statement:
WITH data_ AS (
SELECT
geom,
CAST(arrest as string) as arrest
FROM carto-demo-data.demo_tables.chicago_crime_sample)
SELECT
a.*,
CASE
WHEN arrest = 'true' THEN '#8cbcac'
WHEN arrest = 'false' THEN '#ec9c9d'
ELSE ''
END AS arrest_hex
FROM data_ a
Generate random hex color code values for each DISTINCT category value:
WITH data AS (
SELECT DISTINCT primary_type
FROM carto-demo-data.demo_tables.chicago_crime_sample
),
hex_ AS (
SELECT
primary_type,
CONCAT(
'#',
LPAD(FORMAT('%02x', CAST(RAND() * 255 AS INT64)), 2, '0'), -- Red
LPAD(FORMAT('%02x', CAST(RAND() * 255 AS INT64)), 2, '0'), -- Green
LPAD(FORMAT('%02x', CAST(RAND() * 255 AS INT64)), 2, '0') -- Blue
) AS random_hex_color
FROM data)
SELECT
a.geom,
a.unique_key,
a.primary_type,
b.random_hex_color
FROM carto-demo-data.demo_tables.chicago_crime_sample a LEFT JOIN hex_ b
ON a.primary_type = b.primary_type
Now that we have generated our tables in Workflows containing the hex color code values, we are ready to style them in Builder using the HexColor functionality, which allows you to style qualitative data leveraging your stored hex color code values.
First, let's load our first output table from Step 6, named chicago_crime_hex. We will do so by adding it as a SQL Query source. To do so, copy the qualified table name from the Save as Table component in Workflows or access the table in Data Explorer.
Now let's rename your map to "Crime Analysis in Chicago" and the layer to "Crimes".
Now open the Layer style configuration, and follow the steps below:
In the Color based on selector, pick the primary_type column to associate with the hex color code.
In the Palette section, click on the 'HexColor' type.
Finally, pick the column with the hex color code values, which in our instance is named primary_type_hex.
You should now observe the crime point locations styled based on the hex color codes from your data source. Furthermore, consult the legend to understand the association between categories and colors.
Change the Stroke Color to black and set the Radius Size to 6.
Next, integrate the aggregated H3 grid to assess the arrest vs. non-arrest ratio. This will help pinpoint areas where crimes often go without subsequent arrests, enabling us to bolster security measures in those regions.
For that, add a new source with the chicago_crime_h3_hex table created in Step 13. A new layer named "Layer 2" will be added to your map in the top position.
Rename the new layer to "Crimes H3" and move it to the second layer position, just below the Crimes point layer.
The next step is to style the "Crimes H3" layer. Open the Layer style configuration. In the Basic section, set the Resolution to 3. This will decrease the granularity of the aggregation so we are able to visualize it with the crime point locations overlaying on top.
Now, let's style the cells using our stored hex color codes. For that, select the arrest column in the Color based on section as the category, using MODE as the aggregation method. Then, choose the 'HexColor' type and select arrest_hex as your Color Field.
To finalize the layer options, we will set the Visibility by zoom level of the "Crimes" point location layer from 11 to 21, so that only the H3 layer is visible at lower zoom levels.
Once we have the styling ready, we will proceed to add some Widgets to our map. First, we will include a Formula Widget with the COUNT of the "Crimes" point source.
To continue providing insights derived from our sources, we will add a Pie Widget linked to the "Crimes H3" source displaying the proportion of crimes that resulted in an arrest vs. those that did not.
Finally, we will add a Category Widget linked to "Crimes" point source displaying the crimes by type as below.
Once we have finished adding widgets, we can proceed to add a map description using markdown syntax. In our case, we will add some explanation about how to style qualitative data using HexColor functionality. However, you can add your description as desired, for example to inform viewer users how to navigate on this map and obtain insights.
We are ready to publish and share our map. To do so, click on the Share button located at the top right corner and set the permission to Public. Copy the URL link to seamlessly share this interactive web map app with others.
And we're ready to visualize our results! Your map should look similar to the below.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to join two tables based on a common ID present in both tables.
✅ ✅ ✅ ✅ ✅
This example shows how to use Workflows to join two tables together and then group by a specific property, producing aggregated data coming from both sources.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to generate a table that contains all the rows from two different sources with the same schema.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to filter a data source using a custom geography input.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to generate a new column using a formula that involves different columns in the calculation.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to obtain a normalized index from a column in your dataset.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to sort a table by a specific property, and only keep a certain number of rows.
✅ ✅ ✅ ✅ ✅
This example demonstrates how to use Workflows to reduce a dataset to a smaller number of columns required for a specific analysis.
In this section we provide a set of examples that showcase how to leverage the functions of our Analytics Toolbox to unlock advanced spatial analyses in your data warehouse platform. They cover a broad range of use cases with methods for data transformations, enrichment, spatial indexing in Quadbin and H3, statistics, clustering, spatial data science methods and more.
For many geospatial use cases, it is common to work with identical static geometries where attributes vary over time or across different records. This is particularly relevant when working with administrative boundaries, infrastructure, or road networks, where multiple entries share the same geometry but contain different data attributes.
In this tutorial, you'll learn how to easily visualize static geometries with changing attributes over time using the Aggregate by geometry functionality in Builder.
For this example, we'll use the Global Historical Climatology Network managed by NOAA that provides historical weather and climate data from weather stations worldwide. It includes observations such as temperature, precipitation, wind speed, and other climate indicators. In our case, we'll focus on USA weather stations, with a timeline covering 2016. By aggregating identical geometries, we can efficiently explore patterns, trends, and interactions while improving map performance.
Access the Maps tab from your CARTO Workspace using the Navigation menu and create a "New map".
To start, let's name our map "GHCN USA Weather Stations" and add the GHCN USA weather stations:
Select the Add source from button at the bottom left on the page.
Click on the Data Explorer.
Navigate to CARTO Data Warehouse > carto-demo-data > demo_tables.
Search for ghcn_usa_weather_stations.
Select the table and click "Add Source".
A map layer is automatically added from your source. Rename it to "Weather Stations."
Our source dataset contains over 19 million records, but many rows share identical geometries since weather metrics are recorded over time at the same exact location. To assess this, let's add a Category Widget that counts records for each weather station.
Navigate to the Widgets tab, choose Category Widget and set the following configuration:
Operation: COUNT
Source Category: station_id
Formatting: 12.3k
Behavior: Filter by viewport
As you'll see, some stations have hundreds or even thousands of records, meaning there are overlapping points. To effectively analyze patterns and trends, we'll use the Aggregate by geometry functionality in Builder, which groups features based on their identical geometries, as defined in the spatial column of the data source.
Navigate back to the Layer panel and open the advanced options in the Visualization section. Activate Aggregate by geometry functionality. This will aggregate your layer by identical geometries in the spatial column defined in your data source.
As you can see, the Category Widget still points to the original source, as widgets are linked at the source level. However, your layer has been aggregated, so the properties linked to it now require an aggregation of choice, both for styling and when defining interactions.
Before we start working further with this data, it's essential to correctly extract and transform the weather values on our GHCN-daily dataset because:
The value column contains data for multiple weather elements, such as temperature, precipitation, and snow.
The element column defines what type of data each row represents, meaning we must filter and assign the correct interpretation to each value.
All values are stored in different units (e.g. tenths of °C for temperature, mm for precipitation, etc.) and require conversions.
We can do the pertinent adjustments to our data source by leveraging custom SQL Query as a source in Builder.
Go to your source card, click on the three dots and click Query this table.
The SQL Editor panel will open.
To make it easier to analyze, you can copy the query below and click "Run". In this query, we'll be tackling the following:
Convert raw values into meaningful units (e.g., tenths of °C to °C, tenths of mm to mm).
Provide user-friendly labels for each weather element so end-users can easily interpret the data.
Normalize values so that different weather elements (e.g., temperature vs. precipitation) can be styled together without distorting the map.
Filter out unnecessary elements using a WHERE clause to reduce noise and focus on key variables.
Now let's add some more Widgets to allow users to retrieve insights. Go to the Widgets panel and select Category Widget, name it "Weather Metrics", and set the following configuration:
Operation: AVG
Source Category: element_friendly_label
Aggregation column: raw_value
Formatting: 1.23
Behavior: Filter by viewport
This will allow users to easily select the weather metric of choice to perform drill down analysis.
GHCN-daily dataset contains a timestamp covering 2016. To visualize the temporal pattern of each of the weather metrics, we'll add a new widget. Navigate to Widgets and choose Time Series Widget. Name it "Time Series" and set up the following configuration:
Data:
Date: date
Metric:
Operation: AVG
Aggregation column: raw_value
Multiple series:
Split by: element_friendly_label
Collapsible: True
In this widget, users can see the temporal variation of the weather metrics across 2016. They can either select the weather metric of interest by using the Category widget or leveraging the Time Series widget legend.
Add a Histogram Widget to allow users to inspect weather station elevation. Navigate to Widgets, select the Histogram widget type and configure it as follows:
Property: elevation
Custom min. value: -61
Formatting: 1.23
Now let's proceed to style our layer and configure its interactions using aggregated properties.
First, let's style our weather station layer. Navigate to the Layer Panel and set the following styling configuration:
Fill Color:
Property: AVG(normalized_value)
Palette: Sunset
Color Scale: Quantize
Stroke:
Simple: #6b083f
Stroke weight: 0.8
Now, navigate to the Interactions tab, and enable Interactions for this layer. Select the Click type and "Light with highlighted 1st value" as the style. Now add the following properties with the corresponding labels:
ANY_VALUE(station_id) labelled as Station Id
ANY_VALUE(State) labelled as State
ANY_VALUE(Name) labelled as Name
MODE(element_friendly_label) labelled as Weather Metric Type (Mode)
AVG(raw_value) labelled as Weather Metric Value (Avg)
AVG(normalized_value) labelled as Norm Weather Metric Value (Avg)
ANY_VALUE(elevation) labelled as Elevation
Customize your legend by setting a label for the property used for styling. Simply rename it to "normalized weather metric value".
Now let's change the default basemap. You can do so by using the basemap menu located below the zoom control. Choose CARTO Basemap > Voyager.
We want to allow users to filter weather stations by state. To do so, we'll add a dataset containing USA state boundaries and the state codes, so we can use it to filter both the state boundaries and the related stations.
To include USA State boundaries, let's add the source as a Custom SQL Query by:
Add source from..
Custom SQL Query (SQL)
Choose CARTO Data Warehouse connection
Add source
Open the SQL Editor, add the following query which retrieves the state code as well as the geometry boundary and click "Run".
A new layer will appear in the layer panel. Move the layer down just below the Weather Stations layer and rename it "USA State Boundaries"
Style your layer following the configuration below:
Stroke:
Simple: #16084d
Opacity: 30%
Stroke weight: 2
Now, let's add a SQL Parameter, which will allow us to load the state codes into a parameter control so we can use them within placeholders of our custom SQL queries. Go to the SQL Parameters button located in the top right of your source card.
Choose SQL Text Parameter and add the state codes using the state_code property available in the recently added source. Define your parameter name as State and the SQL name as {{state}}. Then, click "Add".
The parameter control will be added to the right side of your panel with a disabled status. Now let's use it in both of our queries.
Open the SQL Editor for the USA state boundaries and edit your query as below, including the WHERE statement. Then, click "Run".
Now, in the Weather Stations source, include the following statement in the existing query source. Then, click "Run".
Now, the parameter control should appear enabled and you can use the multi-selector to choose which boundaries and weather stations should be visible in the map. The parameter acts on both your layers and the linked widgets.
Before sharing the map, let’s add a map description to provide context on the data source and guide users on how to interact with it. Click on the "i" icon in the top right of the header bar. Then copy and paste the following Markdown syntax into the description field.
Use the "Preview" option to see how your map will appear to others before publishing. Once you're satisfied, click the Share icon to distribute the map within your organization, to specific users, SSO groups, or publicly. Copy the link to share access.
Now, end-users will be able to explore historical weather statistics from USA weather stations across 2016, analyzing trends in temperature, precipitation, and snowfall with interactive widgets and time-series visualizations.
In the retail and CPG industries, it is common to need to understand a set of candidate locations when making different supply and stock decisions. In this example, we walk through the steps one can follow, using CARTO and the Analytics Toolbox, to rank a set of locations based on their demographic similarity to a chosen location.
These are the main steps to follow, starting with a set of locations:
Define their trade areas.
Enrich such trade areas using demographic data from the Data Observatory.
Run the analysis of similar locations and visualize it on a map.
In this example, we will use a small subset of the locations available in the publicly available Iowa Liquor Sales dataset, keeping only stores in Des Moines that were active during 2021.
We can visualize this sample in the following map:
Our sample has a column named store_number that uniquely identifies each of the locations. This column is relevant because it is a requirement for the FIND_SIMILAR_LOCATIONS function. We also filter those whose geographical location is known, because we will use that location for the next step (generating the trade areas). Bear in mind that the Analytics Toolbox provides functions, like GEOCODE_TABLE, to infer the geography from an address.
In this step, we will define each location's trade area. We can understand these trade areas as the zones influenced by each of the stores. The Analytics Toolbox provides a handy function to achieve this, GENERATE_TRADE_AREAS:
Running this procedure will generate the table <your-project>.<your-dataset>.stores_trade_areas, which will map each store_id to a 500m-radius circular buffer.
This is the simplest way to generate a trade area; a more complex example of this function showcases how to generate isoline-based trade areas. Remember that the enrichment functions simply require a polygon-based GEOGRAPHY column; any other custom geometry can also be used as a trade area.
Now that we already have a defined set of trade areas per location, we can use external data to enrich such areas. For this example, we will be fetching some basic population variables segmented by age and gender from a sociodemographic dataset available for Des Moines.
It is also possible to enrich the trade areas using variables straight from the Data Observatory, as long as you have an active subscription to them. To achieve it, we can use the DATAOBS_SUBSCRIPTIONS, DATAOBS_VARIABLES, and DATAOBS_ENRICH_POLYGONS functions in the Analytics Toolbox.
Now that each trade area is enriched, let’s run the similarity analysis. To do so, we need to choose the following:
An origin location, that will be taken as a reference to measure similarity.
A set of target locations, that will be analyzed to check how similar each of them is to the origin location.
Since both our origin and target locations come from the same source, let us save it as a table in BigQuery:
In this convenience table, we have store_number serving as the unique ID and all the feature columns we have previously computed.
As we said before, in this example, both origin and target locations come from the same source, but that is not a requirement: origin and target locations can come from different places as long as they can be enriched with the same variables in a comparable scale.
For this example, we are going to take as reference store #2682.
This procedure will create the table <your-project>.<your-dataset>.similar_locations_2682_results, where we can find the similarity_skill_score column that we need for our analysis. Let us display these values on a map to check the results.
The first thing we can notice is how the map contains fewer locations than before: the similar locations procedure only returns those stores that are more similar than the average. Out of those, we can check the individual similarity using the similarity_score column (which we can think of as a "distance" to the original location, the lower the better) or similarity_skill_score (a normalized version that we can think of as a similarity measure, the higher the better).
Using this similarity_skill_score, we can see how the nearby stores get a very high level of similarity, since our trade areas were solely based on the vicinity of each location. However, different patterns emerge in other parts of the city, where similar locations are found as well.
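As a quick check, the returned candidates can be ranked directly in SQL (a sketch; it assumes the output table keeps the store_number identifier column):
-- Sketch: list candidate stores from most to least similar to store #2682.
SELECT
  store_number,
  similarity_score,        -- lower means closer to the reference location
  similarity_skill_score   -- higher means more similar
FROM `<your-project>.<your-dataset>.similar_locations_2682_results`
ORDER BY similarity_skill_score DESC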
SELECT
*,
-- Transform Values Based on Element Type
CASE
WHEN element IN ('TMAX', 'TMIN', 'TAVG', 'TOBS') THEN value / 10 -- Convert Tenths of °C to °C
WHEN element = 'PRCP' THEN value / 10 -- Convert Tenths of mm to mm
WHEN element = 'SNOW' THEN value -- Snowfall is already in mm
WHEN element = 'SNWD' THEN value -- Snow Depth is already in mm
ELSE value
END AS raw_value,
-- Normalized Values (0 to 1) for Styling
CASE
WHEN element IN ('TMAX', 'TMIN', 'TAVG', 'TOBS') THEN (value / 10 + 50) / 100 -- Normalize from -50°C to 50°C
WHEN element = 'PRCP' THEN LEAST(value / 300, 1) -- Normalize precipitation (max 300mm)
WHEN element = 'SNOW' THEN LEAST(value / 5000, 1) -- Normalize snowfall (max 5000mm)
WHEN element = 'SNWD' THEN LEAST(value / 5000, 1) -- Normalize snow depth (max 5000mm)
ELSE NULL
END AS normalized_value,
-- Assign Friendly Labels
CASE
WHEN element = 'TMAX' THEN 'Maximum Temperature (°C)'
WHEN element = 'TMIN' THEN 'Minimum Temperature (°C)'
WHEN element = 'TAVG' THEN 'Average Temperature (°C)'
WHEN element = 'TOBS' THEN 'Observed Temperature (°C)'
WHEN element = 'PRCP' THEN 'Total Precipitation (mm)'
WHEN element = 'SNOW' THEN 'Snowfall (mm)'
WHEN element = 'SNWD' THEN 'Snow Depth (mm)'
ELSE element
END AS element_friendly_label
FROM `carto-demo-data.demo_tables.ghcn_usa_weather_stations`
WHERE element IN (
'PRCP', -- Total Precipitation (mm)
'SNOW', -- Snowfall (mm)
'TMAX', -- Maximum Temperature (°C)
'TMIN', -- Minimum Temperature (°C)
'TAVG', -- Average Temperature (°C)
'SNWD', -- Snow Depth (mm)
'TOBS' -- Observed Temperature (°C)
)
WITH data_ AS(
SELECT
SPLIT(name_alt, '|')[SAFE_OFFSET(0)] AS state_code,
geom
FROM `carto-demo-data.demo_tables.usa_states_boundaries`)
SELECT * FROM data_
WITH data_ AS(
SELECT
SPLIT(name_alt, '|')[SAFE_OFFSET(0)] AS state_code,
geom
FROM `carto-demo-data.demo_tables.usa_states_boundaries`)
SELECT * FROM data_
WHERE state_code IN {{state}}
AND state IN {{state}}
### GHCN Weather Stations
---

This map visualizes historical weather data from NOAA's [Global Historical Climatology Network (GHCN)](https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily).
It aggregates identical station geometries and allows interactive analysis of temperature, precipitation, and snowfall.
---
### How to Use This Map
- Use the **Weather Metrics Widget** to filter by temperature, precipitation, or snow.
- Explore **historical trends** with the **Time Series Widget**.
- Use the **State Filter** to analyze specific regions.
- Click on a station to view its **historical weather data**.
US Earthquakes heatmap
Features: blending modes, Widgets, heatmaps
The John Snow cholera map of Soho
Features: Widgets, SQL Parameters
Airports impacted by hurricanes
Features: blending modes, Widgets, SQL Parameters
Climate risk in Texas
Features: Widgets, SQL Parameters, Spatial Indexes
Insurance fraud detection
Features: Widgets, Spatial Indexes, Google Earth Engine Extension
Crime risk to properties & vehicles
Features: Widgets, Spatial Indexes, SQL Parameters, hotspot analysis
1:1 scale 3D building map of NYC
Features: blending modes, 3D
Retail activity near banks
15-minute cities
Features: Widgets, Spatial Indexes
Consolidating bank branches
Features: Widgets, SQL Parameters
Property price trends
Features: Widgets, Spatial Indexes
Product personalization
Features: Widgets, Spatial Indexes
Network expansion planning
Features: Widgets, Spatial Indexes
Coverage analysis
Features: Widgets
Concession expansion strategy
Features: Widgets, Isolines, SQL Parameters
CPG sentiment analysis
Features: Widgets, Spatial Indexes
OOH panel selection
Features: Widgets, Spatial Indexes
Geomarketing for sports brands
Commuter trips around Lyon
Features: blending modes, Widgets
NYC taxi pickups vs dropoffs
Features: blending modes, Spatial Indexes
Human mobility map of Spain
Features: Spatial Indexes, Widgets, SQL Parameters
NYC taxi trips vs accident hotspots
Features: 3D, SQL Parameters, Widgets, blending modes, Spatial Indexes
Understanding demographic profiles of variable airport catchments
Features: SQL Parameters, Widgets, Isolines
Tackling shipping congestion
Wind turbine feasibility analysis
Features: Widgets, Spatial Indexes
Global night time lights
Features: blending modes, Spatial Indexes
3D population map of Japan
Features: Spatial Indexes, 3D
CREATE OR REPLACE TABLE
`<your-project>.<your-dataset>.stores` AS (
SELECT
store_number,
ANY_VALUE(store_name) AS store_name,
ANY_VALUE(store_location) AS store_location
FROM
`bigquery-public-data.iowa_liquor_sales.sales`
WHERE
store_location IS NOT NULL
AND date BETWEEN '2021-01-01' AND '2021-12-31'
AND city LIKE '%DES MOINES%'
GROUP BY
store_number
);
CALL `carto-un`.carto.GENERATE_TRADE_AREAS(
'''
SELECT
store_number AS store_id,
store_location AS geom
FROM
`<your-project>.<your-dataset>.stores`
''',
'buffer',
"{'buffer':500.0}",
'<your-project>.<your-dataset>.stores'
);
CALL `carto-un-eu`.carto.GENERATE_TRADE_AREAS(
'''
SELECT
store_number AS store_id,
store_location AS geom
FROM
`<your-project>.<your-dataset>.stores`
''',
'buffer',
"{'buffer':500.0}",
'<your-project>.<your-dataset>.stores'
);
CALL carto.GENERATE_TRADE_AREAS(
'''
SELECT
store_number AS store_id,
store_location AS geom
FROM
`<your-project>.<your-dataset>.stores`
''',
'buffer',
"{'buffer':500.0}",
'<your-project>.<your-dataset>.stores'
);
CALL `carto-un`.carto.ENRICH_POLYGONS(
-- Trade areas table
'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
'geom',
-- External data available for Des Moines
'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
'geom',
[
('total_pop', 'sum'),
('male_21', 'sum'),
('female_21', 'sum')
],
-- Destination slug
['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);
CALL `carto-un-eu`.carto.ENRICH_POLYGONS(
-- Trade areas table
'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
'geom',
-- External data available for Des Moines
'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
'geom',
[
('total_pop', 'sum'),
('male_21', 'sum'),
('female_21', 'sum')
],
-- Destination slug
['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);
CALL carto.ENRICH_POLYGONS(
-- Trade areas table
'SELECT * FROM `<your-project>.<your-dataset>.stores_trade_areas`',
'geom',
-- External data available for Des Moines
'SELECT * FROM `cartobq.docs.similar_locations_example_sociodemo`',
'geom',
[
('total_pop', 'sum'),
('male_21', 'sum'),
('female_21', 'sum')
],
-- Destination slug
['`<your-project>.<your-dataset>.stores_trade_areas_enriched`']
);
CREATE OR REPLACE TABLE
`<your-project>.<your-dataset>.store_features` AS (
SELECT
store_info.store_number,
trade_area.* EXCEPT (geom, method, input_arguments, store_id)
FROM
`<your-project>.<your-dataset>.stores` store_info
LEFT JOIN `<your-project>.<your-dataset>.stores_trade_areas_enriched` trade_area
ON store_info.store_number = trade_area.store_id
)
CALL `carto-un`.carto.FIND_SIMILAR_LOCATIONS(
-- Origin query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number = '2682'
""",
-- Target query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number <> '2682'
""",
-- Function parameters
'store_number',
0.90,
NULL,
'<your-project>.<your-dataset>.similar_locations'
);
CALL `carto-un-eu`.carto.FIND_SIMILAR_LOCATIONS(
-- Origin query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number = '2682'
""",
-- Target query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number <> '2682'
""",
-- Function parameters
'store_number',
0.90,
NULL,
'<your-project>.<your-dataset>.similar_locations'
);
CALL carto.FIND_SIMILAR_LOCATIONS(
-- Origin query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number = '2682'
""",
-- Target query
"""
SELECT
*
FROM
`<your-project>.<your-dataset>.store_features`
WHERE
store_number <> '2682'
""",
-- Function parameters
'store_number',
0.90,
NULL,
'<your-project>.<your-dataset>.similar_locations'
);
Spacetime hotspot classification: Understanding collision patterns
Spatiotemporal analysis is crucial in extracting meaningful insights from data that possess both spatial and temporal components. This example shows how to identify and classify space-time hot and coldspots using the Analytics Toolbox.
STATISTICS
Time series clustering: Identifying areas with similar traffic accident patterns
Spatiotemporal analysis plays a crucial role in extracting meaningful insights from data that possess both spatial and temporal components. This example shows how to cluster geolocated time series using the Analytics Toolbox.
STATISTICS
Applying GWR to understand Airbnb listings prices
Geographically Weighted Regression (GWR) is a statistical regression method that models the local (e.g. regional or sub-regional) relationships between a set of predictor variables and an outcome of interest. Therefore, it should be used in lieu of a global model in those scenarios where these relationships vary spatially. In this example we are going to analyze the local relationships between Airbnb's listings in Berlin and the number of bedrooms and bathrooms available at these listings using the GWR_GRID procedure.
STATISTICS
Analyzing signal coverage with line-of-sight calculation and path loss estimation
Coverage analysis is fundamental for assessing the geographical areas where a network's signal is available and determining its quality. This guide shows how to use CARTO telco functionality in the Analytics Toolbox for signal coverage analysis.
TELCO
Measuring merchant attractiveness and performance in CPG with spatial scores
In the CPG industry, consolidating diverse data sources into a unified score becomes crucial for businesses to gain a comprehensive understanding of their product's potential in different locations. In this example, you will learn how to create spatial scores to both understand how attractive each merchant is and to identify how well they are performing when it comes to selling a product.
CPG
Find twin areas of your top performing stores
The Twin Areas analysis can be used to build a similarity score with respect to an existing site (e.g. the location of your top performing store) for a set of target locations, which can prove an essential tool for Site Planners looking at opening, relocating, or consolidating their retail network. In this example we select as potential origin locations the locations of the top 10 performing liquor stores in 2019 in Iowa, US from the publicly available Liquor sales dataset to find the most similar locations in Texas, US.
RETAIL
Analyzing weather stations coverage using a Voronoi diagram
Voronoi diagrams are a very useful tool to build influence regions from a set of points and the Analytics Toolbox provides a convenient function to build them. An example application of these diagrams is the calculation of the coverage areas of a series of weather stations. In the following query we are going to calculate these influence areas in the state of New York.
PROCESSING
A NYC subway connection graph using Delaunay triangulation
Providing a good network connection between subway stations is critical to ensure an efficient mobility system in big areas. Let's imagine we need to design a well-distributed subway network to connect the stations of a brand-new subway system. A simple and effective solution to this problem is to build a Delaunay triangulation of the predefined stations, which ensures a good connection distribution.
PROCESSING
Creating simple tilesets
We provide a set of examples that showcase how to easily create simple tilesets allowing you to process and visualize very large spatial datasets stored in BigQuery. You should use it if you have a dataset with any geography type (point, line, or polygon) and you want to visualize it at an appropriate zoom level.
TILER
Creating spatial index tilesets
We provide a set of examples that showcase how to easily create tilesets based on spatial indexes allowing you to process and visualize very large spatial datasets stored in BigQuery. You should use this procedure if you have a dataset that contains a column with a spatial index identifier instead of a geometry and you want to visualize it at an appropriate zoom level.
TILER
Creating aggregation tilesets
We provide a set of examples that showcase how to easily create aggregation tilesets allowing you to process and visualize very large spatial datasets stored in BigQuery. You can use this procedure if you have a point dataset (or anything that can be converted to points, such as polygon centroids) and you want to see it aggregated.
TILER
Using raster and vector data to calculate total rooftop PV potential in the US
In this example, you will learn how to easily load raster data into BigQuery, and then combine it with vector data using the raster module of the Analytics Toolbox. To illustrate this we will compute the total rooftop photovoltaic power (PV) potential across all buildings in the US.
RASTER
In this tutorial, you'll learn how to visualize and analyze raster precipitation data from Hurricane Milton in CARTO. We’ll guide you through the preparation, upload, and styling of raster data, helping you extract meaningful insights from the hurricane’s impact.
Hurricane Milton was a Category 3 storm that made landfall on October 9, 2024. At its peak, it was the fifth-most intense Atlantic hurricane on record, causing a tornado outbreak, heavy precipitation, and strong winds.
By the end of this tutorial, you’ll create an interactive dashboard in CARTO Builder, combining raster precipitation data with Points of Interest (POIs) and hurricane track to assess the storm’s impact.
In this guide, you'll learn to:
Before analyzing the storm's impact, we need to set up the environment and prepare the precipitation raster dataset from PRISM, recorded on October 10, 2024. This dataset provides critical insights into rainfall distribution, helping us assess the storm's intensity and affected areas.
Required raster data format
Before uploading raster data to your data warehouse, ensure it meets the following requirements:
Cloud Optimized GeoTiff (COG)
Google Maps Tiling Schema
Set up your Python environment
To ensure a clean and controlled setup, use a Python virtual environment where we’ll execute the data preparation and upload process.
Check Python Installation
Ensure Python 3 is installed by running:
python3 --version
If not installed, download it from Python.org.
Create and Activate a Virtual Environment
Run the following command to create a virtual environment and activate it:
For Linux/macOS:
python3 -m venv carto_raster_env
source carto_raster_env/bin/activate
For Windows:
python3 -m venv carto_raster_env
carto_raster_env\Scripts\activate
Install GDAL in the Virtual Environment
GDAL is required to process raster data. If it is not already available in your virtual environment, you may need to install it manually.
First, install system dependencies:
On macOS (via Homebrew):
brew install gdal
On Ubuntu/Debian:
sudo apt update && sudo apt install gdal-bin libgdal-dev
On Windows: If you're using OSGeo4W, install GDAL from there. Alternatively, you can use conda:
conda install -c conda-forge gdal
Now, install GDAL inside your virtual environment:
pip install GDAL
If GDAL fails to install inside the virtual environment, you might need to specify the correct version matching your system dependencies.
Extract Metadata from the Precipitation Raster
Once the environment is set up, download the PRISM precipitation raster file available in this bucket and store it in the same project directory where your virtual environment is located.
Inspect the raster file’s metadata using GDAL:
gdalinfo usa_precipitation_10102024.tif
This command provides details such as:
Projection and coordinate system
Pixel resolution
Band information
NoData values (if any)
Understanding this metadata is crucial before performing reprojection, resampling, or further transformations.
Convert GeoTIFF to Cloud Optimized GeoTIFF (COG)
To ensure compatibility with CARTO, convert the GeoTIFF into a Cloud Optimized GeoTIFF (COG) with Google Maps Tiling Schema:
gdalwarp -of COG \
-co TILING_SCHEME=GoogleMapsCompatible \
-co COMPRESS=DEFLATE \
-co OVERVIEWS=IGNORE_EXISTING \
-co ADD_ALPHA=NO \
-co RESAMPLING=NEAREST \
-co BLOCKSIZE=512 \
usa_precipitation_10102024.tif usa_precipitation_10102024_cog.tif
Your raster data is now ready for uploading to CARTO.
There are two options to upload your raster COG to your data warehouse:
Using import interface: Recommended for small files (<1GB) that don’t require advanced settings.
Using CARTO raster loader: Ideal for larger files (>1GB) or when you need more control (e.g., chunk size, compression).
Navigate to Data Explorer → Click "Import data" (top right). Upload your COG raster file and store it in CARTO Data Warehouse > Shared Dataset for compatibility with other demo datasets.
Once your raster has been successfully uploaded, you'll be able to inspect the raster source in the Map Preview as well as inspecting its metadata and details.
The CARTO Raster Loader is a Python utility that can import a COG raster file to Google BigQuery, Snowflake and Databricks as a CARTO raster table. In our case, we'll be importing data to BigQuery.
Install CARTO Raster Loader
The raster-loader library can be installed from pip; install it in the virtual environment we created earlier.
pip install raster-loader
Authenticate to Google Cloud
In order to create raster tables in BigQuery using Raster Loader, you will need to be authenticated in Google Cloud. Run this command:
gcloud auth application-default login
Execute the uploading process to BigQuery
The basic command to upload a COG to BigQuery as a CARTO raster table is:
carto bigquery upload \
--file_path usa_precipitation_10102024_cog.tif \
--project your_project \
--dataset your_dataset \
--table usa_precipitation_10102024_cog \
--overwrite
Once the upload process has been successful, you'll be able to visualize and analyze it directly from CARTO.
We’ll use CARTO Workflows to analyze which POIs were impacted by extreme precipitation during Hurricane Milton.
Go to Workflows page, and select "Create workflow". Choose the CARTO Data Warehouse connection, as we'll be working with sample data available there.
To identify the impacted POIs we'll use the Hurricane Milton Track boundary. To import this dataset, use the Import from URL component including this URL in the Source URL parameter.
Now, let's add OSM POIs for the USA, available in CARTO Data Warehouse > demo_tables > osm_pois_usa, from the Sources panel by dragging the source into the canvas.
Set a name for your workflow; we'll call it "Analyzing Hurricane Milton impact".
Now, we want to identify the POIs that fall within the Hurricane Milton track on the 10th of October, 2024. To do so, we'll use the Spatial Filter component with the "Intersects" method. When configured, click "Run".
CARTO Workflows contains Raster components to perform analysis between vector and raster sources. In our case, we're interested in retrieving the precipitation values from our raster source to enrich the POIs dataset. To do so, we first want to convert our points to polygons so we can use the Extract and aggregate raster component.
Using the ST Buffer component, set a buffer of around 10 meters from the POIs' point locations.
Now, let's add our raster source into the canvas, that should be saved in the Shared folder of our CARTO Data Warehouse. You can use the Map Preview to visualize the raster data in Workflows.
Add the Extract and aggregate raster component and connect both the buffered POIs and the raster precipitation source. Set the aggregated band to band_1 with the operation AVG, and use osm_id as the column to group by. This will ensure that every POI is enriched with the average raster precipitation of the intersecting pixels.
As we want the enriched POIs for visualization purposes, we'll need to join these stats back with the original OSM spatial column. To do so, we'll first use the SELECT component to retrieve just the spatial column and the identifier from the original source.
SELECT osm_id, geom
Now, use the JOIN component to add the spatial column into our enriched POIs, using osm_id in both sources and the Left method.
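Expressed as SQL, these two steps look roughly like the following sketch, where enriched_pois_stats is a hypothetical name for the Extract and aggregate raster output and carto-demo-data.demo_tables.osm_pois_usa is assumed to be the fully qualified path of the demo table:
-- Sketch: attach the original POI geometry to the aggregated precipitation stats.
SELECT
  stats.*,
  pois.geom
FROM enriched_pois_stats AS stats
LEFT JOIN (
  SELECT osm_id, geom
  FROM `carto-demo-data.demo_tables.osm_pois_usa`
) AS pois
  USING (osm_id)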
Finally, we'll save the resulting outputs that we want to use in Builder as tables. For that, add one Save as Table component for the Hurricane Milton track and another one for the enriched POIs, saving both in CARTO Data Warehouse > Shared.
Once your POIs have been enriched with the average precipitation from Hurricane Milton, we're able to visualize the impact using CARTO Builder, our map-making tool where you can easily create interactive dashboards visualizing both vector and raster sources.
Go to maps, and click on "Create a map" option.
A Builder map opens in a new tab. Rename the Builder map "Analyzing Hurricane Milton impact".
Using the "Add sources from" button, load the enriched POIs, the Hurricane Milton track and the raster precipitation sources into the map:
CARTO Data Warehouse > Shared > hurricane_milton_pois
CARTO Data Warehouse > Shared > hurricane_milton_track
CARTO Data Warehouse > Shared > usa_precipitation_101024
Rename the layers to the following, ensuring they keep the below order from top to bottom:
a. POIs (hurricane_milton_pois)
b. Hurricane Milton track (hurricane_milton_track)
c. Precipitation (usa_precipitation_10102024)
Let's style the layers following the below configuration:
POIs Layer:
Visualization:
Zoom visibility: from 5 to 21
Symbol:
Radius:
Fixed: 3 px
Fill color:
Colored by: band_1_avg
Palette: 4 Steps using ColorBrewer PuBu-4
Color scale: Quantile
Stroke
Stroke color:
Simple: #0d1b33
Stroke width:
Fixed: 0.5 px
Hurricane Milton track Layer:
Fill color:
Simple: #c1d2f9
Opacity: 1%
Stroke color:
Simple: #2168d8
Stroke width:
Fixed: 3.7 px
Precipitation:
Layer opacity: 10%
Palette: 7 Steps using SunsetDark @CARTOColors
Your map layers should look similar to this:
Now let's add some Widgets linked to the POIs to allow users to retrieve insights. We'll add the following widgets:
Formula widget
Title: Affected POIs
Operation: COUNT
Format: 12.3k
Histogram Widget
Title: POIs distribution by Avg Precipitation
Property: band_1_avg
Format: 12.3
Category Widget 1
Title: POIs by Max Precipitation
Operation: MAX
Group by property: name
Aggregation column: band_1_avg
Table Widget:
Title: Table View
Properties:
osm_id as Id
name as Name
group_name as Group name
subgroup_name as Subgroup name
band_1_avg as Avg Precipitation
Your map should look similar to this:
Now, we'll enable Interactions by adding properties to both the POIs and raster layers, so users can retrieve insights by clicking on the map.
Customize the Legend by setting the right label for your properties.
Access the Map settings for viewers and activate the tools you want end-users to access.
Then, go to Preview mode and check that the map looks as desired. Once your map is ready, you can share it with specific users, SSO groups or the entire organization.
Congrats, you're done! Your map should look similar to this:
✅ ✅ ✅ ❌ ❌
This example demonstrates how to identify hotspots using the Getis-Ord Gi* statistic. We use OpenStreetMap amenity POIs in Stockholm.
Read this full guide to learn more.
✅ ✅ ✅ ❌ ❌
This example shows how to identify spacetime clusters. In particular, we will perform spatiotemporal analysis to identify traffic accident hotspots using the location and time of accidents in the city of Barcelona in 2018.
Spacetime hotspots are computed using an extension of the Getis-Ord Gi* statistic that measures the degree to which data values are clustered together in space and time.
✅ ✅ ❌ ❌ ❌
This example shows how to use Workflows to identify space-time clusters and classify them according to their behavior over time.
Read this guide to learn more.
✅ ✅ ❌ ❌ ❌
This example shows how to use Workflows to identify areas with similar traffic accident patterns over time using their location and time.
Read this guide to learn more.
✅ ✅ ✅ ❌ ❌
This example demonstrates how to use Workflows to analyze the spatial correlation of POI locations in Berlin using OpenStreetMap data and the Moran's I function available in the statistics module.
Read this guide to learn more.
✅ ✅ ✅ ❌ ❌
This example demonstrates how to use Workflows to apply a Geographically Weighted Regression model to find relationships between a set of predictor variables and an outcome of interest.
In this case, we're going to analyze the relationship between Airbnb's listings in Berlin and the number of bedrooms and bathrooms available at these listings.
Read this full guide to learn more.
✅ ✅ ❌ ❌ ❌
A composite indicator is an aggregation of variables which aims to measure complex and multidimensional concepts which are difficult to define, and cannot be measured directly. Examples include innovation, human development or environmental performance.
✅ ✅ ❌ ❌ ❌
A composite indicator is an aggregation of variables which aims to measure complex and multidimensional concepts which are difficult to define, and cannot be measured directly. Examples include innovation, human development, environmental performance, and so on.
In this example, we will use the Create Score Unsupervised component to identify areas in Milan with a larger market potential for a wellness & beauty center mainly aimed at teenage and adult women.
✅ ✅ ❌ ❌ ❌
This example workflow uses the Detect Space-time Anomalies component to find the most significant clusters of anomalous data.
We'll create a workflow to improve portfolio management for real estate insurers by identifying vacant buildings in areas experiencing anomalously high rates of violent crime.
In the telecommunications industry, coverage analysis is a fundamental process for assessing the geographical areas where a network's signal is available and determining its quality. Effective signal coverage analysis ensures that telecommunication providers can deliver consistent, high-quality service to their customers, identify areas needing improvement, and strategically plan for network expansion.
This guide shows how to use CARTO telco functionality in the Analytics Toolbox for BigQuery for signal coverage analysis. Specifically, we will cover:
Running path profile analysis to evaluate the line-of-sight and potential obstructions between two points.
Estimating the path loss of a signal as it propagates through an environment using the supported propagation models.
By the end of this guide, you will have computed the line-of-sight for a selection of transmitters in an area of interest and estimated the path loss of their corresponding signals.
To run this analysis, we need the locations of the base stations (i.e. the transmitters, or Tx), the locations of the receivers (Rx), and one or more sources of clutter data. Clutter data includes information about physical obstructions or environmental features that can affect wireless signal propagation. This data can be visualized on the different layers in the map below.
For the transmitters, we randomly selected three locations in London (see the Transmitters (Tx) layer in the map above). We need to specify:
id: A unique ID
height: The height above the ground in meters
geom: The point location of the transmitter
buffer: The radius in meters that determines the area around each transmitter that will be considered for the line-of-sight calculation
Data available at cartobq.docs.prop_london_tx_locations.
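For instance, the transmitters input can be provided as a simple query over the sample table; a minimal sketch based on the columns listed above:
-- Sketch: the transmitters (Tx) input expected by the path profile analysis.
SELECT
  id,      -- unique transmitter ID
  height,  -- height above the ground, in meters
  geom,    -- point location of the transmitter
  buffer   -- analysis radius around the transmitter, in meters
FROM `cartobq.docs.prop_london_tx_locations`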
Receivers must also be indicated as point geometries. For every receiver, we need to specify:
id: A unique ID
geom: The point location of the receiver
height: The height above the ground in meters
We have the option of specifying only a few specific point locations, but typically we will be interested in computing the line-of-sight for an area of interest around transmitters, i.e., in a polygon geometry. To achieve this, we first need to discretize our area of interest. We strongly recommend polyfilling the area of interest with spatial indexes, such as Quadbins, to accomplish this. Spatial indexes allow for the efficient management of large datasets, which is essential for a high-resolution line-of-sight calculation.
For our area of interest, we selected a polygon in London containing the three transmitters (see the Receivers (Rx) layer in the map above). To discretize this area, we polyfill our area of interest using Quadbin zoom level 25 (around 1 sqm grid cells) to achieve decent granularity. This can be easily done using the polyfill functions available in the CARTO Analytics Toolbox, as sketched below.
Note that our area of interest has to be large enough to contain the 500-meter buffers around transmitters that we specified before.
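A minimal sketch of this discretization, assuming an area_of_interest table with a polygon geom column and the QUADBIN_POLYFILL and QUADBIN_CENTER functions of the Analytics Toolbox in the `carto-un` project (the 1.5 m receiver height is just an example value):
-- Sketch: polyfill the area of interest into zoom-25 quadbin cells and
-- derive one receiver per cell, located at the cell center.
SELECT
  quadbin AS id,
  `carto-un`.carto.QUADBIN_CENTER(quadbin) AS geom,
  1.5 AS height
FROM `area_of_interest`,
  UNNEST(`carto-un`.carto.QUADBIN_POLYFILL(geom, 25)) AS quadbin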
Data available at cartobq.docs.prop_london_rx_locations.
We can use different sources of clutter data, such as buildings, vegetation, or terrain height. This data can be in vector or raster format, and there are two separate procedures for each type for calculating line-of-sight, as shown in the corresponding section. In this example, we will test both the vector and raster procedures.
For vector data, we will use:
Building footprints: Overture Maps building footprints, a public and global dataset.
Terrain height: elevation data from CARTO Spatial Features.
Samples of the two data sources have been made available for reproducibility at cartobq.docs.prop_london_buildings_sample_overture and cartobq.docs.prop_elevation_spatialfeatures_gbr_quadgrid18_sample.
For raster data, we will use publicly available LIDAR digital surface model (DSM) and digital terrain model (DTM) data at 1 m resolution. The DSM captures the natural and built features on the Earth's surface: it contains the elevation including buildings, trees, and any other existing structures. The DTM is a filtered version of the DSM, where non-ground points such as buildings and trees have been removed.
We combined the two models to create the canopy height model (CHM) as CHM = DSM - DTM. The CHM and DTM were uploaded into BigQuery using the CARTO Raster Loader.
Raster data has been made available in BigQuery at cartobq.docs.prop_london_dtm_cog (digital terrain) and cartobq.docs.prop_london_canopy_cog (canopy height model).
Once we have all our data ready, we can proceed with path profile analysis. We will first demonstrate how to perform this analysis using vector data, followed by raster data.
To calculate the line-of-sight of the three transmitters in a 500-m buffer around them, we use the procedure that takes as input:
The query or fully qualified name of the table containing the transmitters' locations. As stated above, for each transmitter we need a unique identifier, height, geometry, and a buffer radius, which in this case is 500 m.
The query or fully qualified name of the table containing the receivers’ locations. As stated above, for each receiver we need a unique identifier, geometry, and height.
The fully qualified name of the output table.
Different options regarding clutter data sources or the operating frequency of the links in GHz. See the documentation for further information.
The code below calculates the path profile for the transmitters and receivers explained in the data section, using the Overture buildings and Spatial Features elevation data. We select the additional options include_obstacles_table, so that the optional output table with the details on the obstacles is exported, and terrain_points, so that terrain morphology is accounted for. Note that we use the buildings' centroids as geometries to speed up the calculation.
As a result, the output is stored in <my-project>.<my_dataset>.prop_los_vector_london, which contains every transmitter-receiver pair with a flag indicating whether each link is clear of obstacles (i.e., is within line of sight), among other information. See the documentation for further details.
The details about the intersected clutter (buildings) are stored in table <my-project>.<my_dataset>.prop_los_vector_london_details
.
The resulting tables from the call are available at
cartobq.docs.prop_los_vector_london
andcartobq.docs.prop_los_vector_london_details
.
The map shows the resulting line-of-sight of one of the transmitters, assuming we have a receiver at each grid point in the area of interest. We can see areas that are obstructed (yellow) vs. areas that are not (light blue).
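To sanity-check the output beyond the map, we can summarize how many receivers are clear vs. obstructed per transmitter directly in SQL. This is a minimal sketch against the public results table; it assumes the line-of-sight flag column is named los, as documented for the raster output later in this guide, so adjust the column name if your output differs.
-- Count clear vs. obstructed receivers for each transmitter
-- (assumes the line-of-sight flag column is named los)
SELECT tx_id, los, COUNT(*) AS num_receivers
FROM `cartobq.docs.prop_los_vector_london`
GROUP BY tx_id, los
ORDER BY tx_id, los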
Similarly, we perform the same analysis using the raster data described in the Clutter data section, i.e., the digital terrain model and canopy height model data. We do so with the TELCO_PATH_PROFILE_RASTER procedure, which is optimized to work with raster data. This procedure takes as input:
The query or fully qualified name of the table containing the transmitters' locations. As stated above, for each transmitter we need a unique identifier, height, geometry, and a buffer radius, which in this case is 500 m.
The query or fully qualified name of the table containing the receivers’ locations. As stated above, for each receiver we need a unique identifier, geometry, and height.
The fully qualified name of the output table.
Different options regarding clutter data sources or the operating frequency of the links in GHz. See the documentation for further information.
The code below calculates the path profile for the transmitters and receivers explained in the data section, with the digital terrain model and canopy height model data. We select as further options:
include_obstacles_table
so that the optional output table with the details on the obstacles is exported
clutter_raster_band
with the bands and aliases to be extracted from the clutter raster
intersect_center
to extract the pixel values from raster tables by intersecting either the pixel center or the pixel boundary with the Fresnel zone (see the documentation)
intersect_fresnel_zone
to use the First Fresnel Zone for extracting the obstructing pixels or the line connecting the transmitter-receiver pairs.
As a result, the output is stored in <my-project>.<my_dataset>.los_raster_london, which contains every transmitter-receiver pair with a flag (column los) indicating whether each link is clear of obstacles, among other information. See the documentation for further details.
The details about the intersected clutter (buildings) are stored in the table <my-project>.<my_dataset>.los_raster_london_details
.
The resulting tables from the call are available at
cartobq.docs.prop_los_raster_london
and cartobq.docs.prop_los_raster_london_details
.
The map above shows the resulting line-of-sight of each transmitter, where we can see obstructed vs. unobstructed areas.
One interesting visualization that can provide insights on the types of clutter that interfere with the links can be created using the information stored in <my-project>.<my_dataset>.prop_los_vector_london_details
or <my-project>.<my_dataset>.prop_los_raster_london_details
. For example, the map below shows the clutter data and the projected-to-the-ground Fresnel zone between a selected receiver and its corresponding transmitter.
Path loss estimation is crucial in wireless communications for power management, link budget calculation, cell planning and optimization, interference mitigation, and resource allocation.
In this section, we show how to estimate path loss using the two propagation models available in the Analytics Toolbox: Close In and Extended Hata. These models take the previously calculated line-of-sight as input.
Note that path loss is usually part of a link budget calculation, which, in conjunction with transmit power, antenna gains, etc., provides an estimation of the received signal level.
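For context, a simplified link budget (ignoring margins such as fading, cable and body losses, which are outside the scope of this guide) combines these terms as
P_rx [dBm] = P_tx [dBm] + G_tx [dBi] + G_rx [dBi] - PL [dB],
where P_tx is the transmit power, G_tx and G_rx are the antenna gains at both ends, and PL is the path loss estimated in this section; the resulting received signal level P_rx can then be compared against the receiver sensitivity.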
To estimate path loss using the Close In model, we use the CLOSE_IN procedure, which takes as input:
The query or the fully qualified name of the table containing the Tx-Rx link information. This is the output of the path profile procedure (vector or raster).
The fully qualified name of the output table.
Different options regarding frequency in GHz or the scenario (UMa, UMi-S.C., UMi-O.S.).
The code below estimates the path loss for the vector path profile output with a frequency of 2.4 GHz and for a UMi-S.C. scenario:
The resulting table from the call is available at
cartobq.docs.prop_london_closein
.
As a result, the procedure returns a table with all transmitter-receiver pairs and their corresponding path gain in dB, as can be seen on the map at the end of the guide.
To estimate path loss using the Extended Hata model, we use the EXTENDED_HATA procedure, which takes as input:
The query or the fully qualified name of the table containing the Tx-Rx link information. This is the output of the path profile procedure (vector or raster).
The fully qualified name of the output table.
Options regarding frequency in GHz and the scenario (urban, suburban, or open area).
The code below estimates the path loss for the vector path profile output with readjusted heights for an urban scenario:
The resulting table from the call is available at
cartobq.docs.prop_london_extended_hata
.
As a result, the procedure returns a table with all transmitter-receiver pairs and their corresponding path gain in dB, as observed on the bottom map below.
CREATE OR REPLACE TABLE `<my-project>.<my-dataset>.prop_london_rx_locations`
CLUSTER BY geom AS
SELECT CAST(qk AS STRING) AS id,
`carto-un`.carto.QUADBIN_CENTER(qk) AS geom,
2.0 AS height
FROM UNNEST(`carto-un`.carto.QUADBIN_POLYFILL_MODE(
ST_BUFFER(ST_GEOGPOINT(-0.1276, 51.5072), 4000), 25, 'center')) AS qk
;
CREATE OR REPLACE TABLE `<my-project>.<my-dataset>.prop_london_rx_locations`
CLUSTER BY geom AS
SELECT CAST(qk AS STRING) AS id,
`carto-un-eu`.carto.QUADBIN_CENTER(qk) AS geom,
2.0 AS height
FROM UNNEST(`carto-un-eu`.carto.QUADBIN_POLYFILL_MODE(
ST_BUFFER(ST_GEOGPOINT(-0.1276, 51.5072), 4000), 25, 'center')) AS qk
;
CREATE OR REPLACE TABLE `<my-project>.<my-dataset>.prop_london_rx_locations`
CLUSTER BY geom AS
SELECT CAST(qk AS STRING) AS id,
carto.QUADBIN_CENTER(qk) AS geom,
2.0 AS height
FROM UNNEST(carto.QUADBIN_POLYFILL_MODE(
ST_BUFFER(ST_GEOGPOINT(-0.1276, 51.5072), 4000), 25, 'center')) AS qk
;
CALL `carto-un`.carto.TELCO_PATH_PROFILE(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geom
'<my-project>.<my_dataset>.prop_los_vector_london',
'''{
"buildings_query":"SELECT * EXCEPT (geometry, centroid), centroid AS geom FROM cartobq.docs.prop_london_buildings_sample_overture",
"terrain_height_query":"SELECT geoid, `carto-un`.carto.QUADBIN_CENTER(geoid) AS geom, elevation AS height FROM cartobq.docs.prop_elevation_spatialfeatures_gbr_quadgrid18_sample",
"include_obstacles_table":"TRUE",
"terrain_points":"TRUE"
}'''
);
CALL `carto-un-eu`.carto.TELCO_PATH_PROFILE(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geom
'<my-project>.<my_dataset>.prop_los_vector_london',
'''{
"buildings_query":"SELECT * EXCEPT (geometry, centroid), centroid AS geom FROM cartobq.docs.prop_london_buildings_sample_overture",
"terrain_height_query":"SELECT geoid, `carto-un-eu`.carto.QUADBIN_CENTER(geoid) AS geom, elevation AS height FROM cartobq.docs.prop_elevation_spatialfeatures_gbr_quadgrid18_sample",
"include_obstacles_table":"TRUE",
"terrain_points":"TRUE"
}'''
);
CALL carto.TELCO_PATH_PROFILE(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geom
'<my-project>.<my_dataset>.prop_los_vector_london',
'''{
"buildings_query":"SELECT * EXCEPT (geometry, centroid), centroid AS geom FROM cartobq.docs.prop_london_buildings_sample_overture",
"terrain_height_query":"SELECT geoid, carto.QUADBIN_CENTER(geoid) AS geom, elevation AS height FROM cartobq.docs.prop_elevation_spatialfeatures_gbr_quadgrid18_sample",
"include_obstacles_table":"TRUE",
"terrain_points":"TRUE"
}'''
);
CALL `carto-un`.carto.TELCO_PATH_PROFILE_RASTER(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geometry
'<my-project>.<my-dataset>.prop_los_raster_london',
'''{
"clutter_query":"cartobq.docs.prop_london_canopy_cog",
"terrain_height_query":"cartobq.docs.prop_london_dtm_cog",
"include_obstacles_table":"TRUE",
"clutter_raster_band":"band_1 AS height, 'clutter' AS type",
"intersect_center":"TRUE",
"intersect_fresnel_zone":"TRUE"
}'''
);
CALL `carto-un-eu`.carto.TELCO_PATH_PROFILE_RASTER(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geometry
'<my-project>.<my-dataset>.prop_los_raster_london',
'''{
"clutter_query":"cartobq.docs.prop_london_canopy_cog",
"terrain_height_query":"cartobq.docs.prop_london_dtm_cog",
"include_obstacles_table":"TRUE",
"clutter_raster_band":"band_1 AS height, 'clutter' AS type",
"intersect_center":"TRUE",
"intersect_fresnel_zone":"TRUE"
}'''
);
CALL carto.TELCO_PATH_PROFILE_RASTER(
-- id STRING, geom GEOGRAPHY, height FLOAT64, buffer FLOAT64
'cartobq.docs.prop_london_tx_locations',
-- id STRING, geom GEOGRAPHY, height FLOAT64
'cartobq.docs.prop_london_rx_locations',
-- id, height, geometry
'<my-project>.<my-dataset>.prop_los_raster_london',
'''{
"clutter_query":"cartobq.docs.prop_london_canopy_cog",
"terrain_height_query":"cartobq.docs.prop_london_dtm_cog",
"include_obstacles_table":"TRUE",
"clutter_raster_band":"band_1 AS height, 'clutter' AS type",
"intersect_center":"TRUE",
"intersect_fresnel_zone":"TRUE"
}'''
);
CALL `carto-un`.carto.CLOSE_IN(
'<my-project>.<my-dataset>.prop_los_vector_london',
'<my-project>.<my-dataset>.prop_london_closein',
'{"frequency":2.4, "scenario":"UMi-S.C."}'
);
CALL `carto-un-eu`.carto.CLOSE_IN(
'<my-project>.<my-dataset>.prop_los_vector_london',
'<my-project>.<my-dataset>.prop_london_closein',
'{"frequency":2.4, "scenario":"UMi-S.C."}'
);
CALL carto.CLOSE_IN(
'<my-project>.<my-dataset>.prop_los_vector_london',
'<my-project>.<my-dataset>.prop_london_closein',
'{"frequency":2.4, "scenario":"UMi-S.C."}'
);
CALL `carto-un`.carto.EXTENDED_HATA(
R'''
SELECT tx_id, rx_id, distance, b.height AS heightTx, c.height AS heightRx
FROM `<my-project>.<my-dataset>.prop_los_vector_london` a
JOIN `<my-project>.<my-dataset>.prop_london_tx_locations` b
ON a.tx_id = b.id
JOIN `<my-project>.<my-dataset>.prop_london_rx_locations` c
ON a.rx_id = c.id
''',
'<my-project>.<my-dataset>.prop_london_extended_hata',
'{"frequency":2.4, "scenario":"urban"}'
);
CALL `carto-un-eu`.carto.EXTENDED_HATA(
R'''
SELECT tx_id, rx_id, distance, b.height AS heightTx, c.height AS heightRx
FROM `<my-project>.<my-dataset>.prop_los_vector_london` a
JOIN `<my-project>.<my-dataset>.prop_london_tx_locations` b
ON a.tx_id = b.id
JOIN `<my-project>.<my-dataset>.prop_london_rx_locations` c
ON a.rx_id = c.id
''',
'<my-project>.<my-dataset>.prop_london_extended_hata',
'{"frequency":2.4, "scenario":"urban"}'
);
CALL carto.EXTENDED_HATA(
R'''
SELECT tx_id, rx_id, distance, b.height AS heightTx, c.height AS heightRx
FROM `<my-project>.<my-dataset>.prop_los_vector_london` a
JOIN `<my-project>.<my-dataset>.prop_london_tx_locations` b
ON a.tx_id = b.id
JOIN `<my-project>.<my-dataset>.prop_london_rx_locations` c
ON a.rx_id = c.id
''',
'<my-project>.<my-dataset>.prop_london_extended_hata',
'{"frequency":2.4, "scenario":"urban"}'
);
Data, particularly visualized on a map, provides powerful insights that can guide and accelerate decision-making. However, working with multiple data sources, each of them filled with numerous variables, can be a challenge.
In this tutorial, we're going to show you how to use SQL Parameters to handle multiple data sources at once when building an interactive map with CARTO Builder. We'll be focusing on the start and end locations of Citi Bike trips in New York City, considering different time periods and neighborhoods. By the end, you'll have a well-crafted, interactive Builder map completed with handy widgets and parameters. It'll serve as your guide for understanding biking patterns across the city. Sounds good? Let's dive in!
Access the Data Explorer from your CARTO Workspace using the Navigation menu.
Search for the demo_data > demo_tables within the CARTO Data Warehouse and select “manhattan_citibike_trips”.
Examine "manhattan_citibike_trips" Map and Data preview, focusing on the geometry columns (start_geom
and end_geom
) that correspond to trip start and end bike station points.
Return to the Navigation Menu, select Maps, and create a "New map".
Begin by adding the start station locations of Citi Bike Trips as the first data source.
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the Add Source button.
The SQL Editor panel will be opened.
Extract the bike stations at the start of the Citi Bike trips, grouping by start_station_name and obtaining the COUNT(*) of all the trips starting at each location. For that, run the query below:
SELECT
start_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(start_geom) as geom,
ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
GROUP BY start_station_name
Rename the layer to "Trip Start" and style it by Trip_count
using Color based on option and set the radius size by the same Trip_count
variable using 2 to 6 range.
Extract the bike stations of the end of the trips. We will repeat Step 7 and Step 8, this time retrieving the end station variables. For that, execute the following query.
SELECT
end_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(end_geom) as geom,
ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
GROUP BY end_station_name
Once the data has been added to the map display, you will notice that it overlaps the 'Trip Start' layer.
Edit the name and style of the new layer, and update the visualization of the 'Trip Start' layer as follows:
Disable the 'Trip Start' layer visibility by clicking the eye icon on the layer tab.
Rename "Layer 2" to "Trip End".
Style 'Trip End' layer by trip_count
using a different color palette.
Change the Basemap to Dark Matter
for better visibility.
Enable the Layer selector and Open when loading the map options within Legend > More Legend Options.
Use the Split View mode to examine the 'Trip Start' and 'Trip End' layers before creating SQL Parameters.
Ensure that the 'Trip Start' layer is positioned above the 'Trip End' layer. You can adjust layer visibility by toggling the eye icon in the Legend.
As shown in the screenshot below, the left panel showcases the 'Trip Start' layer, while the right panel displays the 'Trip End' layer. Split View mode is highly beneficial for comparison purposes.
Now we are ready to start using SQL Parameters over both SQL Query sources.
SQL Parameters are a powerful feature in Builder that serve as placeholders in SQL Query data sources. They provide flexibility and ease in performing data analysis by allowing dynamic input and customization of queries.
Create a SQL Parameter by clicking the Create a SQL Parameter icon located at the top right of the Sources panel.
A pop-up window will open where you can find further information about SQL Parameters and select the SQL Parameter type you would like to use.
Click Continue to move to the next page, where you can choose the parameter type.
Select Dates as the parameter type and click Continue.
Navigate to the configuration page for the Dates parameter and set the parameters as indicated in the following screenshot and click Create parameter.
Please note that the dataset for Manhattan Citi Bike Trips only includes data from January until May 2018. Please ensure your date selection falls within this range.
A new parameter named Time Period appears on the left panel.
Edit the SQL Query for both the 'SQL Query 1' and 'SQL Query 2' data sources to include the WHERE statement that will filter the starttime column by the input Time Period date range, and execute the queries.
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
The output query for 'SQL Query 1' linked to 'Trip Start' layer should be as follows:
SELECT
start_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(start_geom) as geom,
ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
GROUP BY start_station_name
The output query for 'SQL Query 2' linked to the 'Trip End' layer should be as below, since we are interested in the start time of the trip for both sources:
SELECT
end_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(end_geom) as geom,
ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
GROUP BY end_station_name
Once you have executed the SQL Queries, a calendar will appear within the Time Period parameter.
Users will have the flexibility to alter the time frame using the provided calendar. This allows you to filter the underlying data sources to suit your needs, affecting both the 'Trip Start' and 'Trip End' data sources.
Create a new SQL Parameter. This time, select the Text parameter type and set the configuration as below, using the start_ntaname column from the 'SQL Query 1' source to add the Manhattan neighborhoods. Once complete, click on the Create Parameter button.
A new parameter named Start Neighborhood will be added to the Map.
Edit the SQL Query for both 'SQL Query 1' and 'SQL Query 2' to include the WHERE statement that filters the start_ntaname column by the input of the Start Neighborhood parameter, and execute the queries.
start_ntaname IN {{start_neighborhood}}
The output query for 'SQL Query 1' linked to 'Trip Start layer' should be as follows:
SELECT
start_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(start_geom) as geom,
ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}}
GROUP BY start_station_name
The output query for 'SQL Query 2' linked to the 'Trip End' layer should be as below, since we are interested in the start neighborhood of the trip for both sources.
SELECT
end_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(end_geom) as geom,
ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}}
GROUP BY end_station_name
After executing the SQL Queries, a drop-down list of start trip neighborhoods will populate. This interactive element allows users to selectively choose which neighborhood(s) serve as the starting point of their trip.
Repeat Step 20 and Step 21 to create a SQL Parameter, but this time we will filter the end trip neighborhoods.
The output query for 'SQL Query 1' linked to Trip Start layer should be as follows:
SELECT
start_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(start_geom) as geom,
ANY_VALUE(start_ntaname) as start_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}} AND end_ntaname IN {{end_neighborhood}}
GROUP BY start_station_name
The output query for 'SQL Query 2' linked to the 'Trip End' layer should be as follows:
SELECT
end_station_name,
COUNT(*) as trip_count,
ANY_VALUE(geoid) as geoid,
ANY_VALUE(end_geom) as geom,
ANY_VALUE(end_ntaname) as end_ntaname
FROM `carto-demo-data.demo_tables.manhattan_citibike_trips`
WHERE starttime >= {{trip_period_from}} AND starttime <= {{trip_period_to}}
AND start_ntaname IN {{start_neighborhood}} AND end_ntaname IN {{end_neighborhood}}
GROUP BY end_station_name
Disable Split View Mode, make both the 'Trip Start' and 'Trip End' layers visible using the Legend eye icons, and compare the bike trips between two different neighborhoods. For that, set the Start Neighborhood parameter to "Upper West Side" and the End Neighborhood parameter to "Chinatown".
We can clearly see which start and end stations gather most of the bike trips for this neighborhood combination.
Create a Formula Widget to represent the Total Trips, setting the configuration as below.
Add a Category Widget to display the Start Stations
ordered by the Total Trips
.
Add a Category Widget to display the End Stations
ordered by the Total Trips
.
The Builder map provides users with an interactive application to gather insights about New York Citi Bike trips and the patterns between the different neighborhoods. However, it is difficult to visualize the boundary limits of both the start and end trip neighborhoods.
For that, let's use the "newyork_neighborhood_tabulation_areas" table, available in the CARTO Data Warehouse within demo_data > demo_tables.
Add a new SQL Query as a data source using the following query, which aggregates the geometry of the start trip neighborhood(s):
SELECT
ST_UNION_AGG(geom) as geom
FROM `carto-demo-data.demo_tables.newyork_neighborhood_tabulation_areas`
WHERE ntaname IN {{start_neighborhood}}
Add a new SQL Query as a data source using the following query. This time the aggregated geometry will be for the end trip neighborhood(s):
SELECT
ST_UNION_AGG(geom) as geom
FROM `carto-demo-data.demo_tables.newyork_neighborhood_tabulation_areas`
WHERE ntaname IN {{end_neighborhood}}
Rename the recently added layers, and position them beneath the 'Trip Start' and 'Trip End' layers for better visibility.
Feel free to experiment with styling options, adjusting layer opacity and trying out different color palettes, until you achieve the optimal visual representation.
Change the name of the map to "New York Citi Bike Trips".
Finally, we can make the map public and share the link with anybody.
For that, go to the Share section in the top right corner and set the map to Public.
Activate the SQL parameters controls option so that Viewer users can control the exposed parameters.
Finally, we can visualize the results!
By the end of this tutorial, you should have a clear understanding of how to utilize SQL Parameters to filter multiple data sources, particularly in the context of Citi Bike trips in New York City.
From disease surveillance systems to the detection of spikes in network usage or environmental monitoring, many applications require monitoring time series data in order to detect anomalous data points. In these event detection scenarios, the goal is to either uncover anomalous patterns in historical space-time data or to swiftly and accurately detect emerging patterns, thereby enabling a timely and effective response to the detected events.
As a concrete example, in this guide we will focus on the task of detecting spikes in violent crimes in the city of Chicago in order to improve portfolio management of real estate insurers.
This guide shows how to use CARTO space-time anomaly detection functionality in the Analytics Toolbox for BigQuery. Specifically, we will cover:
A brief introduction to the method and to the formulations of the definition of anomalous, unexpected, or otherwise interesting regions
How to identify anomalous space-time regions using the DETECT_SPACETIME_ANOMALIES
function
By the end of this guide, you will have detected anomalous space-time regions in time series data of violent crimes in the city of Chicago using different formulations of the anomaly detection problem.
A variety of methods have been developed to monitor time series data and to detect any observations outside a critical range. These include outlier detection methods and approaches that compare each observed data point to its baseline value, which might represent the underlying population at risk or an estimate of the expected value. The latter can be derived from a moving window average or a counterfactual forecast obtained from time series analysis of the historical data, as can be obtained, for example, by fitting an ARIMA model to the historical data using the ARIMA_PLUS or ARIMA_PLUS_XREG model classes in Google BigQuery.
To detect anomalies that affect multiple time series simultaneously, we can either combine the outputs of multiple univariate time series or treat the multiple time series as a single multivariate quantity to be monitored. However, for time series that are also localized in space, we expect that if a given location is affected by an anomalous event, then nearby locations are more likely to be affected than locations that are spatially distant.
A typical approach to the monitoring of spatial time series data uses fixed partitions, which requires defining an a priori spatial neighborhood and temporal window to search for anomalous data. However, in general, we do not have a priori knowledge of how many locations will be affected by an event, and we wish to maintain high detection power whether the event affects a single location (and time), all locations (and times), or anything in between. A coarse partitioning of the search space will lose power to detect events that affect a small number of locations (and times), since the anomalous time series will be aggregated with other non-anomalous data. A fine partitioning of the search space will lose power to detect events that affect many locations (and times), since only a small number of anomalous time series are considered in each partition. Partitions of intermediate size will lose some power to detect both very small and very large events.
A solution to this problem is a multi-resolution approach in which we search over a large and overlapping set of space-time regions, each containing some subset of the data, and find the most significant clusters of anomalous data. This approach, which is known as the generalized space-time scan statistics framework, consists of the following steps:
Choose a set of space-time regions to search over, where each space-time region consists of a set of space-time locations (e.g. defined using spatial indexes).
Choose models of the data under the null hypothesis H0 (no cluster of anomalies) and the alternative hypothesis H1(S) (an anomalous cluster in region S). Here we assume that each location's value is drawn independently from some distribution Dist(b, q), where b represents the baseline value of that location and q represents an underlying relative risk parameter. Second, we make the assumption that the relative risk q is uniform under the null hypothesis: thus we assume that any space-time variation in the values under the null is accounted for by our baseline parameters, and our methods are designed to detect any additional variation not reflected in these baselines.
Choose a baseline.
Derive a score function F(S) based on the likelihood ratio statistic, which compares the likelihood of the data under H1(S) and under H0 (see the Poisson example after this list).
Find the most interesting regions, i.e. those regions S with the highest values of F(S).
Calculate the statistical significance of each discovered region using Monte Carlo randomization: generate random permutations of the data, in which each replica is a copy of the original search area with each value randomly drawn from the null distribution; for each permutation, select the space-time zone associated with the maximum score, and fit a Gumbel distribution to the maximum scores to derive an empirical p-value.
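To make the score function concrete, for Poisson-distributed counts the expectation-based score takes a simple closed form in the scan statistics literature (the exact expressions used by the procedure may differ; refer to the documentation): for a candidate region S with aggregate observed count C and aggregate baseline B,
F(S) = C * ln(C / B) + B - C if C > B, and F(S) = 0 otherwise,
so the regions whose observed counts exceed their baselines the most receive the highest scores.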
While anomaly detection typically focuses on single data points and asks whether each point is anomalous, space-time anomaly detection focuses on finding space-time groups or patterns which are anomalous, even if each individual point in the group might not be surprising on its own.
Overall, clustering and space-time anomaly detection have very different goals (partitioning data into groups versus finding statistically anomalous regions). Nevertheless, some clustering methods, commonly referred to as density-based clustering (e.g. DBSCAN), partition the data based on the density of points, and as a result we might think that these partitions correspond to the anomalous regions we are interested in detecting. However, density-based clustering is not adequate for the space-time anomaly detection task: first, we want to draw statistical conclusions about the regions we find (whether each region represents a significant cluster or is likely to have occurred by chance); and second, we want to deal adequately with spatially (and temporally) varying baselines, while density-based clustering methods are tied to the notion of density as the number of points per unit area.
Methods based on the Getis-Ord Gi* statistic, such as hotspot analysis, can also be used to identify regions with high or low event intensity. These work by proportionally comparing the local sum of an attribute to the global sum, resulting in a z-score for each observation: observations with a regional sum significantly higher or lower than the global sum are considered to show statistically significant regional similarity above or below the global trend. However, unlike space-time anomaly detection, this approach uses a fixed spatial and/or temporal window, and it is more exploratory in nature and not suitable for inferential analysis.
Crime data is often an overlooked component in property risk assessments and is rarely integrated into underwriting guidelines, despite the FBI's latest estimates indicating over $16 billion in annual losses from property crimes alone. In this example, we will use the locations of violent crimes in Chicago, available in the BigQuery public marketplace and extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. The data are available daily from 2001 to present, minus the most recent seven days, which also allows us to showcase how to use this method to detect space-time anomalies in near real time.
For the purpose of this guide, the data were first aggregated weekly (by assigning each daily record to the previous Monday) and by H3 cell at resolution 7, as shown in this map, where we can visualize the total counts for the whole period by H3 cell and the time series of the H3 cells with the most counts.
Each H3 cell has been further enriched with demographic data from the American Community Survey (ACS) at the census block resolution. Finally, each time series has been gap-filled by assigning a zero value to the crime counts variable wherever data was missing. The final data can be accessed using this query:
SELECT date, h3, counts, total_pop_sum AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
We start by detecting the space-time anomalies in counts of violent crimes with respect to the population at risk, given by the H3 total population enriched with data from the 5-year American Community Survey (ACS) at the census block resolution. In this approach to defining baseline values, named population-based ('estimation_method':'POPULATION'), we expect the crime counts to be proportional to the baseline values, which typically represent the population corresponding to each space-time location and can be either given (e.g. from census data) or inferred (e.g. from sales data), and can be adjusted for any known covariates (such as age of population, risk factors, seasonality, weather effects, etc.). Specifically, we wish to detect space-time regions where the observed rates are significantly higher inside than outside.
Assuming that the counts are Poisson distributed (which is the typical assumption for count data, 'distributional_model':'POISSON'
), we can obtain the space-time anomalies using the following query
CALL `carto-un`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
'''
SELECT date, h3, counts, total_pop_sum AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[2,6],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'POPULATION'
}'''
)
CALL `carto-un-eu`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
'''
SELECT date, h3, counts, total_pop_sum AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[2,6],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'POPULATION'
}'''
)
As we can see from the query above, in this case we are looking retrospectively for past anomalous space-time regions ('is_prospective': false, i.e. a temporal zone can end at any timestamp), with a spatial extent given by a k-ring ('kring_size') between 1 (first-order neighbors) and 3 (third-order neighbors), and a temporal extent ('time_bw') between 2 and 6 weeks. Finally, the 'permutations' parameter defines the number of permutations used to compute the statistical significance of the detected anomalies. As noted above, empirical results suggest that the null distribution of the scan statistic is well fit by a Gumbel extreme value distribution, which can be used to obtain empirical p-values for the scan statistic with great accuracy in the far tail of the distribution: with a smaller number of replications under the null we can calculate very small p-values (for example, p-values on the order of 0.00001 can be accurately calculated with only 999 random replicates by using the Gumbel approximation, while it would require more than 999,999 replicates to get the same power and precision from Monte Carlo hypothesis testing); see the note below for how the p-value is derived from the Gumbel fit. The results of this experiment are shown in this map.
As we can see from this map, the space-time zone with the largest score (whose extent is shown in the right panel) has a higher relative risk than the rest of the data.
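As a reference for how the p-value is obtained from the Gumbel fit: once a Gumbel distribution with location mu and scale beta has been fitted to the maximum scores of the random replicas, the empirical p-value of an observed maximum score F_obs is its Gumbel upper-tail probability
p-value ≈ 1 - exp(-exp(-(F_obs - mu) / beta)),
so small p-values correspond to observed scores that fall far in the right tail of the null distribution of maximum scores.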
Another way of interpreting the baselines is to assume that the observed values should be equal (and not just proportional, as in the population-based approach) to the baseline under the null hypothesis of no anomalous space-time regions. This approach, named expectation-based, requires an estimate of the baseline values, which are inferred from the historical time series, potentially adjusting for any relevant external effects such as day-of-week and seasonality.
Computing the expected counts with a moving average
A simple way of estimating the expected crime counts is to compute a moving average of the weekly counts for each H3 cell. For example, we could average each weekly value over the span between the previous and next three weeks:
-- input_query
SELECT date, h3,
counts,
AVG(counts) OVER(PARTITION BY h3 ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
CALL `carto-un`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[4,16],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'EXPECTATION'
}'''
)
CALL `carto-un-eu`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
'kring_size':[1,3],
'time_bw':[4,16],
'is_prospective': false,
'distributional_model':'POISSON',
'permutations':99,
'estimation_method':'EXPECTATION'
}'''
)
The map below shows the spatial and temporal extent of the ten most anomalous regions (the region with rank 1 being the most anomalous), together with the time series of the sum of the counts and the baselines (i.e. the moving average values) over the time span of the selected region.
Computing the expected counts from a time series model
To improve the estimate of the baseline values, we can also infer them using a time series model of the past observations that allows for seasonal and holiday effects. This can be achieved by fitting any standard time series method, such as an ARIMA model, to the time series of each H3 cell:
CREATE MODEL `<my-project>.<my-dataset>.<my-arima_plus_model>`
OPTIONS(model_type='ARIMA_PLUS',
AUTO_ARIMA=TRUE,
time_series_id_col = 'h3',
time_series_data_col='counts',
time_series_timestamp_col='date')
AS (
training_data AS (
SELECT date, h3, counts
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
),
custom_holiday AS (
SELECT *
FROM `cartobq.docs.chicago_crime_2024-07-30_holidays`
)
)
The baseline values can then be computed by subtracting the residuals from the observed counts, obtained by calling the ML.EXPLAIN_FORECAST function:
-- input_query
SELECT a.date, a.h3, a.counts, (a.counts - b.residual) AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched` a
JOIN ML.EXPLAIN_FORECAST(MODEL
`<my-project>.<my-dataset>.<my-arima_plus_model>`) b
ON a.date = CAST(b.time_series_timestamp AS DATE) AND a.h3 = b.h3
WHERE date > '2001-01-01'
Using the same procedure call as before, we can then get the 10 most anomalous regions for the newly computed baselines.
Whether to use a simple moving average or a time series model to infer the baselines depends on the question we are trying to answer (e.g. whether the expected values should be adjusted for day-of-week, seasonal, and holiday effects), as well as on the type and quality of the data (how long the time series is, how noisy it is, etc.). To further investigate the differences between a moving average and an ARIMA-based model, we can plot the difference between the observed values and the baseline values for each method, as shown here for the ten H3 cells with the highest number of crimes.
Adjusting the expected counts to include external effects
In many cases, we also want to adjust the baseline values for known covariates such as weather effects, mobility trends, age of population, income, etc. For example, here we might include the effects of census variables derived from ACS 5-year averages, such as the median age, the median rent, the Black and Hispanic population ratios, the owner-occupied and vacant housing unit ratios, and the ratio of families with young children. To include these additional effects, we can run for each H3 cell an ARIMA model with external covariates and get the covariate-adjusted predictions:
-- Create model
CREATE MODEL `<my-project>.<my-dataset>.<my-arima_plus_model>`
OPTIONS(model_type='ARIMA_PLUS_XREG',
AUTO_ARIMA=TRUE,
time_series_data_col='counts',
time_series_timestamp_col='date')
AS (
training_data AS (
SELECT * EXCEPT(h3)
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE h3 = '87275934effffff'
),
custom_holiday AS (
SELECT *
FROM `cartobq.docs.chicago_crime_2024-07-30_holidays`
)
);
-- Get forecast
SELECT a.date, '87275934effffff' AS h3,
(a.counts - b.residual) AS baseline_arima_plus_xreg,
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched` a
JOIN ML.EXPLAIN_FORECAST(
MODEL `<my-project>.<my-dataset>.<my-arima_plus_model>`,
STRUCT(),
TABLE data) b
ON a.date = CAST(b.time_series_timestamp AS DATE)
For convenience, we have already joined the results for each H3 cell into a single table:
--input_query
SELECT date, h3, counts, baseline_arima_plus_xreg AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_counts_w_baselines_xreg`
WHERE date > "2001-01-01"
Given these covariate-adjusted baselines, we can use the procedure to detect space-time anomalies with the same options as before and get the 10 most anomalous regions for the newly computed baselines.
The examples given so far showed how to detect anomalies retrospectively ('is_prospective': false), which means that the whole time series is available and the space-time anomalies can happen at any point in time over all the past data (a temporal zone can end at any timestamp). However, the procedure can also be applied when the interest lies in detecting emerging anomalies ('is_prospective': true), for which the search focuses only on the final part of the time series (a temporal zone can only have the last timestamp as its end point). The prospective case is especially useful with real-time data, as in this case the goal is to detect anomalies as quickly as possible. On the other hand, a retrospective analysis is more useful to understand past events, improve operational processes, validate models, etc.
Whether to use an expectation-based approach or a population-based approach depends both on the type and quality of data, as well as the types of anomalies we are interested in detecting.
Absolute VS relative baselines. If we only have relative (rather than absolute) information about what we expect to see, a population-based approach should be used.
Detection power. The expectation-based approach should be used when we can accurately estimate the expected values in each space-time location, either based on a sufficient amount of historical data, or based on sufficient data from a null or control condition; in these cases, expectation-based statistics will have higher detection power than population-based statistics.
Local VS global changes. If the observed values throughout the entire search region are much higher (or lower) than expected, the expectation-based approach will find these changes very significant but if these do not vary spatially and/or temporally the population-based method will not find any significant anomalous space-time regions. If we assume that such changes have resulted from large space-time regions (and are therefore relevant to detect), the expectation-based approach should be used. On the other hand, if we assume that these changes have resulted from unmodelled and irrelevant global trends (and should therefore be ignored), then it is more appropriate to use the population-based approach.
When the data does not have a temporal component, a similar approach can be applied to detect spatial anomalies using the DETECT_SPATIAL_ANOMALIES procedure. In this case we are also interested in detecting regions that are anomalous with respect to some baseline which, as in the space-time case, can be computed with the population- or expectation-based approaches. For the latter, a regression model (e.g. a linear model) is typically required, which is used to estimate the expected values and their variances conditional on some covariates.
Merchant universe matching analysis in CPG consists of matching a company's current distributors (also referred to as merchants or customers) to a more extensive set of potential distributors in order to understand the company's market penetration. A universe is a dataset of merchants that can be collected in-house or provided by an external source. In this analysis, we will be dealing with two of these datasets:
The current universe is the set of merchants that the company currently works with. It is usually an internal relation of known sales accounts.
The total universe is a larger set of potential merchants to be considered. Its source is usually a third party, whether a free source on the internet or a premium data provider.
The objective of this analysis is to generate a mapping from our current universe to the total universe and extract insights from it. This is usually no easy task, since the datasets come from different sources and therefore have no common index to join them on.
In the Analytics Toolbox, different functions are provided to make this analysis easier for the user. This example presents a complete analysis using these functions and the steps involved.
For this example, we will work with a small beverage distributor established in Berlin. For that, we will be using the following tables:
A current universe table including different venues like restaurants, bars, hotels, etc., where the product is currently being sold. The names and locations of the venues have been extracted from a public data source. The table can be found at cartobq.docs.universe_matching_current_universe
.
A total universe from Precisely, available as premium data. The sample used in this example can be found at cartobq.docs.universe_matching_total_universe
.
For this task, we provide the UNIVERSE_MATCHING procedure in the Analytics Toolbox, which performs a fuzzy match between the two datasets provided. Several aspects are taken into consideration for a match, aiming for the procedure to be as general and robust as possible:
The venues' spatial position is considered as a first filter. By default, only the 60 closest neighbors to a current universe location are checked for matching, and there is a hard limit of 500 m on the distance between them. These values are provided as sensible defaults but can be changed in the options argument should the user need it; please check the documentation for further details. The location of the candidates is used to compute the proximity score: a value, depending on the candidates' distance distribution, that lies within the [0, 1] interval.
The name of the locations is the second criterion used in the match. For possible candidates, a text similarity score is computed, which measures how similar both names are (also in the [0, 1] interval). The text comparison tries to match strings taking into account different capitalizations and word orders. Please refer to the documentation for more details on the method used.
These two similarities are consolidated into a single similarity measure using a weighted average. The user can modify these weights to affect how candidates are chosen; it may be necessary to emphasize the text similarity over the proximity, or vice versa.
An arbitrary number of columns can be passed to the function, but it requires at least three of them in both the current and total universes, respectively:
A unique ID column that will be used to match against the original query and the resulting pairs.
A location column that will be used to find candidates and compute the proximity scoring.
A name column that will be used to compute the text similarity.
With this in mind, we can run the procedure. There are several parameters we can change; in this case, we will change the similarity weights to give more importance to the text similarity, since we are not confident about the location quality of our current universe.
Running this procedure will store the results in the <my-project>.<my-dataset>.universe_matching_results
table in BigQuery. Below we can see a sample query on how to consume this table. Note that the resulting table is joined with the original tables to obtain the POI names for a richer comparison:
We can observe that the first row corresponds to a match with a 0.957810 similarity score, where both names show some small differences and have a text similarity score of 0.958333, and the venues are at a physical distance corresponding to a proximity score of 0.954844. With the weights chosen above, the combined score is indeed the weighted average 0.85 × 0.958333 + 0.15 × 0.954844 ≈ 0.957810.
Even though the previous table already provides a wealth of insight, we can also use the UNIVERSE_MATCHING_REPORT procedure to do some more of the heavy lifting. This procedure performs some of the most common tasks we may be interested in during this analysis, namely:
Create a filtered table using a minimum similarity acceptable for each pair.
Create an expansion universe table, including all the rows in the expansion universe that are matched over the minimum similarity threshold.
Create a report, including some useful metrics for this exercise.
To run this procedure, we just need the following call:
This procedure will create the three tables mentioned above under the prefix <my-project>.<my-dataset>.universe_matching
. Let us check each of them individually.
The first one, which is returned by the procedure, is the report:
Here we can see the following metrics:
The current_universe
is a count of the total number of distributors in our current universe query.
The total_universe
is the number of distributors in our total universe.
The matched_universe
is the number of successfully matched distributors over the confidence threshold. Here we can see how three of our current distributors have not been matched over the similarity threshold.
The expansion_universe
is the number of distributors in the expansion universe that were not matched over the similarity threshold.
The market_penetration
is the ratio of the total universe that is currently covered by our current universe.
This digest is stored in the table <my-project>.<my-dataset>.universe_matching_report
The second table created by this function is <my-project>.<my-dataset>.universe_matching_filtered
; a filtered version of the input <my-project>.<my-dataset>.universe_matching_results
that just contains correctly matched pairs above the minimum threshold passed. That way, we ensure that these matches are of a minimum quality to be trusted. We can take a quick look at these matches in the following map.
The third and last table generated by this function is <my-project>.<my-dataset>.universe_matching_expansion_universe
, which contains all the rows in the total universe query that are not matched over the similarity threshold passed to the function. As per the report, this table will contain 2080 rows. We can use this table to understand different places our business can expand, centralized in a single table.
This project has received funding from the research and innovation programme under grant agreement No 960401.
CALL `carto-un`.carto.UNIVERSE_MATCHING(
-- Current universe
'cartobq.docs.universe_matching_current_universe',
'id',
'venue_name',
'geom',
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
'name',
'geom',
-- Output table
'<my-project>.<my-dataset>.universe_matching_results',
-- Optional arguments
'{"weights": {"text_similarity": 0.85, "proximity": 0.15}}'
);
CALL `carto-un-eu`.carto.UNIVERSE_MATCHING(
-- Current universe
'cartobq.docs.universe_matching_current_universe',
'id',
'venue_name',
'geom',
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
'name',
'geom',
-- Output table
'<my-project>.<my-dataset>.universe_matching_results',
-- Optional arguments
'{"weights": {"text_similarity": 0.85, "proximity": 0.15}}'
);
CALL carto.UNIVERSE_MATCHING(
-- Current universe
'cartobq.docs.universe_matching_current_universe',
'id',
'venue_name',
'geom',
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
'name',
'geom',
-- Output table
'<my-project>.<my-dataset>.universe_matching_results',
-- Optional arguments
'{"weights": {"text_similarity": 0.85, "proximity": 0.15}}'
);
SELECT
results.*,
current_universe.venue_name AS current_universe_name,
total_universe.name AS total_universe_name
FROM
`<my-project>.<my-dataset>.universe_matching_results` results
INNER JOIN cartobq.docs.universe_matching_current_universe current_universe
ON results.current_universe_id = current_universe.id
INNER JOIN cartobq.docs.universe_matching_total_universe total_universe
ON results.total_universe_id = total_universe.poiid
ORDER BY
results.similarity DESC
| current_universe_id | total_universe_id | proximity | text_similarity | similarity | current_universe_name | total_universe_name |
|---|---|---|---|---|---|---|
| 975312181 | D000PIVHYQYW | 0.954844 | 0.958333 | 0.957810 | Mustafa Demir's Gemüsekebab | MUSTAFA DEMIR'S GEMÜSE KEBAP |
| 438098471 | D000PIT75EJ5 | 0.971799 | 0.933333 | 0.939103 | China-Restaurant Hua Ting | CHINA-RESTAURANT HUA-TING |
| 2796742907 | D000PIVGNVKY | 0.970360 | 0.923077 | 0.930169 | Kaffee Einstein | EINSTEIN KAFFEE |
| 1825682816 | D000PIZCQYOT | 0.983886 | 0.909091 | 0.920310 | Holiday Inn - Centre Alexanderplatz | HOLIDAY INN BERLIN-CENTRE ALEXANDERPLATZ |
| 4761628572 | D000PIVBOMWU | 0.982121 | 0.909091 | 0.920045 | Kantine Volksbühne | VOLKSBÜHNEN KANTINE |
| ... | ... | ... | ... | ... | ... | ... |
CALL `carto-un`.carto.UNIVERSE_MATCHING_REPORT(
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
-- Universe matching results
'<my-project>.<my-dataset>.universe_matching_results',
-- Output prefix
'<my-project>.<my-dataset>.universe_matching',
-- Optional arguments
'{"min_similarity": 0.6}'
);
CALL `carto-un-eu`.carto.UNIVERSE_MATCHING_REPORT(
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
-- Universe matching results
'<my-project>.<my-dataset>.universe_matching_results',
-- Output prefix
'<my-project>.<my-dataset>.universe_matching',
-- Optional arguments
'{"min_similarity": 0.6}'
);
CALL carto.UNIVERSE_MATCHING_REPORT(
-- Total universe
'cartobq.docs.universe_matching_total_universe',
'poiid',
-- Universe matching results
'<my-project>.<my-dataset>.universe_matching_results',
-- Output prefix
'<my-project>.<my-dataset>.universe_matching',
-- Optional arguments
'{"min_similarity": 0.6}'
);
| current_universe | total_universe | matched_universe | expansion_universe | market_penetration |
|---|---|---|---|---|
| 514 | 2584 | 511 | 2080 | 0.197755 |
In this guide we show how to combine (spatial) variables into a meaningful composite indicator using CARTO Analytics Toolbox for BigQuery. Prefer a low-code approach? Check out the Workflows tutorial Spatial Scoring: Measuring merchant attractiveness and performance.
A composite indicator is an aggregation of variables which aims to measure complex and multidimensional concepts which are difficult to define, and cannot be measured directly. Examples include innovation, human development, environmental performance, and so on.
To derive a spatial score, two main functionalities are available:
Aggregation of individual variables, scaled and weighted accordingly, into a spatial composite score (CREATE_SPATIAL_COMPOSITE_UNSUPERVISED
)
Computation of a spatial composite score as the residuals of a regression model which is used to detect areas of under- and over-prediction (CREATE_SPATIAL_COMPOSITE_SUPERVISED
)
Additionally, a functionality to measure the internal consistency of the variables used to derive the spatial composite score is also available (CRONBACH_ALPHA_COEFFICIENT
).
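For reference, Cronbach's alpha for k variables is defined as
alpha = (k / (k - 1)) * (1 - (sum of the variances of the k individual variables) / (variance of their sum)),
and values of roughly 0.7 or above are commonly taken to indicate acceptable internal consistency among the variables being combined.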
These procedures run natively on BigQuery and rely only on the resources allocated by the data warehouse.
In this guide, we show you how to use these functionalities with an example using a sample from CARTO Spatial Features for the city of Milan (Italy) at quadbin resolution 18, which is publicly available at `cartobq.docs.spatial_scoring_input`
.
As an example, we have selected as variables of interest those that best represent the target population for a wellness & beauty center aimed mainly at teenage and adult women: the female population between 15 and 44 years of age (fempop_15_44
); the number of relevant Points of Interest (POIs), including public transportation (public_transport
), education (education
), other relevant POIs (pois
), which are either of interest to students (such as universities) or are linked to day-to-day activities (such as postal offices, libraries and administrative offices); and the urbanity level (urbanity
). Furthermore, to account for the effect of neighboring sites, we have smoothed the data by computing the sum of the respective variables using a k-ring of 20 for the population data and a k-ring of 4 for the POI data, as shown in the map below.
Additionally, the following map shows the average (simulated) change in annual revenue reported by all retail businesses before and after the COVID-19 pandemic. This variable will be used to identify resilient neighborhoods, i.e. neighborhoods with good outcomes despite a low target population.
The choice of the relevant data sources, as well as the imputation of missing data, is not covered by this set of procedures; it should be based on the relevance of the indicators to the phenomenon being measured and on their relationship to each other, as defined by experts and stakeholders.
The choice of the most appropriate scoring method depends on several factors, as shown in this diagram:
First, when a measurable outcome correlated with the variables selected to describe the phenomenon of interest is available, the most appropriate choice is the supervised version of the method, available through the CREATE_SPATIAL_COMPOSITE_SUPERVISED procedure. On the other hand, if no such variable is available, or its variability is not well captured by a regression model of the variables selected to create the composite score, the CREATE_SPATIAL_COMPOSITE_UNSUPERVISED procedure should be used.
All methods included in this procedure involve choosing a normalization function to make the input variables comparable, an aggregation function to combine them into one composite, and a set of weights. As shown in the diagram above, the choice of the scoring method depends on the availability of expert knowledge: when this is available, the recommended choice for the scoring_method parameter is CUSTOM_WEIGHTS, which allows the user to customize the scaling and aggregation functions as well as the set of weights. On the other hand, when the choice of the individual weights cannot be based on expert judgment, the weights can be derived by maximizing the variation in the data, either using a Principal Component Analysis (FIRST_PC), when the sample is large enough and/or the extreme values (maximum and minimum values) are not outliers, or as the entropy of the proportion of each variable (ENTROPY). Deriving the weights such that the variability in the data is maximized also means that the largest weights are assigned to the individual variables with the largest variation across geographical units (as opposed to setting the relative importance of each variable, as in the CUSTOM_WEIGHTS method). Although correlations do not necessarily represent the real influence of the individual variables on the phenomenon being measured, this is a desirable property for cross-unit comparisons. By design, both the FIRST_PC and ENTROPY methods will overemphasize the contribution of highly correlated variables; therefore, when using these methods, there may be merit in dropping variables thought to be measuring the same underlying phenomenon.
When using the CREATE_SPATIAL_COMPOSITE_UNSUPERVISED procedure, make sure to pass:
The query (or a fully qualified table name) with the data used to compute the spatial composite, as well as a unique geographic id for each row
The name of the column with the unique geographic identifier
The prefix for the output table
Options to customize the computation of the composite, including the scoring method, any custom weights, the custom range for the final score or the discretization method applied to the output
The output of this procedure is a table, with the prefix specified in the call, with two columns: the computed spatial composite score (spatial_score) and a column with the unique geographic identifier.
Let’s now use this procedure to compute the spatial composite score for the available different scoring methods.
The spatial composite is computed as the weighted sum of the proportion of the min-max scaled individual variables (only numerical variables are allowed), where the weights are computed to maximize the information (entropy) of the proportion of each variable. Since this method normalizes the data using the minimum and maximum values, if these are outliers, their range will strongly influence the final output.
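For reference, this is in essence the standard entropy-weighting construction; a sketch of the computation follows (the exact implementation inside the procedure may differ in minor details):

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, \qquad p_{ij} = \frac{x'_{ij}}{\sum_i x'_{ij}}$$

$$e_j = -\frac{1}{\ln n}\sum_i p_{ij}\ln p_{ij}, \qquad w_j = \frac{1 - e_j}{\sum_k (1 - e_k)}, \qquad s_i = \sum_j w_j\, p_{ij}$$

where $n$ is the number of cells, $j$ indexes the variables and $s_i$ is the composite score for cell $i$ (before any rescaling or bucketization).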
With this query we are creating a spatial composite score that summarizes the selected variables (fempop_15_44, public_transport, education, pois).
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"ENTROPY",
"bucketize_method":"JENKS",
"nbuckets":6
}'''
)
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"ENTROPY",
"bucketize_method":"JENKS",
"nbuckets":6
}'''
)
In the options section, we have also specified the discretization method (JENKS) that should be applied to the output. Options for the discretization method include JENKS (natural breaks), QUANTILES (quantile-based breaks) and EQUAL_INTERVALS (breaks of equal width). For all the available discretization methods, it is possible to specify the number of buckets; otherwise, the default based on Freedman and Diaconis's (1981) rule is applied.
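For context, Freedman and Diaconis's rule derives the bin width from the interquartile range of the score, roughly:

$$h = 2\,\frac{\mathrm{IQR}(s)}{n^{1/3}}, \qquad \text{number of buckets} \approx \frac{\max(s) - \min(s)}{h}$$

where $s$ is the computed score and $n$ the number of cells; this is only the default and can be overridden with the nbuckets option, as in the query above.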
To visualize the result, we can join the output of this query with the geometries in the input table, as shown in the map below.
SELECT a.spatial_score, a.geoid, b.geom
FROM `cartobq.docs.spatial_scoring_ENTROPY_results` a
JOIN `cartobq.docs.spatial_scoring_input` b
ON a.geoid = b.geoid
The spatial composite is computed as the first principal component score of a Principal Component Analysis (only numerical variables are allowed), i.e. as the weighted sum of the standardized variables weighted by the elements of the first eigenvector.
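In other words, with standardized variables the score is (up to the sign convention discussed below):

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad s_i = \sum_j a_j\, z_{ij}$$

where $a = (a_1, \dots, a_p)$ is the first eigenvector (the first principal component loadings) of the correlation matrix of the variables.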
With this query we are creating a spatial composite score that summarizes the selected variables (fempop_15_44, public_transport, education, pois).
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"FIRST_PC",
"correlation_var":"fempop_15_44",
"correlation_thr":0.6,
"return_range":[0.0,1.0]
}'''
)
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"FIRST_PC",
"correlation_var":"fempop_15_44",
"correlation_thr":0.6,
"return_range":[0.0,1.0]
}'''
)
In the options section, the correlation_var parameter specifies which variable should be used to define the sign of the first principal component, such that the correlation between the selected variable (fempop_15_44) and the computed spatial score is positive. Moreover, with correlation_thr we can specify the (optional) minimum allowed correlation between each individual variable and the first principal component score: variables with an absolute correlation coefficient lower than this threshold are not included in the computation of the composite score. Finally, by setting the return_range parameter we can decide the minimum and maximum values used to normalize the final output score.
Let’s now visualize the result in Builder:
The spatial composite is computed by first scaling each individual variable and then aggregating them according to user-defined scaling and aggregation functions and individual weights. Compared to the previous methods, this method requires expert knowledge, both for the choice of the normalization and aggregation functions (with the preferred choice depending on the theoretical framework and the available individual variables) as well as the definition of the weights.
The available scaling functions are MIN_MAX_SCALER (each variable is scaled into the range [0,1] based on its minimum and maximum values); STANDARD_SCALER (each variable is scaled by subtracting its mean and dividing by its standard deviation); DISTANCE_TO_TARGET (each variable's value is divided by a target value, either the minimum, maximum or mean value); PROPORTION (each variable's value is divided by the sum total of all the values); and RANKING (the values of each variable are replaced with their percent rank). More details on the advantages and disadvantages of each scaling method are provided in the table below.
To aggregate the normalized data, two aggregation functions are available: LINEAR (the composite is derived as the weighted sum of the scaled individual variables) and GEOMETRIC (the spatial composite is given by the product of the scaled individual variables, each raised to the power of its weight), as detailed in the following table:
In both cases, the weights express trade-offs between variables (i.e. how much an advantage on one variable can offset a disadvantage on another).
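Concretely, if $\tilde{x}_{ij}$ denotes the scaled value of variable $j$ in cell $i$ and $w_j$ the weights (with $\sum_j w_j = 1$), the two aggregations are:

$$\text{LINEAR:}\quad s_i = \sum_j w_j\, \tilde{x}_{ij} \qquad\qquad \text{GEOMETRIC:}\quad s_i = \prod_j \tilde{x}_{ij}^{\,w_j}$$

With the geometric aggregation, a low value in one variable is only partially compensated by high values in the others, so the trade-offs between variables are weaker than in the linear case.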
With the following query we are creating a spatial composite score by aggregating the selected variables, transformed to their percent rank, using the LINEAR method with the specified set of weights, whose sum must be equal to or lower than 1: in this case, since we are not setting a weight for the variable public_transport, its weight is derived as the remainder.
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois, urbanity_ordinal FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"CUSTOM_WEIGHTS",
"scaling":"RANKING",
"aggregation":"LINEAR",
"weights":{"fempop_15_44":0.4,"public_transport":0.2,"education":0.1,"urbanity_ordinal":0.2}
}'''
)
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_UNSUPERVISED(
'SELECT geoid, fempop_15_44, public_transport, education, pois, urbanity_ordinal FROM `cartobq.docs.spatial_scoring_input`',
'geoid',
'<my-project>.<my-dataset>.<my-table>',
'''{
"scoring_method":"CUSTOM_WEIGHTS",
"scaling":"RANKING",
"aggregation":"LINEAR",
"weights":{"fempop_15_44":0.4,"public_transport":0.2,"education":0.1,"urbanity_ordinal":0.2}
}'''
)
Let’s now visualize the result in Builder:
This method requires a regression model with a response variable that is relevant to the phenomenon under study and can be used to derive a composite score from the model standardized residuals, which are used to detect areas of under- and over-prediction. The response variable should be measurable and correlated with the set of variables defining the scores (i.e. the regression model should have a good-enough performance). This method can be beneficial for assessing the impact of an event over different areas as well as to separate the contribution of the individual variables to the composite by only including a subset of the individual variables in the regression model at each iteration.
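In essence, the score for each cell is derived from the standardized residual of the fitted model (before any optional rescaling to a custom range or bucketization):

$$s_i = \frac{y_i - \hat{y}_i}{\hat{\sigma}_r}$$

where $y_i$ is the observed response, $\hat{y}_i$ the model prediction and $\hat{\sigma}_r$ the standard deviation of the residuals; positive scores flag under-prediction (better-than-expected outcomes) and negative scores flag over-prediction.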
When using the CREATE_SPATIAL_COMPOSITE_SUPERVISED procedure, make sure to pass:
The query (or a fully qualified table name) with the data used to compute the spatial composite, as well as a unique geographic id for each row
The name of the column with the unique geographic identifier
The prefix for the output table
Options to customize the computation of the composite, including the TRANSFORM and OPTIONS clauses for the BigQuery ML CREATE MODEL statement, the minimum accepted R2 score, as well as the custom range or the discretization method applied to the output.
As in the unsupervised case, the output of this procedure consists of a table with two columns: the computed composite score (spatial_score) and a column with the unique geographic identifier.
Let's now use this procedure to compute the spatial composite score from a regression model of the average change in annual revenue (revenue_change).
CALL `carto-un`.carto.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
-- Input query
'SELECT geoid, revenue_change, fempop_15_44, public_transport, education, pois, urbanity FROM `cartobq.docs.spatial_scoring_input`',
-- Name of the geographic unique ID
'geoid',
-- Output prefix
'<my-project>.<my-dataset>.<my-table>',
'''{
"model_transform":[
"revenue_change",
"fempop_15_44, public_transport, education, pois, urbanity"
],
"model_options":{
"MODEL_TYPE":"LINEAR_REG",
"INPUT_LABEL_COLS":['revenue_change'],
"DATA_SPLIT_METHOD":"no_split",
"OPTIMIZE_STRATEGY":"NORMAL_EQUATION",
"CATEGORY_ENCODING_METHOD":"ONE_HOT_ENCODING",
"ENABLE_GLOBAL_EXPLAIN":true
},
"r2_thr":0.4
}'''
)
CALL `carto-un-eu`.carto.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
-- Input query
'SELECT geoid, revenue_change, fempop_15_44, public_transport, education, pois, urbanity FROM `cartobq.docs.spatial_scoring_input`',
-- Name of the geographic unique ID
'geoid',
-- Output prefix
'<my-project>.<my-dataset>.<my-table>',
'''{
"model_transform":[
"revenue_change",
"fempop_15_44, public_transport, education, pois, urbanity"
],
"model_options":{
"MODEL_TYPE":"LINEAR_REG",
"INPUT_LABEL_COLS":['revenue_change'],
"DATA_SPLIT_METHOD":"no_split",
"OPTIMIZE_STRATEGY":"NORMAL_EQUATION",
"CATEGORY_ENCODING_METHOD":"ONE_HOT_ENCODING",
"ENABLE_GLOBAL_EXPLAIN":true
},
"r2_thr":0.4,
"return_range":[-1.0,1.0]
}'''
)
Here, the model predictors are specified in the TRANSFORM (model_transform) clause (fempop_15_44, public_transport, education, pois, urbanity), which can also be used to apply transformations that will be automatically applied during the prediction and evaluation phases. If not specified, all the variables included in the input query, except the response variable (INPUT_LABEL_COLS) and the unique geographic identifier (geoid), will be included in the model as predictors. In the model_options section, we can specify all the available options of the BigQuery CREATE MODEL statement for regression model types (e.g. LINEAR_REG, BOOSTED_TREE_REGRESSOR, etc.). Another optional parameter in this procedure is the minimum acceptable R2 score (r2_thr): if the model's R2 score on the training data is lower than this threshold, an error is raised.
Let’s now visualize the result in Builder: areas with a higher score indicate areas where the observed revenues have increased more or decreased less than expected (i.e. predicted) and therefore can be considered resilient for the type of business that we are interested in.
Finally, given a set of variables, we can also compute a measure of the internal consistency, or reliability, of the data, based on Cronbach's alpha coefficient. A higher alpha (closer to 1) means higher consistency and a lower alpha (closer to 0) means lower consistency, with 0.65 usually considered the minimum acceptable value. A high value of alpha essentially means that data points with high (low) values for one variable tend to be characterized by high (low) values for the others. When this coefficient is low, we might consider reversing variables (e.g. instead of considering the unemployed population, consider the employed population) to achieve a consistent direction of the input variables. We can also use this coefficient to compare how the reliability of the score might change with different input variables or to compare, given the same input variables, the score's reliability for different areas.
The output of this procedure consists of a table with the computed coefficient, as well as the number of variables used and the mean variance and covariance.
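For reference, Cronbach's alpha can be written in terms of exactly these quantities, with $k$ variables, mean variance $\bar{v}$ and mean inter-variable covariance $\bar{c}$:

$$\alpha = \frac{k\,\bar{c}}{\bar{v} + (k-1)\,\bar{c}}$$

so alpha grows as the average covariance between the variables increases relative to their average variance.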
Let’s compute for the selected variables (fempop_15_44
, public_transport
, education
, pois
) the reliability coefficient in the whole Milan’s area
CALL `carto-un`.carto.CRONBACH_ALPHA_COEFFICIENT(
'SELECT fempop_15_44, public_transport, education, pois FROM cartobq.docs.spatial_scoring_input',
'cartobq.docs.spatial_scoring_CRONBACH_ALPHA_results'
)
CALL `carto-un-eu`.carto.CRONBACH_ALPHA_COEFFICIENT(
'SELECT fempop_15_44, public_transport, education, pois FROM cartobq.docs.spatial_scoring_input',
'cartobq.docs.spatial_scoring_CRONBACH_ALPHA_results'
)
The result shows that Cronbach’s alpha coefficient in this case is 0.76, suggesting that the selected variables have relatively high internal consistency.
In this tutorial, we'll explore how to create a versatile web map application using Builder, focusing on the dynamic customization of index scores through SQL Parameters. You'll learn how to normalize variables using Workflows and how to craft an index based on these normalized variables. We'll guide you through dynamically applying specific weights to these variables, enabling the index to flexibly align with various user scenarios.
Whether it's for optimizing location-based services, fine-tuning geomarketing strategies, or diving deep into trend analysis, this tutorial provides you with the essential tools and knowledge. You'll gain the ability to draw significant and tailored insights from intricate geospatial data, making your mapping application a powerful asset for a wide range of scenarios.
In this guide, we'll walk you through:
Access Workflows from your CARTO Workspace using the Navigation menu.
Select the data warehouse where you have the data accessible. We'll be using the CARTO Data Warehouse, which should be available to all users.
In the Sources section on the left panel, navigate to demo_data > demo tables within the CARTO Data Warehouse. Drag and drop the below sources onto the canvas.
usa_states_boundaries
derived_spatialfeatures_usa_h3res8_v1_yearly_v2
cell_towers_worldwide
We are going to focus our analysis on California. To extract the California boundary, add the Simple Filter component to the canvas and connect the USA States Boundaries source to its input. Then, in the node configuration panel, select 'name' as the column, 'equal to' as the operation, and 'California' as the value. Click on "Run". You can use the Map Preview to visualize the output.
We are going to leverage spatial indexes, specifically H3 at resolution level 8, to generate our dynamic, weighted index. After isolating the California state boundary, our next step is to transform it into H3 cells. Add the H3 Polyfill component to the canvas and set the resolution to level 8 in the node. Then, proceed by clicking 'Run' to complete the transformation.
Now that we have the California H3 cells, we can use the Join component to keep only the Derived Spatial Features data located in California. Add the component to the canvas, link both sources and select 'Inner' as the join type in the node. Then, click on "Run".
Now we can begin normalizing our key variables. Normalizing a variable involves adjusting its values to a common scale, making it easier to compare across different datasets.
Prior to normalizing, we will use the Select component to keep only the necessary columns using the below expression:
h3,
population_joined as population,
retail_joined as retail,
transportation_joined as transport,
leisure_joined as leisure
Now, let's normalize our desired variables. To do so, add the Normalize component to the canvas. In the node, select one of the desired variables, such as population. Click on "Run". Once completed, you can visualize the result in the Data Preview. Inspecting it reveals a new column named population_norm with values ranging from 0 to 1.
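For reference, a minimal SQL sketch of this kind of min-max normalization, which matches the 0-1 output described above; the table name is illustrative and the Normalize component's exact implementation may differ:

SELECT
  h3,
  population,
  -- Scale each value to [0, 1] using the min and max over the whole table
  SAFE_DIVIDE(
    population - MIN(population) OVER (),
    MAX(population) OVER () - MIN(population) OVER ()
  ) AS population_norm
FROM `<my-project>.<my-dataset>.california_h3_variables`  -- hypothetical table name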
Repeat the above process by adding a Normalize component for each of the remaining variables: retail, leisure and transport.
After finishing with the variables from Derived Spatial Features, we can start analyzing the distance between each H3 cell and the closest cell tower. The first step of this analysis is to extract the cell towers located within the California state boundary. To do so, we will use the Spatial Filter component, adding the Cell Towers Worldwide source as the main input and the California state as the secondary input. In the node, select 'Intersect' as the spatial predicate.
Then, we need to extract the centroid geometry from the H3 cells so we can perform a point-to-point distance operation. To do so, add the H3 Center component to the canvas and link it with the H3 Polyfill output, as we are only interested in the H3 ids.
Add a unique id to the filtered cell tower locations by using the Row Number component, which adds a new column to your table with the row number.
We can now add the Distance to nearest component to calculate the distance from each H3 cell to the nearest cell tower location in California. Link the H3 Center output as the main source and add the filtered cell tower locations as the secondary input. In the node, set the configuration as per the image below, with the distance set to 500 meters. You can use the Data Preview to visualize the resulting columns.
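Conceptually, this step is a nearest-neighbour distance query; a minimal SQL sketch under assumed table names (the component's own implementation may differ):

SELECT
  c.h3,
  MIN(ST_DISTANCE(c.geom, t.geom)) AS nearest_distance
FROM h3_centers AS c                 -- hypothetical output of the H3 Center step
JOIN california_cell_towers AS t     -- hypothetical output of the Spatial Filter step
  ON ST_DWITHIN(c.geom, t.geom, 500) -- only consider towers within the 500 m distance
GROUP BY c.h3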
With the distance calculated, we can normalize this variable. As in previous steps, we will use the Normalize component, specifying nearest_distance as the column.
Given that, in our case, a greater distance to a cell tower is considered less favorable, we need to invert the scale so that higher values are interpreted positively. To achieve this, use the Select component with the following statement to reverse the scale:
h3,
1 - nearest_distance_norm as nearest_distance_norm,
nearest_distance
Let's join the normalized variables using the Join component. In the node, set the join type to 'Inner', as we are only interested in those locations where there is a cell tower within 500 meters.
The final step in our analysis is to save our output results as tables. We will use the Save as Table component to generate one table from the normalized variables on the H3 spatial index, and another from the California state boundary so we can visualize the analysis location. Save both tables within CARTO Data Warehouse > Organization > Private and name them as follows:
California State Boundary: california_boundary
Normalized variables: california_normalized_variables
Now that the workflow is done, you can add Annotations, edit the component names and organize it so that the analysis is easy to read and share.
In Workflows, preview the map result of the Save as Table component that generates the California Boundary source. Click on "Create map".
A map opens with California Boundary added as table source. Change the Map Title to "Create index score using normalized variables" and rename the layer to "Search Area".
Access the Layer panel, disable the Fill Color, set the Stroke Color to red, and set the Stroke Width to 1.5.
Now, we will add the normalized variables sources.
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the Add Source button.
The SQL query panel will be opened.
Enter the following query, replacing the qualified table name with your output table created in Step 15. You can find this name in the Data Explorer by navigating to the recently created table. Once the query is updated, make sure the Spatial Data Type selected is H3. Then, click on "Run".
SELECT * FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.califoria_normalized_variables
Now, let's modify the query to create an index score based on the normalized variables we previously generated in Workflows. Update the SQL query as per below and click on "Run". Then, rename the layer to 'Index Score'.
WITH index AS (
SELECT
h3,
population_norm + retail_norm + leisure_norm + transport_norm + nearest_distance_norm_joined as index_score
FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.califoria_normalized_variables)
SELECT h3,ML.MIN_MAX_SCALER(index_score) OVER() as index_score FROM index
After running the SQL query, the data source is updated. You can then style your H3 layer by index_score, an index that has been calculated giving all variables equal weights.
While indexes with equal weights offer valuable insights, we'll also explore custom weighting for each variable. This approach caters to diverse user scenarios, particularly in identifying optimal business locations. In Builder, you can apply weights to variables in two ways:
Static Weights: Here, specific weights are applied directly in the SQL query. These weights are fixed and can only be changed by the Editor. This method is straightforward and useful for standard analyses.
Dynamic Weights: This more flexible approach involves using SQL Parameters. It allows Viewer users to adjust weights for each variable, tailoring the analysis to their specific business needs.
Let's begin with the static method:
Edit your SQL query to include static weights for each normalized variable. Experiment with different weights to observe how they impact the index score. Each time you modify and re-run the query, you'll see how these adjustments influence the overall results.
WITH data_ AS (
SELECT
h3,
population_norm * 1 as population_norm,
retail_norm * 0.2 as retail_norm,
leisure_norm * 0.2 as leisure_norm,
transport_norm * 0.6 as transport_norm,
nearest_distance_norm_joined * 1 as nearest_distance_norm
FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.califoria_normalized_variables),
index AS (
SELECT
h3,
population_norm + retail_norm + leisure_norm + transport_norm + nearest_distance_norm as index_score
FROM data_)
SELECT h3,ML.MIN_MAX_SCALER(index_score) OVER() as index_score FROM index
SQL parameters are placeholders that you can add in your SQL Query source and can be replaced by input values set by users. In this tutorial, we will learn how you can use them to dynamically update the weights of normalized variables.
The first step in this section is to create a SQL Numeric Parameter. You can access this by clicking on the top right icon in the Sources Panel.
Set the SQL Numeric Parameter configuration as follows:
Slider Type: Simple Slider
Min Value: 0
Default Value: 0.5
Max Value: 1
Display name: Population Weight
SQL name: {{population_weight}}
Once you create a parameter, a parameter control is added to the right panel. From there, you can copy the parameter SQL name to add it to your query. In this case, we will add it as the weight for our population_norm column.
Repeat Step 26 to add a SQL Numeric Parameter and update the SQL Query for each of the normalized variables: leisure_norm, retail_norm, transport_norm and nearest_distance_norm.
The output SQL query and parameter panel should look similar to the below.
WITH data_ AS (
SELECT
h3,
population_norm * {{population_weight}} as population_norm,
retail_norm * {{retail_weight}} as retail_norm,
leisure_norm * {{leisure_weight}} as leisure_norm,
transport_norm * {{transport_weight}} as transport_norm,
nearest_distance_norm_joined * {{cell_tower_distance_weight}} as nearest_distance_norm
FROM carto-dw-ac-dp1glsh.private_atena_onboardingdemomaps_ca2c4d8c.califoria_normalized_variables),
index AS (
SELECT
h3,
population_norm + retail_norm + leisure_norm + transport_norm + nearest_distance_norm as index_score
FROM data_)
SELECT h3,ML.MIN_MAX_SCALER(index_score) OVER() as index_score FROM index
Now, style your map as desired. We will set our Fill Color palette to ColorBrewer RdPu 4, with the color based on index_score, and change the basemap to CARTO Dark Matter. You can test the parameter controls to see how the index is updated dynamically, taking into account the input weight values.
Let's add a description to our map that can provide viewer users with further context about this map and how to use it.
In the Legend tab, set the legend to open when the map is first loaded.
Finally, we can make the map public and share the link with anybody.
For that, go to the Share section in the top right corner and set the map as Public.
Activate SQL parameters controls options so that Viewer users can control the exposed parameters.
Copy the public share link and access the map as a Viewer. The end result should look similar to the below:
Founded in 2008, Airbnb has quickly gained global popularity among travelers. To elevate this service, identifying the determinants of listing success and their role in drawing tourism is pivotal. The users' property ratings focus on criteria such as accuracy, communication, cleanliness, location, check-in, and value.
This tutorial aims to extract insights into Airbnb users' overall impressions, connecting the overall rating score with distinct variables while taking into account the behavior of geographical neighbors through a Geographically Weighted Regression model.
We'll also dive into the regions where location ratings significantly influence the overall score and enrich this analysis with sociodemographic data from CARTO's Data Observatory.
This tutorial will take you through the following sections:
Access the Maps section from your CARTO Workspace using the navigation menu and create a New Map.
Add Los Angeles Airbnb data from CARTO Data Warehouse.
Select the Add source from button at the bottom left on the page.
Click on the CARTO Data Warehouse connection.
Navigate through demo data > demo tables to losangeles_airbnb_data and select Add source.
Let's add some basic styling! Rename the map to Map 1 Airbnb initial data exploration. Then click on Layer 1 in the Layers panel and apply the following:
Name (select the three dots next to the layer name): Airbnb listings
Color: your pick!
Outline: white, 1px stroke
Radius: 3
Switch from Layers to Interactions at the top left of the UI. Enable interactions for the layer.
Select a style for the pop-up window; we'll use light.
From the drop-down menu, select the variable price_num.
Select # to format the numbers as dollars. In the box to the right, rename the field Price per night.
You should have something that looks a little like this 👇
We will now inspect how Airbnb listings are distributed across Los Angeles and aggregate the raw data to get a better understanding of how different variables vary geographically within the city.
Now let's add a new data source to visualize the airbnb listings using an H3 grid.
Now let's aggregate this data to a H3 Spatial Index grid. This approach has multiple advantages:
Ease of interpreting spatial trends on your map
Ability to easily enrich that grid with multiple data sources
Suitability for spatial modelling like Geographically Weighted Regression...
...all of which we'll be covering in this tutorial!
In the CARTO Workspace, head to Workflows and select + New Workflow, using the CARTO Data Warehouse connection.
At the top left of the new workflow, rename the workflow "Airbnb analysis."
In the Sources panel (left of the window), navigate to Connection Data > demo data > demo_tables and drag losangeles_airbnb_data onto the canvas.
Switch from Sources to Components, and locate H3 from GeoPoint. Drag this onto the canvas to the right of losangeles_airbnb_data and connect the two together. Set the H3 resolution to 8. This will create a H3 grid cell for every Airbnb location.
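Under the hood this is the same operation as the Analytics Toolbox H3_FROMGEOGPOINT function; a minimal SQL sketch, assuming the demo table exposes a point column named geom and using an illustrative fully qualified table name (check the actual names in your schema):

SELECT
  *,
  -- Assign each Airbnb point to the H3 cell (resolution 8) that contains it
  `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3
FROM `carto-demo-data.demo_tables.losangeles_airbnb_data`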
Back in Components, locate Group by. Drag this to the right of H3 from GeoPoint, connecting the two. We'll use this to create a frequency grid and aggregate the input numeric variables:
Set the Group by field to H3.
For the aggregation columns, set review_scores_cleanliness, review_scores_location, review_scores_value, review_scores_rating and price_num to AVG. Add a final aggregation column which is H3 - COUNT (see below).
Connect this Group by component to a Rename column component, renaming h3_count to airbnb_count.
Finally, connect the Rename column output to a Save as Table component, saving this to CARTO Data Warehouse > Organization > Private and calling it airbnb_h3r8. If you haven't already, run your workflow!
Now, head back to the CARTO Builder map that we created earlier. Add the H3 aggregation table that you just created to the map (Sources > Add source from > Data Explorer > CARTO Data Warehouse > Organization > Private).
Let's style the new layer:
Name: H3 Airbnb aggregation
Order in display: 2
Fill color: 6-step blue-yellow ramp based on the column price_num_avg, using a Quantile color scale.
No stroke
Do you notice how it's difficult to see the grid beneath the Airbnb point layer? Let's enable zoom-based visibility to fix that, so we only see the points as we zoom in further. Go into the layer options for each layer, and set the Visibility by zoom layer to 11-21 for Airbnb listings.
You might also find the basemap more difficult to read now we have a grid layer covering it. Head to the basemaps panel (to the right of Layers) and switch to Google Maps > Positron. You'll now notice some of the labels sit on top of your grid data.
Now, let's try looking at this in 3D! At the center-top of the whole screen, switch to 3D view - then in H3 Airbnb aggregation:
Toggle the Height button and style this parameter using:
Column: airbnb_count (SUM)
Height scale: sqrt
Value: 50
Inspect the map results carefully. Notice where most listings are located and where the areas with highest prices are. Optionally, play with different variables and color ramps.
Now let's start to dig a little deeper into our data!
So far we have seen how the Airbnb listing locations and their main variables are distributed across the city of Los Angeles. Next, we will enrich our visualization by adding the CARTO Spatial Features H3 at resolution 8 dataset from the CARTO Data Observatory.
This dataset holds information that can be useful to explore the influence of different factors, including variables such as the total population, the urbanity level or the presence of certain types of points of interest in different areas.
In the CARTO Workspace, click on ‘Data Observatory’ to browse the Spatial Data Catalog and apply these filters:
Countries: United States of America
Licenses: Public data
Sources: CARTO
Select the Spatial Features - United States of America (H3 Resolution 8) dataset and click on Subscribe for free. This action will redirect us to the subscription in the Data Explorer menu.
Head back into the workflow you created earlier.
Navigate to Sources > Data Observatory > CARTO and find the table you just subscribed to and drag it onto the canvas, just below the final Save as Table component. Can't find it? Try refreshing your page.
Using a Join component, connect the output of Save as Table to the top input, and of Spatial Features to the bottom. Set the join columns from each table to H3, and the join type to left - meaning that all features from the first input (Save as Table) will be retained. Run!
We now have a huge amount of contextual data to help our analysis - in fact, far more than we want! Connect the output of the join to an Edit schema component, selecting only the columns from your original Airbnb grid, plus population and urbanity.
From here, you can save this as a table and explore it on a map - or move on to the final stage of this tutorial.
Next we will apply a Geographically Weighted Regression (GWR) model, using the GWR_GRID function, to our Airbnb H3 aggregated data. We've already seen where different variables rate higher on our previous map.
This model will allow us to extract insights into what the overall impression of Airbnb users depends on, by relating the overall rating score with different variables (specifically, we will use: value, cleanliness and location).
We will also visualize where the location score variable significantly influences the ‘Overall rating’ result.
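As a reminder of what GWR estimates: unlike an ordinary regression with one global set of coefficients, it fits a local model at each cell, so the coefficients vary over space. Schematically, for cell $i$ at location $u_i$:

$$\text{rating}_i = \beta_0(u_i) + \beta_1(u_i)\,\text{value}_i + \beta_2(u_i)\,\text{cleanliness}_i + \beta_3(u_i)\,\text{location}_i + \varepsilon_i$$

where each $\beta_k(u_i)$ is estimated from the cell's k-ring neighborhood, with neighbors down-weighted by the chosen kernel (Gaussian in our case). The map we build later colors cells by $\beta_3(u_i)$, the local coefficient of the location score.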
We will now proceed to calculate the GWR model leveraging CARTO Analytics Toolbox for BigQuery. You can do so using CARTO Workflows or your data warehouse console.
In your workflow, connect a GWR component to the Edit schema component from earlier. The parameters used in GWR model will be as follows:
Index column: h3
Feature variables: review_scores_value_avg, review_scores_cleanliness_avg, review_scores_location_avg
Target variable: review_scores_rating_avg
Kring Size: 3
Kernel function: gaussian
Fit intercept: True
Finally, let's add another join to rejoin Edit Schema to the results of the GWR analysis so we have all of the contextual information in one table ready to start building our map.
Run!
Feel free to use another Save as Table component to materialise it, otherwise it will be stored as a temporary table and deleted after 30 days.
In the CARTO Workspace under the Map tab, click on the three dots next to your original map and duplicate it, calling it Map 2 GWR Model map.
Add your GWR layer in the same way you had added previous layers, and turn off the layer H3 Airbnb aggregation.
Style the new layer (you may find it easier to turn the other layers off as you do this - you can just toggle the eye to the right of their names in the layer panel to do this):
Name: Location relevance (Model)
Layer order: 3 (the bottom)
Fill Color: 5-step diverging ColorBrewer blue-red ramp based on review_scores_location_avg_coef_estimate. Here, negative values depict a negative relationship between the location score and the overall score, and positive values depict a positive relationship (i.e. location plays an important role in the overall rating).
A good way of visualizing this is to begin with a Quantile color scale, and then switch to Custom and play around with the color bands until they reflect the same values moving away from a neutral band around zero (see below, where we have bands which diverge from -0.05 to 0.05).
No stroke
In the Legend panel (to the right of Layers), change the Color based on text to Location - Overall rating coefficient so it's easier for the user to understand.
In the Basemaps panel (to the right of Layers), change the basemap to Google Maps Roadmap.
Click on the Dual map view button at the top of the screen (next to 3D mode) to toggle the split map option.
Left map: disable the Location relevance (Model)
Right map: disable the H3 AirBnB aggregation
Inspect the model results in detail to understand where the location matters the most for users' overall rating score and how the location rating values are distributed.
Now let's start adding some more elements to our map to help our users better navigate our analysis.
Head to the Widgets panel, to the left of the Layers panel. Add the following widgets to the map:
Total listings
Layer: Airbnb listings
Type: Formula
Operation: COUNT
Formatting: Integers with thousand separators
Note: Total nº of Airbnb listings in the map extent.
Population near Airbnbs
Layer: H3 Airbnb aggregation
Type: Formula
Operation: SUM
Formatting: Decimal summarized (12.3K)
Aggregation column: population
Notes: Population in cells with Airbnbs
Urbanity
Layer: H3 Airbnb aggregation
Type: Pie
Operation: COUNT
Column: urbanity_joined_joined (MODE)
In the Interactions tab (to the right of Widgets), add an interaction to H3 Airbnb aggregation so users can review attributes while navigating the map. Switch from Click to Hover and choose the style Light. Select the attributes population_joined_joined (sum), urbanity_joined_joined (mode) and airbnb_count_joined. Click on the variable options (#) to choose a more appropriate format and more readable field names. Your map should now be looking a bit like the below:
Navigate the map and observe how widget values vary depending on the viewport area. Check out specific areas by hovering over them and review pop-up attributes.
Now let's add a rich description of our map so users can have more context - we'll be using Markdown syntax. At the top right of the screen, select the "i" icon to bring up the Map Description tab (you can switch between this and widgets). You can copy and paste the below example or create your own.
### Airbnb Ratings and Location Impact 🌟

Explore the intricate relationship between Airbnb ratings and the geographical distribution of listings in Los Angeles with our dynamic map. This map provides valuable insights into what influences user ratings and offers a comprehensive view of the city's Airbnb landscape.
**Discover User Ratings** 📊
- Analyze how Airbnb users rate listings based on key factors such as accuracy, communication, cleanliness, location, check-in, and value.
- Visualize the distribution of ratings to uncover patterns that affect overall user impressions.
**Geographic Insights** 🗺️
- Dive into Los Angeles neighborhoods and observe how specific areas impact user ratings.
- Identify regions where location ratings significantly influence the overall score, and explore what makes these neighborhoods stand out.
**Sociodemographic Data Enrichment**
- Enhance your understanding of each neighborhood with sociodemographic insights from the CARTO Data Observatory.
- Access data on total population, urbanity level, tourism presence, and more to gain a holistic view of the city's dynamics.
If you click on the "eye" icon, you can preview what this looks like...
Finally we can make the map public and share the link to anybody in the organization. For that you should go to “Share” on the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
Now we are ready to share the results! 👇
- ✅ ✅ ✅ ✅ ✅ This example demonstrates how to use Workflows to define custom points, lines and polygons that can be incorporated into the analysis.
- ✅ ✅ ✅ ✅ ❌ This example demonstrates how to create an OD matrix from different data sources and create routes between them.
- ✅ ✅ ✅ ✅ ❌ This example demonstrates how to use Workflows to generate points from a list of street addresses.
- ✅ ✅ ✅ ✅ ✅ This example demonstrates how to use Workflows to generate point geographies out of Latitude/Longitude coordinates on separate columns.
- ✅ ✅ ✅ ✅ ❌ This example demonstrates how to use Workflows to generate isochrones from a set of points.
In this tutorial, we'll explore the power of Builder in creating web map applications that adapt to user-defined inputs. Our focus will be on demonstrating how SQL Parameters can be used to dynamically update analyses based on user input. You'll learn to implement these parameters effectively, allowing for real-time adjustments in your geospatial analysis.
Although our case study revolves around assessing the risk on Bristol's cycle network, the techniques and methodologies you'll learn are broadly applicable. This tutorial will equip you with the skills to apply similar dynamic analysis strategies across various scenarios, be it urban planning, environmental studies, or any field requiring user input for analytical updates.
Access the Maps section from your CARTO Workspace using the Navigation menu.
Click on "New map". A new Builder map will open in a new tab.
In this tutorial, we will undertake a detailed analysis of accident risks on Bristol's cycle network. Our objective is to identify and assess the safest and riskiest segments of the network.
So first, let's add the bristol_cycle_network data source following the steps below:
Click on "Add sources from..." and select "Data Explorer"
Navigate to CARTO Data Warehouse > demo_data > demo_tables
Select bristol_cycle_network table and click "Add source"
A new layer appears once the source is added to the map. Rename the layer to "Cycle Network" and change the title of the map to "Analyzing risk on Bristol cycle routes".
Then, we will add the bristol_traffic_accidents data source following the steps below:
Click on "Add sources from..." and select "Data Explorer"
Navigate to CARTO Data Warehouse > demo_data > demo_tables
Select bristol_traffic_accidents table and click "Add source"
A new layer is added. Rename it to 'Traffic Accidents'.
Using the Traffic Accidents source, we are going to generate an influence area using the ST_BUFFER() function, whose radius will be updated by users depending on the scenario they are looking to analyse. To do so, we will add the Traffic Accidents data source again, but this time as a SQL Query, following these steps:
Click on "Add sources from..." and select "Custom Query (SQL)"
Click on the CARTO Data Warehouse connection.
Select Type your own query.
Click on the "Add Source button".
The SQL Editor panel will be opened.
Enter the following query, with the buffer radius distance set to 50, and click on "Run".
SELECT * EXCEPT(geom), ST_BUFFER(geom,50) as geom FROM carto-demo-data.demo_tables.bristol_traffic_accidents
Rename the layer to 'Traffic Influence Area' and move it just below the existing Traffic Accidents layer. Access the Layer panel and, within the Fill Color section, reduce its opacity to 0.3 and set the color to red. Just below, disable the Stroke Color using the toggle button.
Now, we'll transform the bristol_cycle_network source table into a query. To do so, click on the three dots located on the source card and click on "Query this table".
Click "Continue" on the warning modal highlighting that the styling of this layer will be lost.
The SQL Editor panel is displayed with a SELECT * statement. Click on "Run" to execute the query.
Repeat Step 10, Step 11 and Step 12 to generate a query, this time from the bristol_traffic_accidents source table.
To easily distinguish each data source, you can rename them using the 'Rename' function. Simply click on the three dots located on the data source card and select 'Rename' to update each name to match its layer name.
The Traffic Accidents source contains records which span from 2017-01-03 to 2021-12-31. To allow users to interact with the map and obtain insights for the desired time period, we will add to the dashboard:
A Time Series Widget
A SQL Date Parameter
First, we'll incorporate a Time Series Widget into our map. To do this, head over to the 'Widgets' tab and click on 'Add new widget'. In the Data section, use the 'Split by' functionality to add multiple series by selecting the severity_description column. Also, make sure to rename the widget appropriately to "Accidents by Severity". Once you've configured it, the Time Series Widget will appear at the bottom of the interface, displaying essential information relevant to each severity category.
Now, let's add a SQL Date Parameter that will allow users to select their desired time period through a calendar interface. To do so, access the "Create a SQL Parameter" functionality located at the top right corner of the data sources panel.
Then, select the SQL Date Parameter type in the modal and set the configuration as per the details below. Once the configuration is filled in, click on "Create parameter".
Start date: 2017-01-03
End date: 2021-12-31
Display name: Event Date
Start date SQL name: {{event_date_from}}
End date SQL name: {{event_date_to}}
A parameter control placeholder will appear in the right panel in Builder. Now let's add the parameter to our Traffic Accidents SQL Query using the start and end date SQL names as per below. Once executed, a calendar UI will appear where users can select the desired time period.
SELECT * FROM `carto-demo-data.demo_tables.bristol_traffic_accidents`
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}}
As you might know, SQL Parameters can be used with multiple sources at the same time. This is perfect for our approach, as we are looking to filter and dynamically update an analysis that affects different sources.
For instance, we will now add the same WHERE statement to also filter the Accident Influence Area source, to make sure that both sources and layers are in sync. To do so, open the SQL Query of the Accident Influence Area source and update it as per the query below:
SELECT * EXCEPT(geom), ST_BUFFER(geom,50) as geom FROM carto-demo-data.demo_tables.bristol_traffic_accidents
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}}
Then click "Run" to execute it.
Now, when using the Event Date parameter, both sources (Traffic Accidents and Accident Influence Area) are filtered to the specified time period.
Now, we are going to add a new SQL Parameter that will allow users to define their desired radius for calculating the Accident Influence Area. This parameter will be used as a placeholder in the ST_BUFFER() function already present in our Accident Influence Area SQL query. First, create a SQL Numeric Parameter and configure it as per below:
Slider Type: Simple
Min Value: 0
Default Value: 30
Max Value: 100
Scale type: Discrete
Step increment: 10
Parameter Name: Accident Influence Radius
Parameter SQL Name: {{accident_influence_radius}}
Once the parameter is added as a control placeholder, you can use its SQL name in your Accident Influence Area SQL Query. You just need to replace the 50 value in the ST_BUFFER() function with {{accident_influence_radius}}.
The output query should look as per below:
SELECT * EXCEPT(geom), ST_BUFFER(geom,{{accident_influence_radius}}) as geom FROM carto-demo-data.demo_tables.bristol_traffic_accidents
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}}
Now, users can leverage Accident Influence Radius parameter control to dynamically update the accident influence area.
Now we can update the Cycle Network source to count the number of accident areas that intersect with each segment in order to understand its risk. As you can see, the query takes into account the SQL parameters to calculate the risk according to the user-defined values.
-- Extract the accident influence area
WITH accident_area AS (
SELECT
ST_BUFFER(geom, {{accident_influence_radius}}) as buffered_geom,
*
FROM
`carto-demo-data.demo_tables.bristol_traffic_accidents`
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}}
),
-- Count the accident areas that intersect with a cycle network
network_with_risk AS (
SELECT
h.geoid,
ANY_VALUE(h.geom) AS geom,
COUNT(a.buffered_geom) AS accident_count
FROM
`carto-demo-data.demo_tables.bristol_cycle_network` h
LEFT JOIN
accident_area a
ON
ST_INTERSECTS(h.geom, a.buffered_geom)
GROUP BY h.geoid
)
-- Join the risk network with those segments where no accidents occurred
SELECT
IFNULL(a.accident_count,0) as accident_count, b.*
FROM `carto-demo-data.demo_tables.bristol_cycle_network` b
LEFT JOIN network_with_risk a
ON a.geoid = b.geoid
Access Cycle Network layer panel and in the Stroke Color section select accident_count as the 'Color based on' column. In the Palette, set the Step Number to 4, select 'Custom' as the palette type and assign the following colors:
Color 1: #40B560
Color 2: #FFB011
Color 3: #DA5838
Color 4: #83170C
Then, set the Data Classification Method to Quantize and set the Stroke Width to 2.
Now, the Cycle Network layer displays the cycle network colored by accident count, so users can easily extract risk insights from it.
Now we will add some Widgets linked to Cycle Network source. First, we will add a Pie Widget that displays accidents by route type. Navigate to the Widgets tab, select Pie Widget and set the configuration as follows:
Operation: SUM
Source Category: Newroutety
Aggregation Column: Accident_count
Once the configuration is set, the widget is displayed in the right panel.
Then, we'll add a Histogram widget to display the network accident risk. Go back and click on the icon to add a new widget and select Cycle Network source. Afterwards, select Histogram as the widget type. In the configuration, select Accident_count in the Data section and set the number of buckets in the Display options to 5.
Finally, we will add a Category widget displaying the number of accidents by route status. To do so, add a new Category widget and set the configuration as below:
Operation: SUM
Source category: R_status
Aggregation column: Accident_count
After setting the widgets, we are going to add a new parameter to our dashboard that will allow users to filter the networks and accidents by their desired route type(s). To do so, we'll click on 'Create a SQL Parameter' and select Text Parameter. Set the configuration as below, adding the values from the Cycle Network source using the newroutety column.
A parameter control placeholder will be added to the parameter panel. Now, let's update the SQL Query sources to include the statement WHERE newroutety IN {{route_type}} to filter both accidents and network by route type. The final SQL queries for the three sources should look as below:
Cycle Network SQL Query:
-- Extract the accident influence area
WITH accident_area AS (
SELECT
ST_BUFFER(geom, {{accident_influence_radius}}) as buffered_geom,
*
FROM
`carto-demo-data.demo_tables.bristol_traffic_accidents`
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}}
),
-- Count the accident areas that intersect with a cycle network
network_with_risk AS (
SELECT
h.geoid,
ANY_VALUE(h.geom) AS geom,
COUNT(a.buffered_geom) AS accident_count
FROM
`carto-demo-data.demo_tables.bristol_cycle_network` h
LEFT JOIN
accident_area a
ON
ST_INTERSECTS(h.geom, a.buffered_geom)
GROUP BY h.geoid
)
-- Join the risk network with those segments where no accidents occurred
SELECT
IFNULL(a.accident_count,0) as accident_count, b.*
FROM `carto-demo-data.demo_tables.bristol_cycle_network` b
LEFT JOIN network_with_risk a
ON a.geoid = b.geoid
WHERE newroutety IN {{route_type}}
Traffic Accidents SQL Query:
WITH buffer AS (
SELECT
ST_BUFFER(geom,{{accident_influence_radius}}) as buffer_geom,
*
FROM `carto-demo-data.demo_tables.bristol_traffic_accidents`
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}})
SELECT
a.* EXCEPT(buffer_geom)
FROM buffer a,
`carto-demo-data.demo_tables.bristol_cycle_network` h
WHERE ST_INTERSECTS(h.geom, a.buffer_geom)
AND newroutety IN {{route_type}}
Accident Influence Area SQL Query:
WITH buffer AS (
SELECT ST_BUFFER(geom,{{accident_influence_radius}}) as geom,
* EXCEPT(geom)
FROM `carto-demo-data.demo_tables.bristol_traffic_accidents`
WHERE date_ >= {{event_date_from}} AND date_ <= {{event_date_to}})
SELECT
a.*
FROM buffer a,
`carto-demo-data.demo_tables.bristol_cycle_network` h
WHERE ST_INTERSECTS(h.geom, a.geom)
AND newroutety IN {{route_type}}
Once you execute the updated SQL queries you will be able to filter the accidents and network by the route type.
Change the style of Traffic Accidents layer, setting the Fill Color to red and the Radius to 2. Disable the Stroke Color.
Interactions allow users to extract insights from specific features by clicking or hovering over them. Navigate to the Interactions tab and enable the Click interaction for the Cycle Network layer, setting the below attributes and providing user-friendly names.
In the Legend tab, change the text label of the first step of Cycle Network layer to NO ACCIDENTS and rename the title to Accidents Count.
Add a map description to your dashboard to provide further context to viewer users. To do so, access the map description functionality by clicking on the icon located at the top right corner of the header. You can add your own description or copy the one below. Remember, map descriptions and widget notes support Markdown syntax.
### Cycle Routes Safety Analysis

This map is designed to promote safer cycling experiences in Bristol and assist in efficient transport planning.
#### What You'll Discover:
- **Historical Insight into Accidents**: Filter accidents by specific date ranges to identify temporal patterns, perhaps finding times where increased safety measures could be beneficial.
- **Adjustable Influence Area**: Adjust the accident influence radius to dynamically identify affected cycle routes based on different scenarios.
- **Cycle Route Analysis**: By analyzing specific route types, we can make data-driven decisions for optimization of cycle route network.
- **Temporal Accident Trends**: Utilize our time series widget to recognize patterns. Are some months riskier than others? These insights can inform seasonal safety campaigns or infrastructure adjustments.
We are ready to publish and share our map. To do so, click on the Share button located at the top right corner and set the permission to Public. In the 'Shared Map Settings', enable SQL Parameter. Copy the URL link to seamlessly share this interactive web map app with others.
Finally, we can visualize the results!