Detecting space-time anomalous regions to improve real estate portfolio management (quick start)

Last updated 7 months ago

A more comprehensive version of this guide is available here.

From disease surveillance systems, to detecting spikes in network usage, to environmental monitoring systems, many applications require monitoring time series data in order to detect anomalous data points. In these event detection scenarios, the goal is either to uncover anomalous patterns in historical space-time data or to detect emerging patterns swiftly and accurately, enabling a timely and effective response to the detected events.

As a concrete example, in this guide we will focus on the task of detecting spikes in violent crimes in the city of Chicago in order to improve portfolio management of real estate insurers.

This guide shows how to use CARTO space-time anomaly detection functionality in the Analytics Toolbox for BigQuery. Specifically, we will cover:

  • A brief introduction to the method and to the formulations of the definition of anomalous, unexpected, or otherwise interesting regions

Step 1 Understanding the data

SELECT date, h3, counts, total_pop_sum AS counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'

Step 2 Detecting anomalous spikes in violent crimes in Chicago

To detect anomalies that affect multiple time series simultaneously, we can either combine the outputs of multiple univariate time series or treat the multiple time series as a single multivariate quantity to be monitored. However, for time series that are also localised in space, we expect that if a given location is affected by an anomalous event, then nearby locations are more likely to be affected than locations that are spatially distant.

A typical approach to the monitoring of spatial time series data uses fixed partitions, which requires defining an a priori spatial neighbourhood and temporal window to search for anomalous data. However, in general, we do not have a priori knowledge of how many locations will be affected by an event, and we wish to maintain high detection power whether the event affects a single location (and time), all locations (and times), or anything in between.

Depending on the type of anomalies that we are interested in detecting, different baselines can be chosen:

  • Population-based baselines ('estimation_method':'POPULATION'). In this case we only have relative (rather than absolute) information about what we expect to see, and we expect the observed values to be proportional to the baseline values. These baselines typically represent the population corresponding to each space-time location. They can be either given (e.g. from census data) or inferred (e.g. from sales data), and can be adjusted for any known covariates (such as age of population, risk factors, seasonality, weather effects, etc.).

A simple way of estimating the expected crime counts is to compute a moving average of the weekly counts for each H3 cell. For example, we could average each weekly value over a window spanning the previous three weeks, the current week, and the next three weeks:

-- input_query
SELECT date, h3, 
counts, 
AVG(counts) OVER(PARTITION BY h3 ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as counts_baseline
FROM `cartobq.docs.chicago_crime_2024-07-30_enriched`
WHERE date > '2001-01-01'
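
To make the window semantics of the query above concrete, here is a minimal Python sketch of the same centered moving average. It assumes the rows are available in memory as `(date, h3, counts)` tuples; the function name `centered_moving_average` and the sample cell id are hypothetical and not part of the Analytics Toolbox. As with `ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING`, the window is simply truncated at the start and end of each cell's series.

```python
from collections import defaultdict


def centered_moving_average(rows, preceding=3, following=3):
    """Return (date, h3, counts, counts_baseline) tuples.

    rows: iterable of (date, h3, counts). The baseline of each row is the
    mean of the counts in a window of `preceding` rows before and
    `following` rows after it, within the same h3 cell, ordered by date.
    """
    by_cell = defaultdict(list)
    for date, h3, counts in rows:
        by_cell[h3].append((date, counts))

    result = []
    for h3, series in by_cell.items():
        series.sort()  # order by date, like ORDER BY date
        values = [counts for _, counts in series]
        for i, (date, counts) in enumerate(series):
            lo = max(0, i - preceding)                 # truncated at the start
            hi = min(len(values), i + following + 1)   # truncated at the end
            window = values[lo:hi]
            result.append((date, h3, counts, sum(window) / len(window)))
    return result


# Hypothetical sample: eight consecutive weeks for a single cell.
sample = [(week, "cell_a", 2 * week) for week in range(8)]
baselines = {date: b for date, _, _, b in centered_moving_average(sample)}
```

Note that a centered window uses future values, which is fine for a retrospective analysis but would not be available when monitoring the most recent weeks.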

Assuming that the counts are Poisson distributed (the typical assumption for count data, 'distributional_model':'POISSON'), we can obtain the space-time anomalies using the following query:

CALL `carto-un`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
    'kring_size':[1,3],
    'time_bw':[4,16],
    'is_prospective': false,
    'distributional_model':'POISSON',
    'permutations':99,
    'estimation_method':'EXPECTATION'
}'''
);

-- Equivalent call when the Analytics Toolbox is accessed from the EU multi-region
CALL `carto-un-eu`.carto.DETECT_SPACETIME_ANOMALIES(
-- input_query
''' <my_input-query>''',
-- index_column
'h3',
-- date_column
'date',
-- input_variable_column
'counts',
-- time_freq
'Week',
-- output_table
'<my-project>.<my-dataset>.<my-output_table>',
-- options
'''{
    'kring_size':[1,3],
    'time_bw':[4,16],
    'is_prospective': false,
    'distributional_model':'POISSON',
    'permutations':99,
    'estimation_method':'EXPECTATION'
}'''
)

The map below shows the spatial and temporal extent of the ten most anomalous regions (the region with rank 1 being the most anomalous), together with the time series of the sum of the counts and baselines (i.e. the moving average values) for the time span of the selected region.

Step 3 Explore all the options of the procedure

How to identify anomalous space-time regions using the DETECT_SPACETIME_ANOMALIES function

By the end of this guide, you will have detected anomalous space-time regions in time series data of violent crimes in the city of Chicago. A more comprehensive version of this guide can be found here.

Crime data is often an overlooked component in property risk assessments and is rarely integrated into underwriting guidelines, despite the FBI's latest estimates indicating over $16 billion in annual losses from property crimes alone. In this example, we will use the locations of violent crimes in Chicago available in the BigQuery public marketplace, extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. The data are available daily from 2001 to the present, minus the most recent seven days, which also lets us showcase how to use this method to detect space-time anomalies in near real time.

For the purpose of this guide, the data were first aggregated weekly (by assigning each daily record to the previous Monday) and by H3 cell at resolution 7, as shown in this map, where we can visualize the total counts for the whole period by H3 cell and the time series of the H3 cells with the most counts.
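
As an illustration of the weekly aggregation step, the following sketch maps each daily date to the Monday of its week and counts events per week and cell. It is only a sketch under stated assumptions: the helper names `week_start` and `weekly_counts` and the cell ids are hypothetical, and we assume that a date that is already a Monday is assigned to itself.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week that contains `day`.

    date.weekday() is 0 for Monday and 6 for Sunday, so subtracting it
    moves any date back to the Monday on or before it.
    """
    return day - timedelta(days=day.weekday())


def weekly_counts(events):
    """Count events per (week start, cell id).

    events: iterable of (date, cell_id) pairs, one per recorded crime.
    """
    return Counter((week_start(day), cell) for day, cell in events)


# Hypothetical events in two cells.
events = [
    (date(2024, 7, 29), "cell_a"),  # a Monday
    (date(2024, 7, 30), "cell_a"),  # Tuesday of the same week
    (date(2024, 8, 4), "cell_a"),   # Sunday of the same week
    (date(2024, 8, 5), "cell_a"),   # Monday of the following week
    (date(2024, 7, 31), "cell_b"),
]
counts = weekly_counts(events)
```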

Each H3 cell has been further enriched using demographic data from the American Community Survey (ACS) at the census block resolution. Finally, each time series has been gap filled, assigning a zero value to the crime counts variable for any missing week. The final data can be accessed using this query:
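
The gap-filling step can be sketched as follows, assuming the weekly counts are held in a dictionary keyed by `(week_start, cell_id)`. The function name `fill_weekly_gaps` and the sample values are hypothetical; the point is only that every cell ends up with one value per week, with zero where no crime was recorded.

```python
from datetime import date, timedelta


def fill_weekly_gaps(counts, start, end):
    """Return a complete weekly series for every cell found in `counts`.

    counts: dict mapping (week_start, cell_id) to an event count.
    start, end: first and last week start (inclusive), seven days apart.
    Missing (week, cell) combinations are filled with zero.
    """
    cells = {cell for _, cell in counts}
    filled = {}
    week = start
    while week <= end:
        for cell in cells:
            filled[(week, cell)] = counts.get((week, cell), 0)
        week += timedelta(weeks=1)
    return filled


# Hypothetical input: cell_a has no record for the middle week.
observed = {
    (date(2024, 7, 15), "cell_a"): 4,
    (date(2024, 7, 29), "cell_a"): 2,
    (date(2024, 7, 22), "cell_b"): 1,
}
filled = fill_weekly_gaps(observed, date(2024, 7, 15), date(2024, 7, 29))
```

Without this step, a moving-average baseline would silently skip weeks with no events and overestimate the expected counts.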

A solution to this problem is a multi-resolution approach in which we search over a large and overlapping set of space-time regions, each containing some subset of the data, and find the most significant clusters of anomalous data. This approach, known as the generalized space-time scan statistics framework, consists of computing a score function that compares the probability that a space-time region S is anomalous with respect to some baseline against the probability that there are no anomalous regions. The region(s) with the highest score whose result is significant at a chosen significance level are identified as the (most) anomalous.
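
To give a feel for how large and overlapping the search set is, the sketch below enumerates only the temporal part of the candidate regions. It is an illustration, not the procedure's actual implementation: the function name `temporal_windows` is hypothetical, and we assume that window lengths are expressed in time steps and that a prospective search keeps only windows ending at the last time step. In a full search, each window would be combined with every spatial neighbourhood under consideration.

```python
def temporal_windows(n_steps, min_len, max_len, prospective=False):
    """Enumerate candidate time windows as half-open (start, end) index pairs.

    n_steps: number of time steps in the series.
    min_len, max_len: shortest and longest window length to consider.
    prospective: if True, keep only windows ending at the last time step
    (emerging anomalies); otherwise allow windows anywhere in the series.
    """
    windows = []
    for length in range(min_len, min(max_len, n_steps) + 1):
        if prospective:
            starts = [n_steps - length]
        else:
            starts = range(n_steps - length + 1)
        for start in starts:
            windows.append((start, start + length))
    return windows


# Hypothetical series of 20 weeks, windows of 4 to 16 weeks.
retrospective = temporal_windows(20, 4, 16)
emerging = temporal_windows(20, 4, 16, prospective=True)
```

Even for this short series the retrospective search contains many more windows than the prospective one, which is why the number of candidate regions grows quickly with the length of the history.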

Expectation-based baselines ('estimation_method':'EXPECTATION'). Another way of interpreting the baselines is to assume that, under the null hypothesis of no anomalous space-time regions, the observed values should be equal (and not just proportional, as in the population-based approach) to the baseline. This approach requires an estimate of the baseline values, which are inferred from the historical time series, potentially adjusting for any relevant external effects such as day-of-week and seasonality. Such an estimate can be derived from a moving window average or from a counterfactual forecast obtained from time series analysis of the historical data, for example by fitting an ARIMA model using the ARIMA_PLUS or ARIMA_PLUS_XREG model classes in Google BigQuery.
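
The difference between "equal to" and "proportional to" the baseline can be illustrated with the Poisson log-likelihood ratio scores commonly used in the scan statistics literature. This is a sketch under that assumption; the exact score functions used by the Analytics Toolbox procedure are not stated in this guide, and the function names below are hypothetical. Here `c` and `b` are the summed counts and baselines inside a candidate region, and `c_all` and `b_all` are the totals over all the data.

```python
import math


def expectation_based_poisson_score(c, b):
    """Score a region whose counts are expected to EQUAL the baseline.

    Positive only when the observed count c exceeds the baseline b.
    """
    if b <= 0 or c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def population_based_poisson_score(c, b, c_all, b_all):
    """Score a region whose counts are expected to be PROPORTIONAL to the baseline.

    Positive only when the rate inside the region (c / b) exceeds the
    rate in the rest of the data.
    """
    c_out, b_out = c_all - c, b_all - b
    if b <= 0 or b_out <= 0 or c / b <= c_out / b_out:
        return 0.0
    score = c * math.log(c / b) - c_all * math.log(c_all / b_all)
    if c_out > 0:
        score += c_out * math.log(c_out / b_out)
    return score


# Counts are twice the baseline everywhere, including inside the region.
uniform_excess_expectation = expectation_based_poisson_score(20, 10)
uniform_excess_population = population_based_poisson_score(20, 10, 200, 100)
```

In this example the expectation-based score is positive because the region exceeds its expected value, while the population-based score is zero because the region is no different from the rest of the data in relative terms.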

As we can see from the query above, in this case we are looking retrospectively for past anomalous space-time regions ('is_prospective': false), i.e. the space-time anomalies can happen at any point in time over all the past data, as opposed to emerging anomalies, for which the search focuses only on the final part of the time series. The spatial extent is defined by a k-ring ('kring_size') between 1 (first-order neighbours) and 3 (third-order neighbours), and the temporal extent ('time_bw') is between 4 and 16 weeks. Finally, the 'permutations' parameter defines the number of permutations used to compute the statistical significance of the detected anomalies.
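
To see why the number of permutations matters, here is a minimal sketch of the usual Monte Carlo rank estimate of a p-value. It assumes that each permutation yields the highest score found in a replica dataset generated under the null hypothesis; how the procedure builds those replicas is not described in this guide, and the function name `monte_carlo_p_value` is hypothetical.

```python
def monte_carlo_p_value(observed_score, replica_scores):
    """Estimate a p-value by ranking the observed score among replica scores.

    replica_scores: one maximum score per replica dataset generated under
    the null hypothesis of no anomalous regions. The +1 terms count the
    observed dataset itself as one of the samples.
    """
    exceed = sum(1 for score in replica_scores if score >= observed_score)
    return (exceed + 1) / (len(replica_scores) + 1)


# Hypothetical replica scores for 99 permutations.
replicas = [float(i % 10) for i in range(99)]  # values 0.0 to 9.0
p_strong = monte_carlo_p_value(25.0, replicas)  # no replica reaches the score
p_weak = monte_carlo_p_value(0.0, replicas)     # every replica reaches it
```

Under this estimate, 99 permutations give a smallest attainable p-value of 1/100, so more permutations are needed if a stricter significance level is required, at the cost of a longer run time.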

To explore the effect of choosing different baselines and parameters, check the extended version of this guide, where the method is described in more detail and we offer step-by-step instructions to implement various configurations of the procedure.
