Space-time anomaly detection for real-time portfolio management
In this tutorial, we’ll create a workflow to improve portfolio management for real estate insurers by identifying vacant buildings in areas experiencing anomalously high rates of violent crime.
By the end of this tutorial, you will have:
✅ Built a workflow to detect spatio-temporal emerging anomalous regions
✅ Prepared the results for interactive map visualization to monitor at-risk properties
Let's get started!
This is data that you'll need to run the analysis:
Crime counts: the cartobq.docs.CHI_crime_counts_w_baselines public table reports the observed and expected counts for violent crimes in Chicago from 2001 to present. The individual crime records, which were extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and are available in the Google BigQuery public marketplace, were aggregated by week and H3 cell at resolution 8. The expected counts were obtained using a statistical model that accounts for external covariates as well as endogenous variables, including spatial lag variables to account for the influence of neighboring regions, counts at previous time lags to model the impact of past values on current or future outcomes, and seasonal terms to account for repeating seasonal behaviours.
Vacant buildings: the cartobq.docs.CHI_311_vacant_buildings_2010_2018 public table reports the 311 calls for open and vacant buildings reported to the City of Chicago since January 1, 2010.
That's all you need for now - let's get going!
Sign in to CARTO at app.carto.com
Choose the CARTO Data Warehouse connection or any connection to your Google BigQuery project.
For this method to work, we first need to ensure that the data is complete, i.e. that there are no weeks and/or H3 cells without data or with missing data. This can be easily verified by checking that each H3 cell has the same number of timesteps (and vice versa), as done in the first node, where the Group By component is used to count the number of timesteps per H3 cell (and the number of H3 cells per timestep). This check allows us to verify that there are no gaps in the data. If gaps are detected, filling them is relatively straightforward for count data: it simply involves inserting zeros for the missing data points. For non-count variables, however, the process can be more complex. While simple techniques, like those available in Google BigQuery's GAP_FILL function, might be a good initial approximation, more advanced modelling strategies are generally required.
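The completeness check and zero-filling described above can be sketched in plain Python. This is a minimal illustration only, assuming the data is available as a list of (h3, week, counts) rows. The function names `find_gaps` and `fill_gaps_with_zeros` and the sample cell ids are hypothetical. In the workflow itself the check is performed with the Group By component.

```python
from itertools import product


def find_gaps(rows):
    """Return the (h3, week) pairs missing from a complete cell x week grid."""
    cells = sorted({h3 for h3, _, _ in rows})
    weeks = sorted({week for _, week, _ in rows})
    present = {(h3, week) for h3, week, _ in rows}
    return [pair for pair in product(cells, weeks) if pair not in present]


def fill_gaps_with_zeros(rows):
    """Insert zero counts for missing (h3, week) pairs (valid for count data only)."""
    filled = list(rows) + [(h3, week, 0) for h3, week in find_gaps(rows)]
    return sorted(filled)


# Hypothetical sample: cell "b" has no row for the second week.
rows = [
    ("a", "2024-01-01", 3),
    ("a", "2024-01-08", 1),
    ("b", "2024-01-01", 2),
]
gaps = find_gaps(rows)
complete = fill_gaps_with_zeros(rows)
```

Note that this sketch only finds weeks that are missing for some cells. A week that is absent from every cell would need to be detected against a generated calendar of expected weeks.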
Next, we add the Detect Space-time Anomalies component, which uses a multi-resolution method to search over a large and overlapping set of space-time regions, each containing some subset of the data, and find the most significant clusters of anomalous data. For a complete tutorial on how this method works, you can take a look at this guide.
We run this component with the following settings:
The index, date and variable columns (h3, week, counts)
The time frequency of the data, WEEK for weekly data
The kind of analysis, prospective, meaning that we are interested in emerging anomalies, i.e. anomalies in the final part of the time series
The POISSON distributional model, which is appropriate for count data
The EXPECTATION estimation method, which assumes that the observed values should be equal to the baseline for non-anomalous space-time regions
The spatial extent of the regions, with a k-ring between 2 and 3
The temporal extent of the regions, with a window between 4 and 12 weeks
That we are looking for high-mean anomalies, i.e. we search for regions where the observed crimes are higher than expected
The number of permutations used to compute the statistical significance of the score
The maximum number of results returned, which we set to 1 to select the most anomalous region only
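To give some intuition for the POISSON model combined with the EXPECTATION method, here is a minimal sketch of the expectation-based Poisson scan statistic as it is commonly written in the scan-statistics literature. It is an illustration under that assumption and not necessarily the component's exact implementation. The function names and sample numbers are hypothetical.

```python
import math


def poisson_expectation_score(observed, baseline):
    """Log-likelihood ratio score for a high-mean anomaly in one space-time region.

    observed and baseline hold the counts and expected counts for every
    (cell, week) pair in the candidate region.
    """
    c = sum(observed)
    b = sum(baseline)
    if b <= 0 or c <= b:
        return 0.0  # not higher than expected, so no high-mean anomaly
    return c * math.log(c / b) + b - c


def relative_risk(observed, baseline):
    """Ratio of the summed observed counts to the summed baseline counts."""
    return sum(observed) / sum(baseline)


# Hypothetical region made of 4 (cell, week) pairs.
observed = [12, 9, 15, 14]
baseline = [6.0, 5.0, 7.0, 7.0]
score = poisson_expectation_score(observed, baseline)
risk = relative_risk(observed, baseline)
```

In this sketch a region scores above zero only when its total observed count exceeds its total baseline, which matches the high-mean setting chosen above.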
The output of the component is a table indexed by a unique identifier called index_scan. Each identifier corresponds to a specific anomalous space-time region. For each region, the following information is provided: the anomaly score (score, the higher the more anomalous), its statistical significance (gumbel_pvalue), the relative risk (rel_risk, which represents the ratio of the sum of the observed counts to the sum of the baseline counts), and the H3 cells (locations) and weeks (times), which are both stored as arrays.
To join the output from the component to the input table, which is indexed by the cell id and time, we need to first unnest the arrays. We then pivot the resulting table in order to obtain a table indexed by the H3 cell id and the week, with a 'key' column indicating either counts or counts_baseline and a 'value' column storing the corresponding count.
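The unnest, join and unpivot steps can be sketched in plain Python. This assumes the component output is a record with `locations` and `times` arrays whose cross product defines the region, and that the input table is a list of dicts with `h3`, `week`, `counts` and `counts_baseline` fields. The function name and sample values are hypothetical. In the workflow these steps are carried out by components rather than by code.

```python
from itertools import product


def region_long_table(region, input_rows):
    """Unnest a region's cells and weeks, join to the input, and unpivot.

    Returns rows of (h3, week, key, value), where key is either
    "counts" or "counts_baseline".
    """
    # Unnest: every (cell, week) combination belonging to the region.
    pairs = set(product(region["locations"], region["times"]))
    out = []
    for row in input_rows:
        if (row["h3"], row["week"]) in pairs:
            # Unpivot: one output row per measure column.
            for key in ("counts", "counts_baseline"):
                out.append((row["h3"], row["week"], key, row[key]))
    return out


# Hypothetical component output and input table.
region = {"index_scan": 0, "locations": ["a", "b"], "times": ["2024-01-08"]}
input_rows = [
    {"h3": "a", "week": "2024-01-01", "counts": 2, "counts_baseline": 2.5},
    {"h3": "a", "week": "2024-01-08", "counts": 9, "counts_baseline": 3.0},
    {"h3": "b", "week": "2024-01-08", "counts": 7, "counts_baseline": 2.0},
    {"h3": "c", "week": "2024-01-08", "counts": 1, "counts_baseline": 1.5},
]
long_rows = region_long_table(region, input_rows)
```

The long key/value layout is what lets a single time series widget split by key and plot observed and expected counts side by side.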
Finally, we join the results with a table containing 311 calls for open and vacant buildings reported to the City of Chicago between January 1, 2010 and December 2018: we first extract the distinct H3 cells in the space-time region using the Select Distinct component, then create a geometry column with the H3 Boundary component and finally use a Spatial Join component to intersect the tables based on their geometries.
Now let's turn this into something a bit more visual!
Select the Transpose / Unpivot as Table component. Open the Map preview at the bottom of the screen and select Create Map. This will take you to a fresh CARTO Builder map with the H3 cells of the anomalous regions and their counts pre-loaded.
To also add the vacant building geometries, go back to the workflow and select the last Spatial Join component. Open the Map preview at the bottom of the screen and select Create Map. This will take you to a fresh CARTO Builder map with your data pre-loaded. Click on the three dots in the Sources panel, select the Query this table option and copy the code. Then go back to the first map and, again in the Sources panel, click on Add source from layers, select the Add Custom Query (SQL) option and paste the SQL code. This will add a layer to the map with the vacant buildings within the anomalous region.
In the Layer panel, click on Layer 1 to rename the layer "Anomalous region" and style your data.
In the Layer panel, click on Layer 2 to rename the layer "Vacant buildings" and style your data.
To the right of the Layer panel, switch to the Widgets panel, to add a couple of dashboard elements to help your users understand your map. We’d recommend:
Time series widget: SUM, value, and Split By key - to show the total number of observed and expected counts by week.
For each of the widgets, scroll to the bottom of the Widget panel and change the behaviour from global to viewport, and watch as the values change as you pan and zoom.
Head to the Legend panel (to the right of Layers) to ensure the names used in the legend are clear (for instance we've changed the title of the legend from "Anomalous Region" to "Space-time region exhibiting an anomalous number of violent crimes").
Now, Share your map (top right of the screen) with your Organization or the public. Grab the shareable link from the share window.
Here's what our final version looks like:
Looking for tips? Head to the Data Visualization section of the Academy!
Head to the Workflows tab, select the Import Workflow icon and import this template.