# Time series clustering: Identifying areas with similar traffic accident patterns

Spatio-temporal analysis extracts insights from data that has both spatial and temporal components. By combining spatial information, such as geographic coordinates, with temporal data, such as timestamps, it reveals dynamic behaviors and dependencies across domains. It applies to industries and use cases such as car sharing and micromobility planning, urban planning, and transportation optimization.

In this example, we perform a spatio-temporal analysis to identify areas with similar traffic accident patterns over time, using the location and time of accidents in London in 2021 and 2022, provided by [Transport for London](https://tfl.gov.uk/corporate/publications-and-reports/road-safety#on-this-page-1). This tutorial builds upon [this previous one](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/space-time-hotspot-analysis-identifying-traffic-accident-hotspots), where we explained how to use [the spacetime Getis-Ord functionality](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/statistics#getis_ord_spacetime_h3_table) to identify traffic accident hotspots.

## Data

The source data covers two years of collisions, aggregated weekly into an H3 grid, with the number of collisions counted per cell and week. The data is available at `cartobq.docs.spacetime_collisions_weekly_h3` and can be explored in the map below.

{% embed url="" %}

## Spacetime Getis-Ord

We start by performing a spacetime hotspot analysis to better understand our data.
We can use the following call to the Analytics Toolbox to run the procedure:

{% tabs %}
{% tab title="carto-un" %}
```sql
CALL `carto-un`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
  'cartobq.docs.spacetime_collisions_weekly_h3',     -- input table
  'cartobq.docs.spacetime_collisions_weekly_h3_gi',  -- output table
  'h3',            -- column with the H3 index
  'week',          -- column with the date
  'n_collisions',  -- column with the value to analyze
  3,               -- size of the spatial neighborhood
  'WEEK',          -- time unit
  1,               -- temporal bandwidth
  'gaussian',      -- spatial kernel
  'gaussian'       -- temporal kernel
);
```
{% endtab %}

{% tab title="carto-un-eu" %}
```sql
CALL `carto-un-eu`.carto.GETIS_ORD_SPACETIME_H3_TABLE(
  'cartobq.docs.spacetime_collisions_weekly_h3',     -- input table
  'cartobq.docs.spacetime_collisions_weekly_h3_gi',  -- output table
  'h3',            -- column with the H3 index
  'week',          -- column with the date
  'n_collisions',  -- column with the value to analyze
  3,               -- size of the spatial neighborhood
  'WEEK',          -- time unit
  1,               -- temporal bandwidth
  'gaussian',      -- spatial kernel
  'gaussian'       -- temporal kernel
);
```
{% endtab %}

{% tab title="manual" %}
```sql
CALL carto.GETIS_ORD_SPACETIME_H3_TABLE(
  'cartobq.docs.spacetime_collisions_weekly_h3',     -- input table
  'cartobq.docs.spacetime_collisions_weekly_h3_gi',  -- output table
  'h3',            -- column with the H3 index
  'week',          -- column with the date
  'n_collisions',  -- column with the value to analyze
  3,               -- size of the spatial neighborhood
  'WEEK',          -- time unit
  1,               -- temporal bandwidth
  'gaussian',      -- spatial kernel
  'gaussian'       -- temporal kernel
);
```
{% endtab %}
{% endtabs %}

For further detail on the spacetime Getis-Ord, take a look at [the documentation](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/statistics#getis_ord_spacetime_h3_table) and [this tutorial](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/space-time-hotspot-analysis-identifying-traffic-accident-hotspots).

By performing this analysis, we can check how different parts of the city become “hotter” or “colder” as time progresses.

{% embed url="" %}

## Finding time series clusters

Once we have an initial understanding of the spacetime patterns of our data, we proceed to cluster H3 cells based on their temporal patterns.
To do this, we use the [TIME\_SERIES\_CLUSTERING](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/statistics#time_series_clustering) procedure, which takes as input:

* `input`: The query or fully qualified name of the table with the data
* `output_table`: The fully qualified name of the output table
* `partitioning_column`: Time series unique IDs, which in this case are the H3 indexes
* `ts_column`: Name of the column with the timestamp of each observation
* `value_column`: Name of the column with the value per ID and timestep
* `options`: A JSON containing the advanced options for the procedure

One of the advanced options is the time series clustering method. Currently, it features two basic approaches:

* **Value characteristic**, which clusters the series based on the step-by-step distance between their values. The closer two signals are at each timestep, the more similar the series are considered to be and the higher the chance that they are clustered together.
* **Profile characteristic**, which clusters the series based on their dynamics over the time span provided. Here, the higher the correlation between two series, the higher the chance that they are clustered together.

Clustering the series as-is can be tricky, since these methods are sensitive to noise in the series. However, since we already smoothed the signal using the spacetime Getis-Ord, we can instead cluster the cells based on the resulting “temperature” (the `gi` column). We will only consider cells for which at least 60% of the observations are reasonably significant (p-value below 0.05).
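As a rough illustration of that 60% rule, the Python sketch below mimics what the `QUALIFY PERCENTILE_CONT(p_value, 0.6) ... < 0.05` filter in the call that follows does for a single cell. The helper names and the p-values are made up for the example, and `percentile_cont` is a hand-written linear-interpolation percentile, assumed here to behave like the SQL function; it is not part of the Analytics Toolbox.

```python
import math


def percentile_cont(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional position in the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def keep_cell(p_values, share=0.6, alpha=0.05):
    """Keep a cell when the `share` percentile of its p-values is below `alpha`."""
    return percentile_cont(p_values, share) < alpha


# Hypothetical p-values for two cells over ten weeks (illustrative numbers only).
mostly_significant = [0.01, 0.02, 0.01, 0.03, 0.04, 0.02, 0.01, 0.20, 0.50, 0.90]
rarely_significant = [0.01, 0.02, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]

print(keep_cell(mostly_significant))  # 7 of 10 weeks are below 0.05
print(keep_cell(rarely_significant))  # only 2 of 10 weeks are below 0.05
```

The first cell passes the filter and the second does not: if the 60th percentile of a cell's p-values is below 0.05, then roughly 60% or more of its weekly observations are significant at that level.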
{% tabs %}
{% tab title="carto-un" %}
```sql
CALL `carto-un`.carto.TIME_SERIES_CLUSTERING(
  '''
  SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
  QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
  ''',
  'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
  'index',
  'date',
  'gi',
  JSON '{ "method": "profile", "n_clusters": 4 }'
);
```
{% endtab %}

{% tab title="carto-un-eu" %}
```sql
CALL `carto-un-eu`.carto.TIME_SERIES_CLUSTERING(
  '''
  SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
  QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
  ''',
  'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
  'index',
  'date',
  'gi',
  JSON '{ "method": "profile", "n_clusters": 4 }'
);
```
{% endtab %}

{% tab title="manual" %}
```sql
CALL carto.TIME_SERIES_CLUSTERING(
  '''
  SELECT * FROM `cartobq.docs.spacetime_collisions_weekly_h3_gi`
  QUALIFY PERCENTILE_CONT(p_value, 0.6) OVER (PARTITION BY index) < 0.05
  ''',
  'cartobq.docs.spacetime_collisions_weekly_h3_clusters',
  'index',
  'date',
  'gi',
  JSON '{ "method": "profile", "n_clusters": 4 }'
);
```
{% endtab %}
{% endtabs %}

Although clustering the Getis-Ord output rather than the raw counts adds a layer of indirection, it provides two advantages:

* Since the signal has been temporally smoothed, there is less noise in the dynamics of the series.
* Since it has also been geographically smoothed, nearby cells are more likely to be clustered together.

This map shows the different clusters that are returned as a result:

{% embed url="" %}

We can immediately see the different dynamics in the widget:

* Apart from cluster #3, which clearly groups the “colder” areas, the clusters start 2021 with very similar accident counts.
* However, from July 2021 onwards, cluster #2 accumulates clearly more collisions than the other two.
* Even though #1 and #4 have similar levels, they differ at certain points, such as September 2021 or January 2022.

This information is a useful starting point for further analysis of the possible causes of these behaviors, and we were able to extract it from a single glance at the map. The method “collapses” the results of the spacetime Getis-Ord into a space-only result, which makes the data easier to explore and understand.
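To make the difference between the value and profile methods described earlier more tangible, here is a minimal Python sketch on made-up series. It assumes Euclidean distance as a stand-in for the step-by-step value comparison and Pearson correlation as a stand-in for the profile comparison; the exact measures used internally by `TIME_SERIES_CLUSTERING` are not stated on this page, so treat this as an illustration of the idea rather than the procedure's implementation.

```python
import math


def euclidean_distance(a, b):
    """Step-by-step distance between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation between two equally long, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical series (illustrative numbers only).
base = [1.0, 2.0, 3.0, 2.0, 1.0]
shifted = [x + 10.0 for x in base]      # same shape, much higher level
mirrored = [3.0, 2.0, 1.0, 2.0, 3.0]    # similar level, opposite shape

# Value-style comparison: smaller distance means more similar.
print(euclidean_distance(base, shifted), euclidean_distance(base, mirrored))

# Profile-style comparison: higher correlation means more similar.
print(pearson_correlation(base, shifted), pearson_correlation(base, mirrored))
```

Under the value-style comparison, `base` is closer to `mirrored` because their levels are similar. Under the profile-style comparison, `base` is closer to `shifted` because they rise and fall together. This is why the choice of method changes which cells end up in the same cluster.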