Understanding accident hotspots

In this tutorial, we’ll be exploring which parts of Paris’ cycle network could most benefit from improved safety measures through exploring accident rates.

You'll need...

This analysis will be based on two datasets; accident locations and the Paris cycle network.

To access the data:

In Snowflake, you can find PARIS_BIKE_ACCIDENTS and PARIS_CYCLING_NETWORK in the CARTO Academy Data listing on the Snowflake Marketplace.
Other clouds:
- Accident locations can be downloaded from here, and dropped directly into your workflow (more on that later).
- The cycling network can be sourced from OpenStreetMap; you can follow our guide for accessing data from this source here. Alternatively, you can find this in the CARTO Data Warehouse > demo data > demo tables > paris_cycling_network.

If you'd like to replicate this analysis for another study area, many local government data hubs will publish similar data on accident locations.

Step-by-Step tutorial

Creating a Workflow

In the CARTO Workspace, head to Workflows and Create a Workflow, using the connection where your data is stored.
Under Sources (to the left of the screen), locate Paris bike accidents & Paris Cycling Network and drag them onto the canvas. If any of your source files are saved locally (for instance, if you downloaded the accident data from this link, you can drag and drop the files from your Downloads folder directly onto the canvas. This may take a few moments as this is a large dataset!

#1 Convert accidents to a H3 grid & filter to a study area

A workflow showing how to create a H3 grid of Paris cycling accidents

First, we'll create a study area. On the left of the screen, switch from the Sources to the Components panel, which is where you can find all of your processing and analytical tools. Locate the Draw Custom Features component and drag it onto the canvas. Select the component to open the component options on the right hand side of the window. Click Draw Features and draw a custom area around the Paris area (see below). 💡 Alternatively of drawing a custom polygon, you can use any polygon table to define your custom area.

Back in the Components panel, locate the H3 Polyfill component. And connect the output of draw features to it (see screenshot above). We will use this to create a hexagonal H3 Spatial Index grid across our custom study area. Change the resolution to 10 which is more detailed than the default 8.
Run your workflow! Note you can do this at any time, and only components which you have edited will be re-run.
Now let's turn our attention to the bike accidents. Back in the Components panel, locate H3 from GeoPoint and drag it onto the canvas. Connect this to your bike accidents source, and set a resolution of 10.
Next, use a Join component to essentially filter the accidents. Set the H3 Polyfill (step 2) as the top input, and the H3 from GeoPoint as the bottom input, the join columns as H3, and set the join type as Inner. Check the screenshot above for guidance.

Now might be a good time to add an annotation note around this section of the Workflow to keep it organized. You can do this by clicking Add a note (Aa) at the top of the screen.

#2 Aggregate & calculate hotspots

A screenshot of a workflow showing how to calculate hotspots with a H3 grid

Now we can start analyzing our data!

Connect a Group by component to the output of the Join we just created. Set the group by column to H3 and the aggregation column H3 and type count. This will result in a hexagonal grid with a field H3_count which holds the number of accidents which have occured in each "cell."
Next, connect this to a Getis Ord* component. This will be used to calculate spatial hotspots; statistically significant clusters of high data values. Set the following parameters:
1. Index column: H3
2. Value column: H3_Count
3. Kernel function: Triangular (this means cells closer to the central cell have a far higher weight).
4. Size: 3 (the neighborhood size).

For more information on these parameters, check out this blog.

How to calculate spatial hotspots and which tools do you need?

Finally, use two connected Simple Filter components with the following conditions:
1. p_value <= 0.1, meaning we can be 90% confident that the outputs are spatial hotspots.
2. GI > 0, meaning there is a cluster of high values (with negative values representing clusters of low values).

❗If you are using Google BigQuery, at this stage you will need to rename the Index column "H3" so that we can map it. Use a Create Column component to do this.

Now you have a column named H3, we're ready to map!

Expand the Results panel at the bottom of the window and switch to the Map tab. With your second Simple Filter selectively (or Create Column, if in BigQuery), select Create Map.

Note that you can do this with any component in your workflow as long as it has either a geometry or Spatial Index reference column. However, the results of every component are only saved for 30 days, so if there is one you'd like to use beyond this period, make sure to use a Save as Table component to commit it.

Let's start to explore our data in CARTO Builder!

Rename your map "Paris accident hotspots" by clicking on the existing name (likely "Untitled") at the top-left of the window.
Change basemaps: still in the top-left of the window, switch from the Layers to the Basemaps tab. You can choose any you like; we'll go with Google Maps: Dark Matter.
Rename the layer: back in the Layers tab, click on your layer to expand the layer options. Click on the three dots to the right of the layer name (likely "Layer 1") to rename it "Accident hotspots."
Style the layer: still in the layer options...
1. Change the resolution to 6 so we can see a more detailed view of the data.
2. Disable the stroke color (it'll end up being "noisy" later on).
3. In the fill color options, set the color to be based on GI (AVG) and select a color palette; we're using sunset dark. For a more impactful map, reverse the color palette so that the lightest color represents the largest value. Change the color scale to quantile.
Set a blending mode: come out of the layer options so you're in the main Layers panel. To the top-right of the panel, set the Layer blending to additive. This means that layering lighter colors on top of each other will result in an even lighter color. At the moment, that just means that we can see our basemap a little clearer... but just you wait!

Right now, your map is probably looking a little something like...

Let's kick this up a gear! Head back to your workflow for the next step.

#3 Convert the cycle network to a H3 grid

A screenshot of a workflow showing how to convert a linear feature to a H3 grid

To transform these hotspots into actionable insights, we’ll now work out which parts of the cycle network infrastructure fall within accident hotspots - and so could benefit from some targeted improvements. Rather than using a slower spatial join to do this, we’ll leverage H3 again.

First, connect an ST Buffer component to the cycling network source, setting a distance of 25 meters.
Next connect this to a H3 Polyfill component (resolution 10) again to convert these to a H3 grid - at this stage, we’ll make sure to enable “Keep table input columns.”

#4 Filter network to accident hotspots

Now we'll use another Join to join our cycle network H3 grid to the results of our hotspot analysis. Use the result of "#2 Aggregate & calculate hotspots" as the top input, and the result of H3 Polyfill as the bottom input. The join columns should both be H3, and the join type should be Inner.
Now we will calculate the average GI* score for each section of the cycle network to determine which part of the network is covered by the strongest hotspots. Use one final Group by with the following parameters:
1. Group by column: CARTODB_ID
2. Aggregation: GI (AVG), HIGHWAY (ANY), NOM_VOIE_JOINED (ANY) & GEOM_JOINED (ANY). You can also use an ANY aggregation to retain any contextual information from the cycle links, such as highway name.

Connect this final Group by to a Save as Table component to commit the results.

Now we have a table consisting of cycle links which are in an accident hotspot, as well as their respective average GI* score which indicates the strength of the hotspot. You can see the full workflow below.

Building the map

Let's bring everything together into one final map 👇

Head back to Paris accident hotspots map you created earlier.
First, let's add in the cycle links with GI* scores that we just created. In the bottom left of your map, navigate through Sources > Add Source from > Data Explorer > the cycle links table you just created. Add it to the map, and let's style it!
1. Rename the layer: GI* score by link
2. Stroke color based on: GI_AVG. We've used the same color palette as the hotspot grid as earlier (Sunset Dark, inverted) with a Quantile scale.
3. Stroke width: 3.5
To help give more weight to our analysis, let's also add in the original accident locations. Navigate again through Sources > Add Source from > Data Explorer to where you originally accessed the data. If you imported the accidents as a local file through Workflows, you can use a Save as Table component here to commit them to a table on the cloud. Now let's style them:
1. Rename the layer: Accidents
2. Fill color: orange, opacity = 1.
3. Stroke: disabled
4. Radius: 1
Looking to replicate that "glowy" effect? This is what's known as a "firefly map" and is super easy to replicate"
1. In the layers panel, click on the three dots next to the Accidents layer and + Duplicate layer.
2. Drag this layer to beneath the original accidents layer.
3. Set the radius to 4 and opacity to 0.01.
4. So it isn't confusing for your users, head to the Legend tab (to the right of Layers) and disable the copied layer in the legend. You can also change the names of layers and classes here.
Now finally let's add some widgets to help our user explore the data. To the right of the Layers tab, open the Widgets tab. Add the following widgets:
1. Number of accidents:
  1. Layer: Accidents
  2. Widget type: formula
  3. Name: Number of accidents
  4. Formatting: Integer with format separator (12,345,678)
2. GI* by highway type:
  1. Layer: GI* score by link
  2. Widget type: category
  3. Name: GI* score by highway type
  4. Operation: average
  5. Column: HIGHWAY_JOINED_ANY
  6. Aggregation column: GI_AVG
  7. Formatting: 2 decimal places (1.23)
3. GI* by highway:
  1. Layer: GI* score by link
  2. Widget type: category
  3. Name: GI* score by street
  4. Operation: average
  5. Column: HIGHWAY_JOINED_ANY
  6. Aggregation column: GI_AVG
  7. Formatting: 2 decimal places (1.23)

Now your user should be able to use your map to pinpoint which streets could benefit from targeted safety improvements - such as Rue Malher with a GI* score of 11.98, and 81 accidents in close proximity.