Using crime data & spatial analysis to assess home insurance risk

In this tutorial, we'll be using individual crime location data to create a crime risk index. This analysis is really helpful for insurers looking to make more intelligent policy decisions - from customized pricing of premiums to tailored marketing.

You will need...

  • Crime location data. We are using data for Los Angeles city (data available here). Most local governments provide this data as open data, so you should be able to easily adapt this tutorial for your area of interest.

  • Detailed Population data. We are using WorldPop 100m gridded data, which you can subscribe to via our Spatial Data Catalog.


Step 1: Sourcing & loading crime data

We’ll be basing our analysis on crime data for Los Angeles city (data available here).

First, let's load this data into your data warehouse. To do this, head to the Data Explorer tab of your CARTO Workspace:

  1. Select Import data, then follow the steps to import the table.

  2. For this dataset we are going to deselect the Let CARTO automatically define the schema option on Schema Preview so we can manually select the correct data types for each field. In this example, you want to be sure that latitude and longitude are defined as the type float64.


Step 2: Creating a crime grid

Now the data is loaded into our data warehouse, we’ll be building the below workflow to convert the crime locations into a hexagonal Spatial Index called H3. This process can be used to convert any point dataset into a H3 Index.

  1. With the data downloaded, head to the Workflows tab of the CARTO Workspace and select + New Workflow. Use the connection relevant to the location you loaded your data to. Select Create Workflow.

  2. At the top left of the screen, click on the word Untitled to rename your workflow Crime risk.

  3. You should now be seeing a blank Workflows canvas. The first thing we need to do is load our crime data in. To the left of the window, open the Sources tab. Navigate through Connection data to the LA crime locations table you just imported, and drag it onto the canvas.

  4. If you navigate through the Table preview (bottom of the window) you'll notice we don't have a geometry column. Let's change that! Switch from the Sources to Components window, and search for ST GeogPoint; we'll use this to create a point geometry for each crime. Drag this component onto the canvas to the right of the crimes source.

  5. Connect the right-hand node of the crimes table to the input, left-hand node os ST GeogPoint (this may happen automatically if they're placed close together. Set the latitude and longitude columns as lat and lon respectively - and run the workflow!

At the bottom of the window, select ST GeogPoint and open the Table preview again. Scroll right to the end and select Show Column Stats.

Notice anything weird? The minimum latitude and maximum longitude values are both 0 - which means we have a series of features which are incorrectly sitting in "null island" i.e. longitude, latitude = 0,0. These will skew our subsequent analysis, so let's remove them.

  1. Back in Components, find Simple Filter. Drag this onto the canvas, connecting it to the output of ST Geogpoint. Set the filter condition to latitude does not equal 0, and run. Now let's get on with running our analysis.

  2. Now, let's also filter the data to only crimes relevant to home insurance risk. Connect the Simple Filter to a Select Distinct component, looking at the column crm_cd_desc. You can see there are over 130 unique crime codes which we need to filter down.

  3. For this filter, as we will have multiple criteria we will instead need to connect a Where component to the Simple Filter from step 5. In this Where component, copy and paste the following:

crm_cd_desc LIKE '%BURGULARY%' OR 
crm_cd_desc LIKE '%THEFT%' OR 
crm_cd_desc LIKE '%VANDALISM%' OR 
crm_cd_desc LIKE '%STOLEN%' OR 
crm_cd_desc LIKE '%ARSON%'
  1. Connect your Simple Filter to a H3 from GeoPoint component, which we'll use to convert each crime to a hexagonal H3 grid cell. Change the resolution to 9 which is slightly more detailed than the default 8.

  2. In the final step for this section, connect the H3 from GeoPoint component to a Group by component. Set the column as H3 and the aggregation as H3 again, with the type COUNT. This will count all duplicate cells, turning our H3 grid into a frequency grid.

You can now select the Group by component, open the Map preview tab and select Create Map to start exploring your data - here's what ours looks like! Make sure you check out the Building Interactive Maps section of the academy for tutorials on crafting the most impactful maps!


Step 3: Contextualizing Crime Risk

In this section, we will contextualize the crime counts by calculating the number of crimes per 1,000 residents. First, we need to convert our population data into a H3 Index so we can use it in the same calculation as the crime count.

You can follow the steps in the video below to do this (also outlined below).

  1. If you haven't already, head to the Data Observatory and subscribe to Population Mosaics, 2020 - United States of America (Grid 100m).

  2. In your workflow, head to Sources > Data Observatory > WorldPop and drag the gridded population data onto your canvas. You may need to refresh your workflow if you subscribed to the dataset since you started building.

  3. Connect this to an ST Centroid to convert each grid cell to a central point.

  4. Now, we will use a similar approach to when we converted the crime points to a H3 index. Use H3 from GeoPoint to convert each point geometry to a H3 index; make sure you set the resolution to 9 (the same as the crime count layer).

  5. Finally, use the Group by component to aggregate the index with the following parameters:

    1. Group by column: H3

    2. Aggregation column: Population, Aggregation type: Sum.

Altogether, this should look something like the below (note how we've used an annotation box to help organize or workflow - you can access these in the Aa button at the top of the window).

Now you should have two inputs ready; crime counts and population. Let's bring them together!

  1. Add a Join component, with the Group by component from the previous step as the top (main) input, and and crime count Group by table as the bottom input. Use an inner join a type with the columns from both tables as h3.

  2. Finally, we can calculate the crime rate! Add a Create Column component to do this and input the below formula.

CASE WHEN population_sum_joined = 0 then 0 ELSE h3_count/(population_sum_joined/1000) END

  1. Use a Save as Table component to commit it.

Altogether, your workflow should be looking something like...

  1. Head back to the Builder map you created earlier. Under Sources (bottom left), select Add Source from > Data Explorer and navigate to where you saved your table. Add it to the map!

  2. Rename the layer Crime rate.

  3. Let's style both the Crime count and rate layers in turn by clicking on the layer name in the Layer panel:

    1. Reduce the resolutions of both to 6 (as detailed as possible)

    2. Disable the strokes

    3. Change the fill colors to be determined by H3_Count (average) for the crime count layer, and crime_rate (average) for the crime rate layer. Pro tip - use different color schemes for both layers, so it's obvious to the user that they aren't directly comparable.

  4. At the top of the Builder window in Map views, turn on Switch to a dual map view. Open the legends for each map respectively (at the bottom right of each window) and turn the Crime rates off for the left-hand map and Crime counts off for the right-hand map (or the other way around! You basically only want to see one grid in each map).

Check this out below! How do the two compare?


Next steps...

Want to take this analysis one step further? Here are some ideas for next steps:

  • Calculate crime rate hotspots and outliers with our Statistics tools

  • Assess property-level home risk by joining your results to property data, such as Overture Maps buildings

Learn more about this process in our blog Using crime data & spatial analysis to assess home insurance risk.

Last updated