Using crime data & spatial analysis to assess home insurance risk

In this tutorial, we'll be using individual crime location data to create a crime risk index. This analysis is really helpful for insurers looking to make more intelligent policy decisions - from customized pricing of premiums to tailored marketing.

You will need...

  • Crime location data. We are using Los Angeles city (data available here) and Santa Monica city (data available here). Most local governments provide this data as open data, so you should be able to easily adapt this tutorial for your area of interest.

  • Detailed Population data. We are using USA 1km WorldPop data, which you can subscribe to via our Spatial Data Catalog.


Step 1: Sourcing & loading crime data

We’ll be basing our analysis on crime data for two adjacent locations; Los Angeles city (data available here) and Santa Monica city (data available here).

Our first step is to load this data into our data warehouse. To do this, head to the Data Explorer tab of your CARTO Workspace and select Import data, then follow the steps to import the two tables. For this dataset we are going to deselect the Let CARTO automatically define the schema option on Schema Preview so we can manually select the correct data types for each field. In this example, you want to be sure that latitude and longitude are defined as the type float.


Step 2: Creating a crime index

Now the data is loaded into our data warehouse, we’ll convert the crime locations into a hexagonal Spatial Index called H3. Note this process can be used to convert any point dataset into a H3 Index.

This workflow is comprised of the following steps:

  • First, drag your source crime data onto the canvas.

  • A Custom SQL Select first filters both input datasets so that they cover the same time period (we’ll use the last three full years - 2020, 2021 & 2022) and only crimes related to property risk (property damage, theft, arson etc) - you can use the code below as a basis for this, but may need to modify it if you are using different source data.

SELECT
  latitude AS lat,
  longitude AS lon
FROM
  $a
WHERE
  (DATE_OCCURRED LIKE '%2020%'
    OR DATE_OCCURRED LIKE '%2021%'
    OR DATE_OCCURRED LIKE '%2022%')
  AND UCR_DESCRIPTION IN ('Recv/Poss Stolen Property',
    'Larceny - From Vehicle',
    'Burglary-Force Non-Resd',
    'Burglary-Force Resd'...)
  • Union All merges the two input tables together.

  • ST_GeogPoint creates a point geometry for each crime.

  • These are then converted to H3 cells using H3 from GeoPoint - in this example we will use a resolution of 9, but you may wish to adapt this based on your use case and geography.

  • Finally, the Group by component is used to count any overlapping H3 cells i.e. cells where more than one crime has occurred. Set the following parameters:

    • Group by column: H3

    • Aggregation column: H3, Aggregation type: Count


Step 3: Contextualizing Crime Risk

In this section, we will contextualize the crime counts by calculating the number of crimes per 1,000 residents. First, we need to convert our population data into a H3 Index so we can use it in the same calculation as the crime count.

You can follow the steps in the video below to do this (also outlined below).

  1. Drag the WorldPop gridded population data onto your canvas.

  2. Use ST Centroid to convert each grid cell to a central point.

  3. Now, we will use a similar approach to when we converted the crime points to a H3 index. Use H3 from GeoPoint to convert each point geometry to a H3 index; make sure you set the resolution to 9 (the same as the crime count layer).

  4. Finally, use the Group by component to aggregate the index with the following parameters:

    1. Group by column: H3

    2. Aggregation column: Population, Aggregation type: Sum.

💡 At this point, we'd recommend using the Save as Table component to commit the results of this to a separate table to avoid unnecessarily querying the large WorldPop inpu.

Now you should have two inputs ready; crime counts and population. Let's bring them together!

  1. Add a Join component, with the population table (below, this is the "LOS_ANGELES_H..." source) as the top input, and and crime count table as the bottom input. Use a left join a type s we want to calculate the crime rate even for areas where no crimes have happened.

  2. Finally, we can calculate the crime rate! Normally you would just add a Create Column component to do this, but as we have some zeros and nulls in our data, we need to do an additional calculation using the Custom SQL Select component. Use the below SQL to calculate the number of crimes per year (of the 3 sample years) per 1,000 residents. Note how we can use the placeholder $a to call the attached component.

SELECT
    population_sum_sum,
    h3_joined AS h3,
    h3_count_joined,
    CASE
        WHEN h3_count_joined = 0 THEN 0
        WHEN population_sum_sum = 0 THEN (h3_count_joined / 3)
        ELSE (h3_count_joined / 3) / (population_sum_sum / 1000)
    END AS crime_rate
FROM
    $a

We can add this layer to the Builder map we created earlier, and use the Dual map view mode to compare the raw crime counts with the normalize crime data layer - check this out below! How do the two compare?


Further steps...

Want to take this analysis one step further? Let's explore what crime rates look like for individual properties.

Download Los Angeles County Property Parcels and import this into the same project that you have stored the crime data in. This data includes a helpful field on the price of each property when last sold).

Now we can add a final section onto the end of our workflow where we:

  1. Use a H3 from GeoPoint component to convert each property to a H3 index (remember to set this as resolution 9; the same as all of our other H3 layers).

  2. Join the crime rate data to the property H3 layer by the H3 columns in both tables.

Now we have a crime rate for each property! By converting our two input datasets - crime rate and properties - to H3 indexes, we are effectively replacing a heavy and slow spatial join operation with a fast and effective string-based join. The results? We now know the crime rates for 159 THOUSAND properties - and it took us seconds!

Learn more about this process in our blog Using crime data & spatial analysis to assess home insurance risk.

Last updated