Creating a composite score for fire risk

In this tutorial, we'll share a low-code approach to calculating a composite score using Spatial Indexes. This approach is ideal for creating numeric indicators that combine multiple concepts. In this example, we'll be combining climate data and historic fire extents to calculate fire risk - but you can apply these concepts to a wide range of scenarios, from market suitability for a new product to accessibility scores for a service that you offer.

You will need...

  • Climate data. Fires are most likely to start and spread in areas with high temperatures and strong winds. We can access this information from our Spatial Features data - a global grid containing various climate, environmental, economic and demographic variables. You can subscribe to this from the Data Observatory, or access the USA version of this data in the CARTO Data Warehouse.

  • USA Counties data. This can also be subscribed to from the Data Observatory, or accessed via the CARTO Data Warehouse.

  • Historic fires data. We’ll be using the LA County Historic Fires Perimeter data to understand areas where fires have been historically prevalent. You can download this data as a GeoJSON here.

We’ll be creating the workflow below for this:


Step 1: Formatting the Spatial Features data

Before running our composite score analysis, we first need to filter the Spatial Features data to our area of interest (LA County). The climate data we are interested in is also reported monthly, so we need to aggregate these variables to annual values.

We’ll be running this initial section of the workflow in this step.

  1. First, in your CARTO Workspace, head to Workflows and select Create a workflow, using the CARTO Data Warehouse connection.

  2. In the workflow, open the Sources panel on the left of the screen; under the CARTO Data Warehouse connection, navigate to demo data > demo tables and locate usa_counties and derived_spatialfeatures_usa_h3res8_v1_yearly_v2. Drag these onto the canvas.

  3. Beside Sources, switch to Components. Search for and drag a Simple Filter onto the canvas, then connect the usa_counties source to this. Set the filter so that name is equal to Los Angeles.

  4. Next, connect the Simple Filter to an H3 Polyfill component, ensuring the resolution is set to 8. This will create an H3 grid across LA, which we can use to filter the climate data to this area.

  5. Connect the H3 Polyfill output to the top input and the Spatial Features source to the bottom input of a Join component. Ensure both the main and secondary table join fields are set to H3 (this should autodetect), and then set the join type to Left. This will join only the features from the USA-wide Spatial Features source which are also found in the H3 polyfill component, i.e. only the cells in Los Angeles.

  6. Now, we want to use Create Column components to create three new fields (an illustrative sketch of these calculations follows this list):

    1. Temp_max for the maximum temperature: greatest(tmax_jan_joined,tmax_feb_joined,tmax_mar_joined,tmax_apr_joined,tmax_may_joined,tmax_jun_joined,tmax_jul_joined,tmax_aug_joined,tmax_sep_joined,tmax_oct_joined,tmax_nov_joined,tmax_dec_joined)

    2. Temp_avg for the average temperature: (tavg_jan_joined + tavg_feb_joined + tavg_mar_joined + tavg_apr_joined + tavg_may_joined + tavg_jun_joined + tavg_jul_joined + tavg_aug_joined + tavg_sep_joined + tavg_oct_joined + tavg_nov_joined + tavg_dec_joined) / 12

    3. On a separate branch, Wind_avg for the average wind speeds: (wind_jan_joined + wind_feb_joined + wind_mar_joined + wind_apr_joined + wind_may_joined + wind_jun_joined + wind_jul_joined + wind_aug_joined + wind_sep_joined + wind_oct_joined + wind_nov_joined + wind_dec_joined) / 12

  7. Finally, connect each Create Column to a separate Select component. Here, you'll want to enter h3, wind_avg in one and h3, temp_avg, temp_max in the other.
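If you're curious what these Create Column and Select components are doing, here is an illustrative pandas sketch of the equivalent calculations. This is not CARTO's implementation - it simply assumes a hypothetical DataFrame holding the joined Spatial Features rows with the monthly columns named as above.

```python
import pandas as pd

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def aggregate_monthly(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror the three Create Column expressions on the joined Spatial Features table."""
    tmax_cols = [f"tmax_{m}_joined" for m in MONTHS]
    tavg_cols = [f"tavg_{m}_joined" for m in MONTHS]
    wind_cols = [f"wind_{m}_joined" for m in MONTHS]

    out = df[["h3"]].copy()
    out["temp_max"] = df[tmax_cols].max(axis=1)   # greatest(tmax_jan_joined, ..., tmax_dec_joined)
    out["temp_avg"] = df[tavg_cols].mean(axis=1)  # sum of the 12 monthly values / 12
    out["wind_avg"] = df[wind_cols].mean(axis=1)
    return out

# The two Select components then keep one branch each:
# out[["h3", "temp_avg", "temp_max"]] and out[["h3", "wind_avg"]]
```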

Next up… our first composite score!


Step 2: PCA scoring

There are two main methods for calculating a composite score. Unsupervised scoring (which this tutorial will focus on) consists of aggregating a set of variables, scaled and weighted accordingly, whilst supervised scoring leverages a regression model to relate an outcome of interest to a set of variables and, based on the model residuals, focuses on detecting areas of under- and over-prediction. You can find out more about both methods and which to use when here, and access pre-built workflow templates here.

There are three main approaches to unsupervised scoring:

  • Principal Component Analysis (PCA): This method derives weights by maximizing the variation in the data. It is ideal when expert knowledge is lacking, the sample size is large enough, and extreme values are not outliers.

  • Custom Weights: Recommended for those with expert knowledge of their data and domain, this method allows users to customize both the scaling and aggregation functions, along with defining a set of weights, enabling a tailored approach to scoring that incorporates domain-specific insights.

  • Entropy: This method derives weights from the entropy of the proportion of each variable (sketched below); like PCA, it is ideal for those without expert domain knowledge.
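We won't use the Entropy method in this tutorial, but as a rough illustration of the idea, the classic entropy-weight scheme looks something like the sketch below. This is only a conceptual example and the component's exact formulation may differ; it assumes a hypothetical DataFrame of numeric input variables.

```python
import numpy as np
import pandas as pd

def entropy_weights(df: pd.DataFrame) -> pd.Series:
    """Derive one weight per column: variables whose values are spread more
    evenly across rows (higher entropy) contribute less to the final score."""
    x = df - df.min() + 1e-9            # shift to strictly positive values
    p = x / x.sum(axis=0)               # proportion of each variable per row
    entropy = -(p * np.log(p)).sum(axis=0) / np.log(len(df))
    diversity = 1 - entropy
    return diversity / diversity.sum()  # normalized weights, one per column
```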

Let’s employ PCA to create our first composite score!

This score represents the temperature-related risk and is obtained by combining the effects of the average and maximum temperature. Because these two variables are highly correlated (heat extremes tend to happen in places where the average temperature is higher), the PCA method is the best choice. With this method, the input variables are transformed into a single variable (the first principal component score) that captures the maximum variability in the data, therefore effectively combining the key aspects of both variables.
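The Composite Score Unsupervised component handles the PCA for you, but for intuition, here is a minimal sketch of a first-principal-component score, assuming a hypothetical DataFrame with temp_avg and temp_max columns. The exact scaling and sign conventions used by the component may differ.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def first_pc_score(df: pd.DataFrame, cols=("temp_avg", "temp_max")) -> pd.Series:
    """Project standardized variables onto their first principal component."""
    X = StandardScaler().fit_transform(df[list(cols)])
    pc1 = PCA(n_components=1).fit_transform(X)[:, 0]
    score = pd.Series(pc1, index=df.index, name="spatial_score")
    # Rescale to 0-1; note that the sign of a principal component is arbitrary,
    # so flip the score if higher values should correspond to hotter cells.
    return (score - score.min()) / (score.max() - score.min())
```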

  1. Drag a Composite Score Unsupervised component onto the canvas and connect it to the last Select component. Set the following parameters:

    1. Geographic identifier: h3

    2. Scoring method: FIRST_PC

    3. Correlation variable: temp_avg (this will be correlated against the other numeric variable: temp_max)

    4. Correlation threshold: leave as the default 0.2

    5. Output formatting: none. In other cases, you can use this parameter to group the scores of your data into jenks classes, quantiles, etc.

  2. Next, use a Join (inner) component to join the result of Composite Score Unsupervised to the output of the Select component containing h3 and wind_avg (the wind branch from Step 1). Also, use a Drop Columns component to drop h3_joined. This will get us ready for the next step of this analysis.

⚡ Run! Once complete, select the map preview of the Composite Score Unsupervised component, and select Create map. Set the fill color to be determined by Spatial Score, and you should have something that looks like the map below.

Areas with a lower score are areas where the average and maximum temperatures are lower. These areas are typically located near the coast - such as Malibu and Long Beach - or in areas of dense vegetation. Given just the temperature information, we could make the initial judgment that these areas are at lower risk of fires. But let’s add one final dimension to this.


Step 3: Using custom weights

  1. Locate the LA County Historic Fires Perimeter dataset from where you’ve downloaded it and drag it directly onto your workflow canvas. Alternatively, you can import it into your cloud data warehouse and drag it on via Sources.

  2. Like we did with the LA County boundary, use another H3 Polyfill (resolution 8) to create an H3 grid across the historic fires. Make sure you enable Keep input table columns; this will create duplicate H3 cells where multiple polygons overlap.

  3. With a Group by component, set the Group by column to H3 and the aggregation to H3 (COUNT) to count the number of duplicate H3 cells, i.e. the number of fires which have occurred in each area.

  4. Now, drag a Join onto the canvas; connect the Group by to the bottom input and the output of the Drop Columns component from Step 2 (which contains the spatial score and wind_avg) to the top input. The join type should be Left and both input columns should be H3.

  5. Before we run our final composite score, we need to do a little bit of data cleaning:

    1. Do you see all those null values in the h3_count_joined column? We need to turn those into zeroes, indicating that no fires occurred in those locations. Add a Create Column component, name the new column fire_count, and use the calculation coalesce(h3_count_joined, 0) to do this.

    2. With a Drop Columns component, drop h3_joined and h3_count_joined. We want to only be left with fields that will be considered in our final composite score.

  6. Connect Drop Columns to a second Composite Score Unsupervised component. This time, use the Custom Weights method, and set the following parameters (an illustrative sketch of this scoring approach follows the list):

    1. Set the weights as: spatial_score = 0.25, wind_avg = 0.25, fire_count = 0.5. Alternatively, choose your own weights to see how this affects the outcome!

    2. Leave the user-defined scaling as min-max and the aggregation function as linear, but change the output formatting to jenks. This will partition the results into classes based on minimizing within-class variance and maximizing between-class variance.
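Conceptually, the Custom Weights method min-max scales each input and then combines them as a weighted linear sum. Here is an illustrative sketch, assuming a hypothetical DataFrame with spatial_score, wind_avg and fire_count columns; it is not the component's actual code, and it skips the jenks output formatting.

```python
import pandas as pd

WEIGHTS = {"spatial_score": 0.25, "wind_avg": 0.25, "fire_count": 0.5}

def custom_weight_score(df: pd.DataFrame, weights: dict = WEIGHTS) -> pd.Series:
    """Min-max scale each variable, then aggregate as a weighted linear sum."""
    score = pd.Series(0.0, index=df.index, name="fire_risk_score")
    for col, weight in weights.items():
        values = df[col].fillna(0)  # mirrors coalesce(h3_count_joined, 0)
        scaled = (values - values.min()) / (values.max() - values.min())
        score += weight * scaled
    return score
```

The jenks formatting then bins this continuous score into classes that minimize within-class variance, which is what you'll see when you map the output.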

Once complete, head into the map preview and select Create map.

With historic fires and wind levels factored into the score, we can now see a more complex picture of risk. For instance, risk is now considered much higher around Malibu, the location of the famous 2018 Woolsey fire.

Check out how we’ve used a combination of widgets and interactive pop-ups to help our user interpret the map - head over to the Data visualization section to learn more about how you can do this!
