Using data from a REST API for real-time updates

In addition to subscribing to data on the cloud via the CARTO Data Observatory, another easy way to access spatial data is via an API.

Data is increasingly being published via API feeds rather than static download services. By accessing data this way, you can benefit from live feeds and reduce data storage costs.

In this tutorial, we will walk through how to import data from an external REST API into CARTO Workflows.

What are we aiming for? We're going to extract data from the US Current Air Quality API and map it. Then we'll keep doing that every hour (at least for a while) so we can monitor changes over time - and once you've set up your workflow, you won't have to lift a finger! By the end, you'll have something that looks like this 👇

All the data we'll be using here is free and openly available - so all you need is your CARTO account.


Step 1: Accessing data from an API

We're going to be using CARTO Workflows to make this whole process as easy as possible.

  • Sign into the CARTO platform and head to the Workflows tab.

  • Create a new Workflow using any connection - you can also use the CARTO Data Warehouse here.

  • Open the Components tab (on the left of the window) and search for the Import from URL component. Drag it onto the canvas.

  • Open the US Current Air Quality API page on the ArcGIS hub. Scroll down until you see View API Resources on the right. Expand this section and copy the URL from the GeoJSON section (it should look like the below), pasting it into your Import from URL component.

https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson
  • Note that the Import from URL component requires you to run the import before proceeding to further workflow steps - so let's Run! Once complete, you should be able to select the component to view the data, just like with any other component.

This is pretty much the most straightforward API call you can make to access spatial data - things can obviously get much more complicated!

First, let's say we want to only return a handful of fields. We would do this by replacing the outFields=* portion of the URL with a list of comma-separated field names, like below.

https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?where=1%3D1&outFields=AQSID,Latitude,Longitude,PM25_AQI,LocalTimeString&outSR=4326&f=geojson

Next, let's imagine we only want to return air quality results from a specified area. You can see how the URL below has been adapted to include a geometry bounding box.

https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query?where=1%3D1&outFields=AQSID,Latitude,Longitude,PM25_AQI,LocalTimeString&geometry=-125.0,24.396308,-66.93457,49.384358&geometryType=esriGeometryEnvelope&inSR=4326&spatialRel=esriSpatialRelIntersects&outSR=4326&f=geojson

Let's leave the URL editing there for now, but do make sure to check the documentation of the API you're using to explore all of the parameters it supports. Many will also supply UI-based custom API builders to help you create the URL you need without writing code.
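If you'd rather assemble these query strings programmatically than edit them by hand, here's a minimal Python sketch using only the standard library. The parameter names are the same ArcGIS Feature Service query parameters used in the URLs above; the bounding box values match the earlier example.

```python
from urllib.parse import urlencode

# Base query endpoint for the US Current Air Quality feature service
BASE = ("https://services.arcgis.com/cJ9YHowT8TU7DUyn/arcgis/rest/services/"
        "Air%20Now%20Current%20Monitor%20Data%20Public/FeatureServer/0/query")

params = {
    "where": "1=1",                                     # no attribute filter
    "outFields": "AQSID,Latitude,Longitude,PM25_AQI,LocalTimeString",
    "geometry": "-125.0,24.396308,-66.93457,49.384358", # xmin,ymin,xmax,ymax
    "geometryType": "esriGeometryEnvelope",
    "inSR": "4326",
    "spatialRel": "esriSpatialRelIntersects",
    "outSR": "4326",
    "f": "geojson",
}

# urlencode handles the percent-encoding (e.g. 1=1 becomes 1%3D1) for you
url = BASE + "?" + urlencode(params)
```

Building the URL this way makes it easy to swap the field list or bounding box without hand-editing encoded characters.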

Before we move on to analyzing this data, there are an extra couple of considerations to be aware of:

  1. This API is fully open, but many require you to set up an API key and/or application key to access data. This can usually be appended to your URL, e.g. with the code %APIKEY. If an API service is private and requires further authentication or access tokens, you should first use a HTTP Request component to obtain an authentication token by sending your credentials to the service. From there, you can extract the token and use it in a subsequent HTTP Request component to access the data, including the token in the appropriate header as specified by the service. Note also that the Import from URL component currently supports only CSV and GeoJSON formats - for other data formats, HTTP Request should be used.

  2. Many APIs impose a limit to the number of features you can access, whether in a single call or within a time period. This limit is not imposed by CARTO, and if you require more features than the API allows you should contact the service provider.
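The two-step token flow described in point 1 can be sketched in Python with the standard library. The endpoints, credential field names, and the OAuth-style client-credentials grant below are all hypothetical - the Bearer header scheme is common, but check your service's documentation for the exact payload and header it expects.

```python
import json
import urllib.parse
import urllib.request

def auth_header(token):
    # Many services expect the token in a Bearer Authorization header;
    # your API's docs will specify the exact scheme to use.
    return {"Authorization": f"Bearer {token}"}

def fetch_token(token_url, client_id, client_secret):
    # First request: exchange credentials for an access token.
    # Field names here are illustrative (OAuth client-credentials style).
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode()
    with urllib.request.urlopen(urllib.request.Request(token_url, data=body)) as resp:
        return json.load(resp)["access_token"]

def fetch_data(data_url, token):
    # Second request: call the data endpoint with the token attached.
    req = urllib.request.Request(data_url, headers=auth_header(token))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Example usage against a hypothetical service:
    # token = fetch_token("https://example.com/oauth/token", "my_id", "my_secret")
    # data = fetch_data("https://example.com/api/data?f=geojson", token)
    pass
```

In Workflows, the first HTTP Request component plays the role of `fetch_token` and the second plays `fetch_data`, with the token passed between them.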

Now, let's do something exciting with our data!


Step 2: Adding contextual information

Before creating a map, let's add some contextual information to our data to make it even more useful for our end users.

We'll do this with the below simple workflow, building on the one we already started.

  1. Create local timestamp: as we start to build up a picture of air quality changes over time, we'll need to know when each recording was taken. It's important to know this in local time, as it's likely changes will be affected by time-sensitive patterns like commutes. For this, connect a Create Column component to your Import from URL. Call the field "local_time" and use the below formula for this calculation:

PARSE_TIMESTAMP('%m/%d/%Y %H:%M', (CONCAT(TRIM(SUBSTR(localtimestring, 5, 15)),'0')))
  2. USA_counties: let's make sure our users can find out which state and county each sensor is located in. If you're working in the CARTO Data Warehouse, find the table usa_counties under Sources > Connection data > Organization data > demo tables. If not, you can locate and subscribe to this data via the Data Observatory and add the table from there.

  3. Join to counties with the following components:

    1. A Spatial Join to join counties to air quality sensors.

    2. An Edit Schema, selecting only the relevant fields: aqsid, pm25_aqi, geom, local_time, name_joined, state_name_joined. The original field types can be retained.

  4. Finally, use a Save as Table to commit your results.
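To see what the PARSE_TIMESTAMP formula in step 1 is doing, here's a Python equivalent. The sample value in the comment is hypothetical - it assumes LocalTimeString values carry a weekday prefix followed by a date and an hour with a truncated minutes field, which is what the SQL's SUBSTR and CONCAT handling implies.

```python
from datetime import datetime

def parse_local_time(localtimestring):
    # Python equivalent of the workflow formula:
    #   PARSE_TIMESTAMP('%m/%d/%Y %H:%M',
    #                   CONCAT(TRIM(SUBSTR(localtimestring, 5, 15)), '0'))
    # SUBSTR(x, 5, 15) is 1-indexed with length 15, i.e. x[4:19] in Python.
    # Assumed (hypothetical) input shape: "Sun 06/30/2024 14:0"
    trimmed = localtimestring[4:19].strip()
    # Appending '0' restores the truncated minutes ("14:0" -> "14:00")
    return datetime.strptime(trimmed + "0", "%m/%d/%Y %H:%M")
```

Check a few real LocalTimeString values from your import before relying on the offsets - if the upstream format differs, the SUBSTR positions need adjusting in both versions.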

Now we have a snapshot of this data from the time we ran the workflow - next, let's make some tweaks so we can keep fetching the results every hour.


Step 3: Hourly updates

To prepare for our incoming hourly data, let's make the below tweaks to our workflow.

  1. First, give your page a refresh.

  2. Under Sources, navigate to wherever you saved your output in the previous step. Drag it onto the canvas, roughly below Edit schema.

  3. Delete the connection between Edit schema and Save as Table, instead connecting both Edit schema and your existing table to a new Union All component. Now, every time you run this workflow your table will have the new values appended to it.

  4. Connect this to a Remove Duplicates component. This will remove any duplicate rows, useful if the API isn't updated or if you need to do additional executions in between scheduled runs.

  5. Connect the Remove Duplicates component to the Save as Table component, ensuring the name is the same as the original table that you saved; this will overwrite the table every time it is run.

  6. Run the workflow! Don't worry, the Remove Duplicates component will remove any duplicated values.

  7. Now we can set up our workflow to run hourly. Select the clock to the left of Run (top-right of the window). Set the repeat frequency to every 1 hour - and save your changes.

  8. We also need to clear the workflow cache so that it generates fresh results each time - learn more about this here. This option can be found to the left of the clock icon we just used in Workflow settings. Simply disable the cache here.
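The append-and-deduplicate pattern in steps 3-5 can be sketched in a few lines of Python, with tuples standing in for table rows:

```python
def refresh_table(existing_rows, new_rows):
    # Union All followed by Remove Duplicates, over lists of row tuples.
    # Running this repeatedly appends each hour's readings while preventing
    # re-runs (or an unchanged API response) from duplicating rows.
    seen, out = set(), []
    for row in existing_rows + new_rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out
```

Because the result overwrites the saved table, each scheduled run reads the growing table back in as its "existing rows" - that feedback loop is what turns a point-in-time API into a time series.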

Now your workflow will be set up to run hourly until you come back here to select Delete schedule. You should also come here to sync your scheduled workflow whenever you make changes to it.

While we're waiting for our table to be populated by the next hour of results... shall we build a map ready for it?


Step 4: Building a dashboard

  1. In your workflow, select the Save as Table component, and open the Map preview on the bottom of the screen - from here you can select Create Map to open a new CARTO Builder map with your data ready to go!

  2. Under Sources to the bottom-left of the screen, select Data freshness. Open the Data freshness window from here and set the data freshness to every 1 hour (see below).

  3. Open the map legend (bottom right of the screen) and click the three dots next to your newly generated Layer 1, and select Zoom to to fly to the extent of your layer.

Now let's build out our map:

  1. Rename the map (top left of the screen) "USA Air Quality"

  2. Rename the layer (3 dots next to the layer name - likely Layer 1) "Air Quality Index - PM 2.5"

  3. Style the layer:

    1. Radius: fixed, 3px.

    2. Fill color: pm25_aqi, using the color ramp Color Brewer Yellow-Orange-Red and the color scale Quantize. By choosing a pre-defined scale like Quantize or Quantile, your color ramp will auto-scale as new data is added.

    3. Stroke: white, px.

  4. Create a pop-up interaction for your layer by opening the Interactions panel (top left of the screen). Choose the style Light with highlighted 1st value, and then select which fields you'd like to appear in the pop-up (we're using aqsid, pm25_aqi, local_time, name_joined (i.e. county) and state_name_joined). You should also rename each field here so the names are easier to read.

Your map should be looking a little like this...

Now let's add some widgets to help our users understand the data.

  1. In the Widgets panel (to the left of Interactions), create a New Widget using your sensor locations layer.

  2. Change the widget type to Time Series, setting the below parameters:

    1. Name: PM2.5 AQI hourly changes. You can change this in the same way you change layer names.

    2. Time field: the widget builder should auto-detect Local time, but if your source has multiple time inputs, you would change it here.

    3. Operation: average.

    4. Aggregation column: PM25_AQI.

    5. Display options: 1 hour (if you leave your workflow running for a long time, you may wish to change this to days).

    6. Formatting: 2 decimal places

As we have data for multiple time zones on the map, you should already be able to see some temporal patterns and interaction with the time series widget.

Let's add a couple more widgets to tell more of a story with this data:

  1. Add a new Formula Widget called "Average PM2.5 AQI." This should use the average of the pm25_aqi column with 2 decimal place formatting.

  2. Add a new Category Widget called "PM2.5 AQI - top 5 counties." Set the operation to average, the column to name_joined and the aggregation column to pm25_aqi. Again, make sure you set the formatting to 2 decimal places.

Can you spot a problem with this? There are multiple counties in the US with the same name, so we need to differentiate them or the widget will group them together.

  1. In the Sources window (bottom left of the screen), click on the three dots next to your source and select Query this table/Open SQL console (the display will depend on whether you have opened the console before).

  2. Between the * and FROM, type , CONCAT(name_joined, ', ', state_name_joined) AS county_label. So your entire console will look something like:

SELECT *, 
CONCAT(name_joined, ', ', state_name_joined) AS county_label 
FROM yourproject.yourdataset.yourtable
  3. Run the code, then head back to your category widget. Switch the SQL Query field from name_joined to county_label. Much better!
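To make the ambiguity concrete, here's a tiny Python sketch (with made-up AQI values) showing why grouping on the county name alone merges unrelated counties, while a county-plus-state label keeps them apart:

```python
from collections import defaultdict

# (county, state, pm25_aqi) - hypothetical readings from two different
# counties that happen to share a name
readings = [
    ("Washington", "Oregon", 40),
    ("Washington", "Utah", 80),
]

def avg_by(rows, key):
    # Group readings by the given key and average the AQI in each group,
    # mimicking the Category Widget's "average" operation.
    groups = defaultdict(list)
    for county, state, aqi in rows:
        groups[key(county, state)].append(aqi)
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Grouping on the county name alone merges the two counties into one bucket:
by_name = avg_by(readings, lambda c, s: c)             # {"Washington": 60.0}
# Concatenating the state keeps them as two distinct categories:
by_label = avg_by(readings, lambda c, s: f"{c}, {s}")
```

The CONCAT in the SQL console does exactly what the second grouping key does here.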

Altogether, your map should be looking something like...

Finally, if you'd like to share the results of your hard work, head to the Share options at the top of the screen!


What's next?

Why not explore some of our space-time statistics tools to help you draw more advanced conclusions from spatio-temporal data?
