Data enrichment using the Data Observatory

Last updated 12 months ago

In this guide you will learn how to perform data enrichment using Data Observatory data and the Analytics Toolbox. You can also access and run this guide using this Google Colab notebook.

Prefer a low-code approach? Check out our Workflows templates for Data Enrichment.

1. Create a connection with BigQuery in the CARTO Workspace

  1. Navigate to the Connections section.

  2. Create a new connection with BigQuery. You may choose the Service Account (SA) or the “Sign in with Google” options depending on where you are planning to run your queries:

    • If you are going to use the BigQuery console, please use the “Sign in with Google” option.

    • If you are going to use a BigQuery client instead (a Python notebook for instance), please use the SA option and make sure you use that same SA to authenticate in the client.

For more details, please refer to the documentation.
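If you go the Service Account route from a Python client, the authentication step might look like the sketch below. It assumes the `google-cloud-bigquery` and `google-auth` packages are installed; the function name `run_with_service_account` and the `key_path` argument are hypothetical, and the key file must belong to the same SA used for the CARTO connection.

```python
def run_with_service_account(sql, key_path):
    """Run a SQL statement in BigQuery, authenticating with a service account key file.

    Hypothetical helper for illustration; `key_path` points to the JSON key of the
    same service account used in the CARTO connection.
    """
    # Imported lazily so this module can be loaded without the client libraries.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_path)
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)
    # Wait for the job to finish and return whatever rows it yields.
    return list(client.query(sql).result())
```

Any of the `CALL` statements in this guide can then be passed to this function as the `sql` string.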

2. Subscribe to the Data Observatory datasets

  1. Navigate to the Data Observatory section of the CARTO Workspace.

  2. Using the Spatial Data Catalog, subscribe to the following datasets, both available for free. You can find these datasets by using the search bar or the filter column on the left of the screen:

    • Sociodemographics - United States of America (Census Block Group, 2018, 5yrs) from American Community Survey.

    • Nodes - United States of America (Latitude/Longitude) from OpenStreetMap.

  3. Navigate to the Data Explorer and expand the Data Observatory section. Choose any of your data subscriptions and click on the “Access in” button on the top right of the page. Copy the BigQuery project and dataset from any of the table locations that you see on the screen.

  4. Confirm that you can see all of your data subscriptions by running the command below, which makes use of the DATAOBS_SUBSCRIPTIONS procedure. Please replace the BigQuery project and dataset with those you copied in the previous step.

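-- Analytics Toolbox in the US multi-region
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');
-- Analytics Toolbox in the EU multi-region
CALL `carto-un-eu`.carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');
-- Manually installed Analytics Toolbox
CALL carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');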
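The calls above differ only in the prefix of the procedure. When running them from a Python client, one way to avoid copy-paste mistakes is to build the statement from a small lookup table. This is a minimal sketch: the region keys (`us`, `eu`, `manual`) and the helper name are hypothetical, and the mapping assumes `carto-un` serves the US multi-region, `carto-un-eu` the EU multi-region, and a bare `carto` dataset a manual installation.

```python
# Hypothetical mapping from a region label to the Analytics Toolbox prefix.
AT_PREFIXES = {
    "us": "`carto-un`.carto",      # assumed: US multi-region
    "eu": "`carto-un-eu`.carto",   # assumed: EU multi-region
    "manual": "carto",             # assumed: manual installation
}


def subscriptions_call(region, source, filter_expression=""):
    """Return the DATAOBS_SUBSCRIPTIONS statement for the given region.

    `source` is the project.dataset copied from the Data Explorer.
    """
    prefix = AT_PREFIXES[region]
    return f"CALL {prefix}.DATAOBS_SUBSCRIPTIONS('{source}','{filter_expression}');"


print(subscriptions_call("eu", "carto-data.ac_lqe3zwgu"))
```

The same pattern applies to any other Analytics Toolbox procedure, since only the procedure name changes.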

3. Choose variables for the enrichment

We can list all the variables (data columns) available in our Data Observatory subscriptions by running the following query, which makes use of the DATAOBS_SUBSCRIPTION_VARIABLES procedure. Please remember to replace the BigQuery project and dataset with those you used in the previous command.

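-- Analytics Toolbox in the US multi-region
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');
-- Analytics Toolbox in the EU multi-region
CALL `carto-un-eu`.carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');
-- Manually installed Analytics Toolbox
CALL carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');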

In this particular example we are going to enrich our data with four variables, each uniquely identified by its variable_slug: total_pop_3409f36f, median_age_e4b1c48c, income_per_capi_bfb55c80 and shop_eede86ac.

4. Run the enrichment

We are going to enrich an H3 grid of resolution 6 of the city of New York with the four Data Observatory variables chosen in the previous step. The data table is publicly available at cartobq.docs.nyc_boundary_h3z6 and it was created by leveraging the H3 polyfill function of the Analytics Toolbox, through the following query:

-- Analytics Toolbox in the US multi-region
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` AS
SELECT h3id
FROM UNNEST(`carto-un`.carto.H3_POLYFILL(
    (SELECT urban_area_geom
     FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
     WHERE name LIKE "New York%"), 6)) AS h3id;

-- Analytics Toolbox in the EU multi-region
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` AS
SELECT h3id
FROM UNNEST(`carto-un-eu`.carto.H3_POLYFILL(
    (SELECT urban_area_geom
     FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
     WHERE name LIKE "New York%"), 6)) AS h3id;

-- Manually installed Analytics Toolbox
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` AS
SELECT h3id
FROM UNNEST(carto.H3_POLYFILL(
    (SELECT urban_area_geom
     FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
     WHERE name LIKE "New York%"), 6)) AS h3id;

The enrichment is performed using the DATAOBS_ENRICH_GRID procedure of the Analytics Toolbox. Please note that this particular procedure makes use of spatial indexes and does not require the input data to have a geometry column.

The following inputs are needed:

  • The type of spatial index used, H3 in our case.

  • The input query to be enriched.

  • The name of the column containing valid H3 indexes.

  • The list of variables to be used for the enrichment and their aggregation method. As explained earlier, these variables are identified using their variable_slug. For more information about the aggregation methods, please refer to the documentation.

  • Name of the output table where the result of the enrichment will be stored.

  • Location of your Data Observatory subscriptions. This is the same project.dataset we used to run the DATAOBS_SUBSCRIPTIONS and DATAOBS_SUBSCRIPTION_VARIABLES in previous steps of this guide.

-- Analytics Toolbox in the US multi-region
CALL `carto-un`.carto.DATAOBS_ENRICH_GRID
('h3',    -- type of spatial index
R'''
SELECT * FROM `cartobq.docs.nyc_boundary_h3z6`
''',      -- input query to be enriched
'h3id',   -- column containing the H3 indexes
-- variables (by variable_slug) and their aggregation methods
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,     -- filters (none in this example)
['cartobq.docs.nyc_boundary_h3z6_enriched'],  -- output table
'carto-data.ac_lqe3zwgu');                    -- location of the Data Observatory subscriptions

-- Analytics Toolbox in the EU multi-region
CALL `carto-un-eu`.carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * FROM `cartobq.docs.nyc_boundary_h3z6`
''',
'h3id',
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,
['cartobq.docs.nyc_boundary_h3z6_enriched'],
'carto-data.ac_lqe3zwgu');

-- Manually installed Analytics Toolbox
CALL carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * FROM `cartobq.docs.nyc_boundary_h3z6`
''',
'h3id',
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,
['cartobq.docs.nyc_boundary_h3z6_enriched'],
'carto-data.ac_lqe3zwgu');
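When the list of variables changes often, it may be easier to generate the enrichment call than to edit it by hand. The sketch below assembles the same statement from a list of (variable_slug, aggregation) pairs. The function `enrich_grid_call` and its parameter names are hypothetical; it only formats text, does not validate slugs or aggregation methods, and always passes NULL for the fifth argument as the calls above do.

```python
def enrich_grid_call(prefix, grid_type, input_query, index_column,
                     variables, output_table, source):
    """Format a DATAOBS_ENRICH_GRID statement (illustrative helper, not a CARTO API).

    `variables` is a list of (variable_slug, aggregation) tuples.
    """
    variables_sql = ",".join(f"('{slug}','{agg}')" for slug, agg in variables)
    return (
        f"CALL {prefix}.DATAOBS_ENRICH_GRID\n"
        f"('{grid_type}',\n"
        f"R'''\n{input_query}\n''',\n"
        f"'{index_column}',\n"
        f"[{variables_sql}],\n"
        "NULL,\n"  # fifth argument left as NULL, as in the guide
        f"['{output_table}'],\n"
        f"'{source}');"
    )


statement = enrich_grid_call(
    prefix="`carto-un`.carto",
    grid_type="h3",
    input_query="SELECT * from `cartobq.docs.nyc_boundary_h3z6`",
    index_column="h3id",
    variables=[
        ("total_pop_3409f36f", "sum"),
        ("median_age_e4b1c48c", "avg"),
        ("income_per_capi_bfb55c80", "avg"),
        ("shop_eede86ac", "count"),
    ],
    output_table="cartobq.docs.nyc_boundary_h3z6_enriched",
    source="carto-data.ac_lqe3zwgu",
)
print(statement)
```

Keeping the variables as data rather than as an inline string makes it straightforward to add or drop a variable without touching the rest of the call.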

5. Analyze the enrichment result

The table resulting from running the previous query, publicly available at cartobq.docs.nyc_boundary_h3z6_enriched, will include all the columns of the input query plus four additional columns, containing the value of each enrichment variable in each H3 cell. The enrichment result can be analyzed with the help of a map and a set of interactive widgets created using Builder, our map making tool available from the CARTO Workspace.

To get started creating maps, we recommend the following resources from the documentation:

  • Guide to create your first map.

  • Guide to add widgets to a map.

  • Step-by-step tutorial to create a category and bubbles visualization, leveraging different map styles and widgets.

Sign in to your CARTO Workspace. If you don’t have an account yet, you can sign up for a 14-day trial.

For reference, the variables used in this guide are:

  • total_pop_3409f36f, median_age_e4b1c48c and income_per_capi_bfb55c80: these variables are from the ACS Sociodemographics dataset for the US, at Census Block Group level (2018). As the variable_description column of the variable listing from step 3 shows, they represent the total population, its median age and its per capita income in the past 12 months, respectively.

  • shop_eede86ac: this variable is from the POIs dataset of OpenStreetMap for the US. When the POI is a shop, this variable contains the specific shop category, e.g. “supermarket”. It is NULL otherwise.