# Data enrichment using the Data Observatory

<div align="left"><figure><img src="https://3015558743-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFEElAdsRIl9DzfMhbRlB%2Fuploads%2FhB2W9xXbzzo0kEuXMe3S%2Fintermediate%20banner.png?alt=media&#x26;token=4acd2cc7-c7e8-46c0-9669-6f6b73c030dd" alt="Intermediate difficulty banner" width="175"><figcaption></figcaption></figure></div>

In this guide you will learn how to perform data enrichment using Data Observatory data and the Analytics Toolbox. You can also access and run this guide using [this Google Colab notebook](https://colab.research.google.com/drive/1tpJlOlGIeAmQBQumXlr1nrTcPqbDyoEf).

Prefer a low-code approach? Check out our [Workflows templates](https://academy.carto.com/creating-workflows/workflow-templates) for [Data Enrichment](https://academy.carto.com/creating-workflows/workflow-templates/data-enrichment).

## 1. Create a connection with BigQuery in the CARTO Workspace <a href="#id-1-create-a-connection-with-bigquery-in-the-carto-workspace" id="id-1-create-a-connection-with-bigquery-in-the-carto-workspace"></a>

1. Sign in to your CARTO Workspace. If you don’t have an account yet, you can sign up [here](https://carto.com/signup) for a 14-day trial.
2. Navigate to the Connections section.
3. Create a new connection with BigQuery. You can choose either the Service Account (SA) or the “Sign in with Google” option, depending on where you plan to run your queries:
   * If you are going to use the BigQuery console, please use the “Sign in with Google” option.
   * If you are going to use a BigQuery client instead (a Python notebook for instance), please use the SA option and make sure you use that same SA to authenticate in the client.

For more details, please refer to the [documentation](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference).

## 2. Subscribe to the Data Observatory datasets <a href="#id-2-subscribe-to-the-data-observatory-datasets" id="id-2-subscribe-to-the-data-observatory-datasets"></a>

1. Navigate to the Data Observatory section of the CARTO Workspace.
2. Using the Spatial Data Catalog, subscribe to the following two datasets, both available for free. You can find them using the search bar or the filters on the left side of the screen:

   * *Sociodemographics - United States of America (Census Block Group, 2018, 5yrs)* from the American Community Survey.\*

   \*Please note that this dataset was retired in June 2025. You can find similar data products from providers like the ACS by searching for publicly available demographics data in the Data Observatory.

   * *Nodes - United States of America (Latitude/Longitude)* from OpenStreetMap.

     <figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/bdaxLuH0nsyaPOHEIwgl/the_enrichment_guide_create_subscriptions.png" alt=""><figcaption></figcaption></figure>
3. Navigate to the Data Explorer and expand the Data Observatory section. Choose any of your data subscriptions and click the “Access in” button at the top right of the page. Copy the BigQuery project and dataset from any of the table locations shown on the screen.

   <figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/sea7BK7By5nhh5tBGr8P/the_enrichment_guide_access_in.png" alt=""><figcaption></figcaption></figure>
4. Confirm that you can see all of your data subscriptions by running the command below, which makes use of the [`DATAOBS_SUBSCRIPTIONS`](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference) procedure. **Please replace the BigQuery project and dataset with those you copied in the previous step.**

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_lqe3zwgu','');
```

{% endtab %}
{% endtabs %}

<figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/6ttm973YKOwVkKzFzrGz/enrichment_guide_dataobs_subscriptions.png" alt=""><figcaption></figcaption></figure>
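The second argument of `DATAOBS_SUBSCRIPTIONS`, left empty above, can be used to narrow the listing. The sketch below assumes that this argument accepts a SQL filter condition and that the result includes a `dataset_provider` column; both are assumptions, so check them against the procedure reference and the columns you see in your own output. As before, replace the project and dataset with your own.

```sql
-- Sketch: list only the OpenStreetMap subscriptions.
-- ASSUMPTIONS: the second argument is a SQL filter condition, and the
-- output has a dataset_provider column; adjust to what you see in your results.
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTIONS(
  'carto-data.ac_lqe3zwgu',                 -- replace with your own project.dataset
  "dataset_provider LIKE '%OpenStreetMap%'"
);
```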

## 3. Choose variables for the enrichment <a href="#id-3-choose-variables-for-the-enrichment" id="id-3-choose-variables-for-the-enrichment"></a>

We can list all the variables (data columns) available in our Data Observatory subscriptions by running the following query, which makes use of the [`DATAOBS_SUBSCRIPTION_VARIABLES`](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference) procedure. **Please remember to replace the BigQuery project and dataset with those you used in the previous command.**

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_lqe3zwgu','');
```

{% endtab %}
{% endtabs %}

<figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/KVrPCY9Ea3uQyTHkWgbk/enrichment_guide_dataobs_variables.png" alt=""><figcaption></figcaption></figure>

In this particular example we are going to enrich our data with the following variables. Please note that these variables are uniquely identified by their `variable_slug`.

* `total_pop_3409f36f`, `median_age_e4b1c48c` and `income_per_capi_bfb55c80`: these variables are from the [ACS Sociodemographics dataset](https://carto.com/spatial-data-catalog/browser/dataset/acs_sociodemogr_95c726f9/) for the US, at Census Block Group level (2018). As we can see in the `variable_description` column, they represent the total population, their median age and their per capita income in the past 12 months, respectively.
* `shop_eede86ac`: this variable is from the [POIs dataset of OpenStreetMap](https://carto.com/spatial-data-catalog/browser/dataset/osm_nodes_74461e34) for the US. When the POI is a shop, this variable contains the specific shop category, e.g. “supermarket”. It is NULL otherwise.
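To review only the metadata of these four variables, the same procedure can be called with a filter on `variable_slug`. This is a sketch that assumes the second argument of `DATAOBS_SUBSCRIPTION_VARIABLES` (left empty earlier) is a SQL filter condition applied to its output; verify this in the procedure reference before relying on it.

```sql
-- Sketch: show only the variables used in this guide.
-- ASSUMPTION: the second argument is a SQL filter condition evaluated
-- against the procedure's output columns (which include variable_slug).
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES(
  'carto-data.ac_lqe3zwgu',  -- replace with your own project.dataset
  "variable_slug IN ('total_pop_3409f36f', 'median_age_e4b1c48c', 'income_per_capi_bfb55c80', 'shop_eede86ac')"
);
```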

## 4. Run the enrichment <a href="#id-4-run-the-enrichment" id="id-4-run-the-enrichment"></a>

We are going to enrich an H3 grid of resolution 6 covering the New York urban area with the four Data Observatory variables chosen in the previous step. The data table is publicly available at `cartobq.docs.nyc_boundary_h3z6` and was created with the [H3 polyfill function](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference) of the Analytics Toolbox, using the following query. To recreate it yourself, replace the destination table with one in a project and dataset where you have write permissions.

{% tabs %}
{% tab title="carto-un" %}

```sql
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` as
SELECT h3id FROM unnest(`carto-un`.carto.H3_POLYFILL(
    (SELECT urban_area_geom
FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
WHERE name like "New York%"), 6)) h3id;
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` as
SELECT h3id FROM unnest(`carto-un-eu`.carto.H3_POLYFILL(
    (SELECT urban_area_geom
FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
WHERE name like "New York%"), 6)) h3id;
```

{% endtab %}

{% tab title="manual" %}

```sql
CREATE TABLE `cartobq.docs.nyc_boundary_h3z6` as
SELECT h3id FROM unnest(carto.H3_POLYFILL(
    (SELECT urban_area_geom
FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
WHERE name like "New York%"), 6)) h3id;
```

{% endtab %}
{% endtabs %}
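Before running the enrichment, it can be worth checking that the grid contains what the procedure expects: unique, valid H3 indexes at a single resolution. The sketch below uses the `H3_ISVALID` and `H3_RESOLUTION` functions of the Analytics Toolbox through the `carto-un` deployment; swap the prefix for `` `carto-un-eu`.carto `` or your own `carto` dataset if you use a different installation.

```sql
-- Sketch: sanity-check the polyfilled grid before the enrichment.
-- We would expect no duplicates, no invalid indexes and resolution 6 everywhere.
SELECT
  COUNT(*) AS n_rows,
  COUNT(DISTINCT h3id) AS n_distinct_cells,
  COUNTIF(NOT `carto-un`.carto.H3_ISVALID(h3id)) AS n_invalid_cells,
  MIN(`carto-un`.carto.H3_RESOLUTION(h3id)) AS min_resolution,
  MAX(`carto-un`.carto.H3_RESOLUTION(h3id)) AS max_resolution
FROM `cartobq.docs.nyc_boundary_h3z6`;
```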

The enrichment is performed using the [`DATAOBS_ENRICH_GRID`](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference) procedure of the Analytics Toolbox. Please note that this particular procedure makes use of spatial indexes and does not require the input data to have a geometry column.

The following inputs are needed:

* The type of spatial index used, H3 in our case.
* The input query to be enriched.
* The name of the column containing valid H3 indexes.
* The list of variables to be used for the enrichment and their aggregation method. As explained earlier, these variables are identified using their `variable_slug`. For more information about the aggregation methods, please refer to the [documentation](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference).
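* Optional filters to apply to the Data Observatory datasets. We pass `NULL` because no filtering is needed in this example.
* The name of the output table where the result of the enrichment will be stored. Replace it with a table in a project and dataset where you have write permissions.
* The location of your Data Observatory subscriptions. This is the same `project.dataset` we used to run the `DATAOBS_SUBSCRIPTIONS` and `DATAOBS_SUBSCRIPTION_VARIABLES` procedures in the previous steps of this guide.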

{% tabs %}
{% tab title="carto-un" %}

```sql
CALL `carto-un`.carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * from `cartobq.docs.nyc_boundary_h3z6`
''',
'h3id',
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,
['cartobq.docs.nyc_boundary_h3z6_enriched'],
'carto-data.ac_lqe3zwgu');
```

{% endtab %}

{% tab title="carto-un-eu" %}

```sql
CALL `carto-un-eu`.carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * from `cartobq.docs.nyc_boundary_h3z6`
''',
'h3id',
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,
['cartobq.docs.nyc_boundary_h3z6_enriched'],
'carto-data.ac_lqe3zwgu');
```

{% endtab %}

{% tab title="manual" %}

```sql
CALL carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * from `cartobq.docs.nyc_boundary_h3z6`
''',
'h3id',
[('total_pop_3409f36f','sum'),('median_age_e4b1c48c','avg'),('income_per_capi_bfb55c80','avg'),('shop_eede86ac','count')],
NULL,
['cartobq.docs.nyc_boundary_h3z6_enriched'],
'carto-data.ac_lqe3zwgu');
```

{% endtab %}
{% endtabs %}

<figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/BbjBh2JoBeyMVt4tnwzc/enrichment_guide_result.png" alt=""><figcaption></figcaption></figure>
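A quick way to verify the run is to compare the enriched table with its input. The sketch below uses only standard BigQuery SQL: the first query compares row counts (we would expect one output row per input cell), and the second lists the output columns through `INFORMATION_SCHEMA`, where we would expect `h3id` plus one column per enrichment variable. If you wrote the output elsewhere, point both queries at your own table.

```sql
-- Sketch: compare the input grid with the enrichment output.
SELECT
  (SELECT COUNT(*) FROM `cartobq.docs.nyc_boundary_h3z6`) AS input_cells,
  (SELECT COUNT(*) FROM `cartobq.docs.nyc_boundary_h3z6_enriched`) AS output_cells;

-- List the columns of the enriched table, in their stored order.
SELECT column_name, data_type
FROM cartobq.docs.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'nyc_boundary_h3z6_enriched'
ORDER BY ordinal_position;
```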

## 5. Analyze the enrichment result <a href="#id-5-analyze-the-enrichment-result" id="id-5-analyze-the-enrichment-result"></a>

The table resulting from running the previous query, publicly available at `cartobq.docs.nyc_boundary_h3z6_enriched`, includes all the columns of the input query plus four additional columns containing the value of each enrichment variable in each H3 cell. As shown below, the enrichment result can be analyzed with the help of a map and a set of interactive widgets created using Builder, the map-making tool available from the CARTO Workspace.

{% embed url="<https://clausa.app.carto.com/map/92c0a1ed-31cf-47cb-acd8-9485c9f21194>" %}
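Besides the map, the enriched table can be queried directly. The sketch below ranks cells by population and derives a shop density from the `count` aggregation of `shop_eede86ac`. It assumes the enrichment columns are named after each `variable_slug` followed by the aggregation method (e.g. `total_pop_3409f36f_sum`); check the actual column names in your output table and adjust them if they differ.

```sql
-- Sketch: the ten most populated cells and their shop density.
-- ASSUMPTION: enrichment columns are named <variable_slug>_<aggregation>.
SELECT
  h3id,
  total_pop_3409f36f_sum AS total_pop,
  median_age_e4b1c48c_avg AS median_age,
  income_per_capi_bfb55c80_avg AS income_per_capita,
  shop_eede86ac_count AS n_shops,
  -- SAFE_DIVIDE returns NULL instead of failing when a cell has zero population
  ROUND(1000 * SAFE_DIVIDE(shop_eede86ac_count, total_pop_3409f36f_sum), 2) AS shops_per_1000_residents
FROM `cartobq.docs.nyc_boundary_h3z6_enriched`
ORDER BY total_pop DESC
LIMIT 10;
```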

To get started creating maps, we recommend the following resources from the documentation:

* [Guide to create your first map](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference).
* [Guide to add widgets to a map](https://academy.carto.com/advanced-spatial-analytics/spatial-analytics-for-bigquery/step-by-step-tutorials/broken-reference).
* [Step-by-step tutorial](https://academy.carto.com/building-interactive-maps/data-visualization/build-a-categories-and-bubbles-visualization) to create a category and bubbles visualization, leveraging different map styles and widgets.
