# Data Preparation

### Joining different datasets by a common column&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to join two tables based on a common ID on both tables.&#x20;

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/join.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/2m0UKtXhcakblYsUmcKl/Screenshot%202023-10-17%20at%2017.18.04.png" alt="" width="563"><figcaption></figcaption></figure></div>

### Join two datasets and group by a property

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example shows how to use Workflows to join two tables together and then group by a specific property, producing aggregated data coming from both sources.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/join_and_group_by.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/7QCoqfiRPEjESqribPwz/Screenshot%202023-10-17%20at%2017.37.20.png" alt="" width="563"><figcaption></figcaption></figure></div>

### Union of two data sources with same schema&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to generate a table that contains all the rows from two different sources with the same schema.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/union.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/3nvgOWfvXwRs4O7eXeq2/Screenshot%202023-10-20%20at%2016.32.45.png" alt="" width="342"><figcaption></figcaption></figure></div>

### Filter rows using a custom geography&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to filter a data source using a custom geography input.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/custom_geo_filter.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/zDvJ39SN5h2cnOFfWxx7/Screenshot%202023-10-20%20at%2016.52.18.png" alt="" width="563"><figcaption></figcaption></figure></div>

### Generate a calculated column from a multi-column formula&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to generate a new column using a formula that involves different columns in the calculation.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/multi-column_formula.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/H4LGHkkmTDPV0qDk5WTa/Screenshot%202023-10-17%20at%2018.03.27.png" alt="" width="375"><figcaption></figcaption></figure></div>

### Normalize a variable to an index between 0 and 1

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to obtain a normalized index from a column in your dataset.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/normalize.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/WbLkq95kVWNJbIGMhJz1/Screenshot%202023-10-17%20at%2018.10.59.png" alt="" width="313"><figcaption></figcaption></figure></div>

### Rank and limit a table&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to sort a table by a specific property, and only keep a certain number of rows.

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/rank_and_limit.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/f8FU1lzYf9vspRjTN8h1/Screenshot%202023-10-17%20at%2016.52.03.png" alt="" width="375"><figcaption></figcaption></figure></div>

### Filter columns&#x20;

<table data-full-width="false"><thead><tr><th align="center">CARTO DW</th><th width="154" align="center">BigQuery</th><th align="center">Snowflake</th><th align="center">Redshift</th><th align="center">PostgreSQL</th><th data-hidden>Snowflake</th><th data-hidden>Redshift</th><th data-hidden>PostgreSQL</th><th data-hidden>CARTO DW</th></tr></thead><tbody><tr><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td align="center"><span data-gb-custom-inline data-tag="emoji" data-code="2705">✅</span></td><td></td><td></td><td></td><td></td></tr></tbody></table>

This example demonstrates how to use Workflows to reduce a dataset to a smaller number of columns required for a specific analysis..

[**Download example**](https://storage.googleapis.com/carto-workflows-examples/files/filter_columns.sql)

<div align="left"><figure><img src="https://content.gitbook.com/content/FEElAdsRIl9DzfMhbRlB/blobs/b1KaOznqQScfHuwPL6DO/Screenshot%202023-10-17%20at%2017.53.10.png" alt="" width="362"><figcaption></figcaption></figure></div>
