Optimizing your data for spatial analysis
It's not uncommon for geospatial datasets to be larger than their non-geospatial counterparts, and geospatial operations are sometimes slow or resource-demanding — but that's not a surprise: representing things and events on Earth and then computing their relationships is not an easy task.
With CARTO, you will unlock a way to do spatial analytics at scale, combining the huge computational power of your data warehouse with our expertise and tools, for millions or billions of data points. And we'll try to make it easy for you!
In this guide we'll help you prepare your data so that it is optimized for spatial analysis with CARTO.
Having clean, optimized data at the source (your data warehouse) will:
Improve the performance of all analysis, apps, and visualizations made with CARTO
Reduce the associated computing costs in your data warehouse
Before we start diving into the specific optimizations and tricks available in your data warehouse, there are some typical data optimization patterns that apply to all data warehouses:
Optimization rule #1 — Can you reduce the volume of data?
While CARTO tries to automatically optimize the amount of data requested, having a huge source table is always a bigger challenge than having a smaller one.
Sometimes we find ourselves trying to use a huge table called raw_data with 50 TB of data, only to then realize: I actually don't need all the data in this table!
If that's your case and the raw data is static, then it's a good idea to materialize the subset or aggregation that you need for your use case in a different (smaller) table.
If that's your case and the raw data changes constantly, then it might be a good idea to build a data pipeline that refreshes your (smaller) table. You can .
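As a rough illustration of this pattern, the sketch below uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The raw_data columns, the city_totals table, and the refresh_city_totals helper are all hypothetical names chosen for the example; in practice you would run an equivalent CREATE TABLE ... AS SELECT statement in your own warehouse and schedule it however your pipeline tooling allows.

```python
import sqlite3

# Assumption: SQLite stands in for the data warehouse, and every table and
# column name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_data (event_id INTEGER, city TEXT, lon REAL, lat REAL, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Madrid", -3.70, 40.42, 10.0),
        (2, "Madrid", -3.69, 40.41, 5.0),
        (3, "Paris", 2.35, 48.86, 7.5),
        (4, "Berlin", 13.40, 52.52, 3.0),
    ],
)


def refresh_city_totals(connection: sqlite3.Connection) -> None:
    """Rebuild the small table from scratch; rerun it whenever raw_data changes."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DROP TABLE IF EXISTS city_totals")
        connection.execute(
            """
            CREATE TABLE city_totals AS
            SELECT city, COUNT(*) AS events, SUM(amount) AS total_amount
            FROM raw_data
            WHERE city IN ('Madrid', 'Paris')  -- keep only the subset we need
            GROUP BY city
            """
        )


refresh_city_totals(conn)
city_totals = conn.execute(
    "SELECT city, events, total_amount FROM city_totals ORDER BY city"
).fetchall()
print(city_totals)
```

For static raw data, running the refresh once is enough. For data that changes constantly, the same function could be called on a schedule so that analysis always reads the small table instead of the raw one.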
Optimization rule #2 — Are you using the right spatial data type?
If you've read our , you already know CARTO supports multiple spatial data types.
Each data type has its own particularities when speaking about performance and optimization:
Points: points are great for representing specific locations, but dealing with millions or billions of points is typically a sub-optimal way of solving spatial challenges. Consider aggregating your points into spatial indexes.
Polygons: polygons typically reflect meaningful areas in our analysis, but they quickly become expensive if using too many, too small, or too complex polygons. Consider simplifying your polygons or using a higher-level aggregation to reduce the number of polygons. Both of these operations can be achieved with .
Polygons are also known to become .
Generally it is a good idea to avoid overlapping geometries.
Lines: lines are an important way of representing linear features such as highways and rivers, and are key to network analyses like route optimization. Like polygons, they can quickly become expensive and should be simplified where possible.
Spatial Indexes: spatial indexes currently offer the best performance and costs for visualization and analysis purposes ✨. If you're less familiar with spatial indexes or need a refresher, we have prepared a specific .
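To make the idea of aggregating points into a spatial index more concrete, here is a minimal sketch in plain Python. It assigns each point to a Web Mercator tile (zoom, x, y) using the standard slippy-map tile formula and counts points per tile. This is a simplified stand-in for a real spatial index such as H3 or Quadbin, not how CARTO computes them, and the function names tile_for_point and aggregate_points are hypothetical.

```python
import math
from collections import Counter


def tile_for_point(lon: float, lat: float, zoom: int) -> tuple[int, int, int]:
    """Return the (zoom, x, y) Web Mercator tile that contains a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edges of the grid stay inside it.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def aggregate_points(points, zoom: int) -> Counter:
    """Count how many (lon, lat) points fall into each tile."""
    return Counter(tile_for_point(lon, lat, zoom) for lon, lat in points)


# Hypothetical sample: two points in Madrid and one in Paris.
sample_points = [(-3.70, 40.42), (-3.69, 40.41), (2.35, 48.86)]
for cell, count in sorted(aggregate_points(sample_points, zoom=4).items()):
    print(cell, count)
```

The result has one row per occupied cell instead of one row per point, which is why an aggregated table tends to be much smaller than the raw point table. A higher zoom gives finer cells at the cost of more rows.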
The techniques to optimize your spatial data are slightly different for each data warehouse provider, so we've prepared specific guides for each of them. Check the ones that apply to you to learn more:
Make sure your data is clustered by your geometry or spatial index column.
If your data is points/polygons: make sure Search Optimization is enabled on your geometry column
If your data is based on spatial indexes: make sure it is clustered by your spatial index column.
If your data is points/polygons: make sure the SRID is set to EPSG:4326
If your data is based on spatial indexes: make sure you're using your spatial index column as the sort key.
Make sure your data uses your H3 column as the z-order.
Make sure your data is indexed by your geometry or spatial index column.
If your data is points/polygons: make sure the SRID is set to EPSG:3857
Make sure your data is clustered by your geometry or spatial index column.
Check our for more information.
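Since the tips above mention both EPSG:4326 (longitude and latitude in degrees) and EPSG:3857 (Web Mercator, in meters), the sketch below shows how the same location looks in each system, using the standard spherical Mercator formula. The looks_like_epsg_4326 helper is a hypothetical, rough heuristic for spotting coordinates stored under the wrong SRID. It is not a CARTO or data warehouse function, and it cannot tell the two systems apart for points within a couple of hundred meters of the origin.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis)


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Convert EPSG:4326 degrees to EPSG:3857 meters (valid for |lat| < 90)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


def looks_like_epsg_4326(x: float, y: float) -> bool:
    """Rough heuristic: values inside the degree ranges are probably lon/lat."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Hypothetical sample point in Madrid.
lon, lat = -3.70, 40.42
merc_x, merc_y = lonlat_to_web_mercator(lon, lat)
print(looks_like_epsg_4326(lon, lat), looks_like_epsg_4326(merc_x, merc_y))
```

In a real warehouse you would rely on its own reprojection and SRID functions. The point of the sketch is only that the two coordinate systems differ by several orders of magnitude, so a mismatched SRID is usually easy to detect by inspecting a few values.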
As you've seen through this guide, we try our best to automatically optimize the performance and the costs of all analysis, apps, and visualizations made using CARTO. We also provide tools like or our to help you succeed.