Overview: Why EOForestSTAC?#
EOForestSTAC is a Python package built to simplify access to global forest Earth Observation datasets hosted on cloud-native object storage. It exposes a curated STAC catalog of analysis-ready Zarr archives and provides a clean Python interface for discovery, loading, subsetting, and combining multiple products.
The motivation behind EOForestSTAC#
Working with large-scale forest EO datasets presents several challenges:
Fragmented products: biomass, disturbance, canopy height, forest age, and land-use datasets are published by different groups in different formats, often as multi-gigabyte GeoTIFFs.
Version proliferation: most products have multiple versions and spatial resolutions; tracking what is available and what is current requires manual bookkeeping.
Download overhead: traditional workflows require downloading complete datasets before any analysis, even when only a small region and time range are needed.
Grid inconsistency: combining products from different sources requires reprojection and resampling to a common grid.
EOForestSTAC was designed to address all of these by combining a STAC metadata catalog with Zarr cloud-native storage, enabling lazy streaming access with no local downloads.
What EOForestSTAC enables#
Zero-download access: datasets are streamed from Ceph S3 directly into an
xarray.Dataset; only the requested spatial and temporal subset is transferred.Unified catalog: all products, versions, and spatial resolutions are registered in a single STAC catalog browsable via Python or the interactive STAC Browser.
Multi-resolution support: products like GAMI age-class fractions are available at five spatial resolutions; the
resolution=parameter selects the right asset automatically.Spatial subsetting:
subset()clips any dataset to a GeoDataFrame geometry with automatic CRS reprojection.Multi-product alignment:
DatasetAlignerreprojects and resamples multiple datasets onto a common grid in one call.Lazy evaluation: all operations are backed by Dask and xarray; computation is deferred until
.compute()is called.
Catalog structure#
The catalog is organized as a three-level hierarchy:
catalog.json (root)
├── biomass-carbon/ ← theme catalog
│ ├── CCI_BIOMASS/ ← collection (one per product)
│ │ ├── CCI_BIOMASS_v6.0 ← item (one per version)
│ │ └── CCI_BIOMASS_v7.0
│ ├── SAATCHI_BIOMASS/
│ └── LIU_BIOMASS/
├── disturbance-change/
│ ├── EFDA/
│ ├── HANSEN_GFC/
│ └── ...
├── structure-demography/
│ ├── POTAPOV_HEIGHT/
│ └── GAMI_AGECLASS/ ← multi-resolution: one item, 5 resolution assets
└── ...
Each item holds one or more Zarr assets (one per spatial resolution for multi-resolution products). The ZarrProvider and DiscoveryProvider classes navigate this hierarchy transparently.
Core components of EOForestSTAC#
eoforeststac.providers.discovery.DiscoveryProvider: navigates the STAC catalog hierarchy to list themes, collections, versions, and asset keys.eoforeststac.providers.zarr.ZarrProvider: opens a specific dataset version as anxarray.Datasetstreamed from Ceph S3.eoforeststac.providers.subset.subset: clips a dataset to a spatial geometry and optional time range.eoforeststac.providers.align.DatasetAligner: reprojects and resamples multiple datasets onto a shared reference grid.
Available product themes#
Theme |
Products (examples) |
|---|---|
|
CCI Biomass, Saatchi Biomass, Liu Biomass, GAMI |
|
EFDA (forest disturbances), Hansen GFC, RADD Europe, JRC TMF, JRC GFC, Robinson Carbon Recovery |
|
Potapov Canopy Height, GAMI Age-Class Fractions (multi-resolution) |
|
Potapov LCLUC, RESTOR Land Use, ForestPaths Genus |
Goals and aspirations#
EOForestSTAC’s primary objective is to provide a reproducible, interoperable platform for forest EO research, including:
Multi-source synthesis: load biomass, disturbance, canopy height, and forest age data onto a common grid for integrated analyses.
Temporal change detection: access all available versions of a product and compare them with
.sel(time=...).Carbon cycle studies: combine multiple products to quantify forest carbon stocks, changes, and uncertainty.
—
Acknowledgments#
The development of EOForestSTAC was supported by the European Union through the FORWARDS project.
The author thanks Konstantin Ntokas (Brockmann Consult) for technical assistance, and the EO-LINCS project for valuable feedback on data structure and improvements to the STAC catalog design.
—
A project from GFZ Potsdam#
EOForestSTAC was developed by Simon Besnard from the Global Land Monitoring Group at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences. The project applies cloud-native data principles established in the gedidb and icesat2db packages to global gridded forest EO products. It is open-source (EUPL-1.2) and welcomes contributions from the research community.
—