STAC and Zarr Architecture
EOForestSTAC combines a STAC metadata catalog with Zarr cloud-native storage. Understanding how these two layers interact is the key to understanding the rest of the package.
Overview
catalog.json                          (root, pystac.Catalog)
├── biomass-carbon/                   ← theme (pystac.Catalog)
│   ├── CCI_BIOMASS/                  ← collection (pystac.Collection)
│   │   ├── CCI_BIOMASS_v6.0          ← item (pystac.Item)
│   │   │   └── zarr                  ← asset href → s3://.../CCI_BIOMASS_v6.0.zarr
│   │   └── CCI_BIOMASS_v7.0
│   ├── SAATCHI_BIOMASS/
│   └── LIU_BIOMASS/
├── disturbance-change/
│   ├── EFDA/
│   └── HANSEN_GFC/
├── structure-demography/
│   ├── POTAPOV_HEIGHT/
│   └── GAMI_AGECLASS/                ← multi-resolution: one item, 5 resolution assets
│       └── GAMI_AGECLASS_v3.0
│           ├── zarr_1deg
│           ├── zarr_0.5deg
│           ├── zarr_0.25deg
│           ├── zarr_0.1deg
│           └── zarr_0.0833deg
└── land-use/
    ├── POTAPOV_LCLUC/
    └── RESTOR_LANDUSE/
Layer 1: STAC Catalog (metadata)
The STAC catalog is a hierarchy of JSON files hosted on Ceph S3 at:
https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json
It follows the SpatioTemporal Asset Catalog specification with four levels:
| Level | Role |
|---|---|
| Root catalog | Entry point; links to all theme sub-catalogs. |
| Theme catalog | Groups related products (e.g. biomass-carbon). Has no spatial extent of its own. |
| Collection | One product family (e.g. CCI_BIOMASS). Holds spatial/temporal extent, license, keywords, and links to all items. |
| Item | One version of a product (e.g. CCI_BIOMASS_v6.0). Holds asset links to the actual Zarr stores. |
For multi-resolution products (e.g. GAMI_AGECLASS), a single item holds one Zarr asset per resolution, named zarr_1deg, zarr_0.25deg, and so on.
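The four levels are connected by STAC links: catalogs point to their children with `rel="child"`, and collections point to their items with `rel="item"`. The sketch below walks such a hierarchy using only the standard library. It is an illustration of the traversal, not how EOForestSTAC or pystac implement it: the `DOCS` mapping, its file paths, and the `example.org` asset hrefs are hypothetical stand-ins, and hrefs are treated as plain lookup keys rather than being resolved relative to the parent document as a real client would do.

```python
from typing import Dict, Iterator

# Hypothetical in-memory stand-in for the catalog's JSON files.
# Paths and asset hrefs are made up; only traversal fields are kept.
DOCS: Dict[str, dict] = {
    "catalog.json": {
        "type": "Catalog",
        "id": "root",
        "links": [{"rel": "child", "href": "biomass-carbon/catalog.json"}],
    },
    "biomass-carbon/catalog.json": {
        "type": "Catalog",
        "id": "biomass-carbon",
        "links": [{"rel": "child", "href": "CCI_BIOMASS/collection.json"}],
    },
    "CCI_BIOMASS/collection.json": {
        "type": "Collection",
        "id": "CCI_BIOMASS",
        "links": [
            {"rel": "item", "href": "CCI_BIOMASS/CCI_BIOMASS_v6.0.json"},
            {"rel": "item", "href": "CCI_BIOMASS/CCI_BIOMASS_v7.0.json"},
        ],
    },
    "CCI_BIOMASS/CCI_BIOMASS_v6.0.json": {
        "type": "Feature",  # STAC items are GeoJSON Features
        "id": "CCI_BIOMASS_v6.0",
        "links": [],
        "assets": {"zarr": {"href": "https://example.org/CCI_BIOMASS_v6.0.zarr"}},
    },
    "CCI_BIOMASS/CCI_BIOMASS_v7.0.json": {
        "type": "Feature",
        "id": "CCI_BIOMASS_v7.0",
        "links": [],
        "assets": {"zarr": {"href": "https://example.org/CCI_BIOMASS_v7.0.zarr"}},
    },
}


def iter_items(docs: Dict[str, dict], href: str = "catalog.json") -> Iterator[dict]:
    """Yield every item reachable from `href` by following child/item links."""
    doc = docs[href]
    if doc["type"] == "Feature":
        yield doc
        return
    for link in doc["links"]:
        if link["rel"] in ("child", "item"):
            yield from iter_items(docs, link["href"])


item_ids = [item["id"] for item in iter_items(DOCS)]
asset_keys = {item["id"]: sorted(item["assets"]) for item in iter_items(DOCS)}
```

Only the item level carries `assets`; everything above it exists to organise and describe the items.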
Layer 2: Zarr Stores (data)
All products are stored as Zarr archives on Ceph S3. Each Zarr store is a directory of chunks. Key design decisions:
- HTTPS access: assets are exposed via HTTPS public URLs, readable by anyone without credentials.
- Lazy streaming: xr.open_zarr() reads only the metadata (coordinates, attributes) on open. Actual data chunks are fetched when .compute() is called.
- Consolidated metadata: each store has a .zmetadata file so that metadata can be read in a single request.
- CF-compliant attributes: all products follow CF conventions for dimension names (latitude, longitude, time), units, and _FillValue.
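To make the consolidated-metadata point concrete, the sketch below parses a `.zmetadata` document in the Zarr v2 consolidated format and summarises the arrays it describes, without touching any chunk. The sample content is invented for illustration: the variable name `agb`, its shape, chunking, and units are assumptions and do not describe any real EOForestSTAC store. `_ARRAY_DIMENSIONS` is the attribute xarray uses to record dimension names in Zarr v2 stores.

```python
import json

# Invented sample of a Zarr v2 consolidated metadata file (.zmetadata).
SAMPLE_ZMETADATA = json.dumps({
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        ".zattrs": {},
        "agb/.zarray": {
            "shape": [2, 1800, 3600],
            "chunks": [1, 900, 900],
            "dtype": "<f4",
        },
        "agb/.zattrs": {
            "units": "Mg ha-1",
            "_ARRAY_DIMENSIONS": ["time", "latitude", "longitude"],
        },
        "latitude/.zarray": {"shape": [1800], "chunks": [1800], "dtype": "<f8"},
        "latitude/.zattrs": {"_ARRAY_DIMENSIONS": ["latitude"]},
    },
})


def summarize_arrays(zmetadata_text: str) -> dict:
    """Return shape, chunks and dimension names for each array in the store."""
    meta = json.loads(zmetadata_text)["metadata"]
    summary = {}
    for key, value in meta.items():
        # Keys look like "<array>/.zarray"; group-level keys have no prefix.
        name, _, leaf = key.rpartition("/")
        if leaf == ".zarray":
            attrs = meta.get(f"{name}/.zattrs", {})
            summary[name] = {
                "shape": tuple(value["shape"]),
                "chunks": tuple(value["chunks"]),
                "dims": attrs.get("_ARRAY_DIMENSIONS"),
            }
    return summary


summary = summarize_arrays(SAMPLE_ZMETADATA)
```

Because every array's `.zarray` and `.zattrs` entry lives in this one JSON document, a client can learn the full layout of the store from a single request and defer all chunk reads until data is actually needed.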
How the providers use these layers
eoforeststac.providers.discovery.DiscoveryProvider reads the STAC JSON hierarchy to build discovery tables — it never touches the Zarr data.
eoforeststac.providers.zarr.ZarrProvider reads the STAC item to find the asset href, then opens the Zarr store via fsspec. For https:// hrefs it uses fsspec.get_mapper(); for s3:// hrefs it uses the configured s3fs filesystem.
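The choice of access path depends only on the scheme of the asset href. A minimal sketch of that dispatch, using only the standard library, might look as follows. The function name `backend_for` and the returned labels are hypothetical, and this is not the package's actual code; it only mirrors the rule described above.

```python
from urllib.parse import urlparse


def backend_for(href: str) -> str:
    """Name the access path that the rule above would select for `href`."""
    scheme = urlparse(href).scheme
    if scheme == "https":
        # Public HTTPS assets: opened through an fsspec mapper.
        return "fsspec.get_mapper"
    if scheme == "s3":
        # s3:// assets: opened through the configured s3fs filesystem.
        return "s3fs"
    # Assumption: other schemes are rejected; the source does not say.
    raise ValueError(f"Unsupported href scheme {scheme!r} in {href!r}")


https_backend = backend_for("https://example.org/store.zarr")
s3_backend = backend_for("s3://example-bucket/store.zarr")
```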
Asset key resolution
For standard products, ZarrProvider.open_dataset() defaults to the "zarr" asset key. For multi-resolution products, pass resolution= to select the resolution-specific key:
# `provider` is an eoforeststac.providers.zarr.ZarrProvider instance.

# Single-resolution product (uses the 'zarr' asset)
ds = provider.open_dataset("CCI_BIOMASS", "6.0")

# Multi-resolution product (uses the 'zarr_0.25deg' asset)
ds = provider.open_dataset("GAMI_AGECLASS", "3.0", resolution="0.25deg")
If no asset key matches the requested resolution=, a ValueError is raised that lists the available resolution keys.
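The key-selection rule can be sketched as a small pure function. This is an illustration under stated assumptions rather than the implementation inside `ZarrProvider.open_dataset()`: the name `resolve_asset_key` is hypothetical, the exact error message is invented, and raising when the plain `"zarr"` key is missing is an assumption the text above does not confirm.

```python
from typing import Iterable, Optional

# Asset keys of the multi-resolution item shown in the overview tree.
GAMI_KEYS = ["zarr_1deg", "zarr_0.5deg", "zarr_0.25deg", "zarr_0.1deg", "zarr_0.0833deg"]


def resolve_asset_key(asset_keys: Iterable[str], resolution: Optional[str] = None) -> str:
    """Pick 'zarr' by default, or 'zarr_<resolution>' when a resolution is given."""
    keys = list(asset_keys)
    wanted = "zarr" if resolution is None else f"zarr_{resolution}"
    if wanted in keys:
        return wanted
    # Report the resolutions that do exist so the caller can correct the request.
    available = sorted(k[len("zarr_"):] for k in keys if k.startswith("zarr_"))
    raise ValueError(f"No asset {wanted!r}; available resolutions: {available}")


default_key = resolve_asset_key(["zarr"])
gami_key = resolve_asset_key(GAMI_KEYS, resolution="0.25deg")
```

Keeping the resolution in the asset key, rather than in separate items, means one item describes one product version regardless of how many grids it is published on.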