Advanced Usage#

Dataset alignment#

eoforeststac.providers.align.DatasetAligner reprojects and resamples multiple datasets onto a common reference grid, enabling pixel-level comparison of products with different native CRS or resolution.

Basic alignment#

from eoforeststac.providers.zarr import ZarrProvider
from eoforeststac.providers.subset import subset
from eoforeststac.providers.align import DatasetAligner
import geopandas as gpd

provider = ZarrProvider(
    catalog_url="https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json",
    endpoint_url="https://s3.gfz-potsdam.de",
    anon=True,
)

roi = gpd.read_file("DE-Hai.geojson")
geometry = roi.to_crs("EPSG:4326").geometry.union_all()

ds_cci = provider.open_dataset("CCI_BIOMASS", "6.0")
ds_cci = subset(ds_cci, geometry=geometry, time=("2020-01-01", "2020-12-31"))

ds_saatchi = provider.open_dataset("SAATCHI_BIOMASS", "2.0")
ds_saatchi = subset(ds_saatchi, geometry=geometry, time=("2020-01-01", "2020-12-31"))

aligner = DatasetAligner(
    target="CCI_BIOMASS",
    resampling={
        "CCI_BIOMASS": {"default": "average"},
        "SAATCHI_BIOMASS": {"default": "average"},
    },
)

aligned = aligner.align({
    "CCI_BIOMASS": ds_cci.sel(time="2020-01-01"),
    "SAATCHI_BIOMASS": ds_saatchi.sel(time="2020-01-01"),
})

The target= argument names the reference dataset. All other datasets are reprojected and resampled to match its CRS, resolution, and grid origin.

The aligned result has: - Identical CRS, resolution, and spatial extent for all variables. - Consistent dimension names (latitude, longitude). - Variable-specific resampling applied (e.g. "average" for continuous variables, "nearest" for categorical).

Aligning more than two products#

ds_liu = provider.open_dataset("LIU_BIOMASS", "1.0")
ds_liu = subset(ds_liu, geometry=geometry, time=("2020-01-01", "2020-12-31"))

aligned = aligner.align({
    "CCI_BIOMASS": ds_cci.sel(time="2020-01-01"),
    "SAATCHI_BIOMASS": ds_saatchi.sel(time="2020-01-01"),
    "LIU_BIOMASS": ds_liu.sel(time="2020-01-01"),
})

Writing data writers#

For data producers, the writers subpackage provides product-specific classes for ingesting raw data and writing analysis-ready Zarr stores. Each writer extends eoforeststac.writers.base.BaseZarrWriter and implements three methods: load_dataset(), process_dataset(), and write().

Example: GAMI Age-Class Fractions#

from eoforeststac.writers.gami_ageclass import GAMIAgeClassWriter

writer = GAMIAgeClassWriter(
    endpoint_url="https://s3.gfz-potsdam.de",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

for resolution in ["1deg", "0.5deg", "0.25deg", "0.1deg", "0.0833deg"]:
    writer.write(
        input_zarr=f"/data/GAMI/AgeClass_{resolution}",
        output_zarr=f"s3://dog.atlaseo-glm.eo-gridded-data/collections/GAMI_AGECLASS/GAMI_AGECLASS_{resolution}_v3.0.zarr",
        resolution=resolution,
        version="3.0",
    )

Example: CCI Biomass#

from eoforeststac.writers.CCI_biomass import CCIBiomassWriter

writer = CCIBiomassWriter(
    endpoint_url="https://s3.gfz-potsdam.de",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

writer.write(
    input_dir="/data/CCI_Biomass/v6.0",
    output_zarr="s3://dog.atlaseo-glm.eo-gridded-data/collections/CCI_BIOMASS/CCI_BIOMASS_v6.0.zarr",
    version="6.0",
)

Rebuilding the catalog#

After adding new products or versions, rebuild the STAC catalog JSON files and upload them:

from eoforeststac.catalog.root import build_catalog

catalog = build_catalog(versions={
    "CCI_BIOMASS": ["4.0", "5.0", "6.0"],
    "GAMI_AGECLASS": ["3.0"],
    # ...
})

catalog.normalize_and_save(root_href="s3://dog.atlaseo-glm.eo-gridded-data/collections/public/")