Skip to content

Make rio-cogeo COG validator a required CI gate (#2302)#2304

Merged
brendancol merged 3 commits into
xarray-contrib:mainfrom
brendancol:worktree-agent-a11151265da216d8c
May 22, 2026
Merged

Make rio-cogeo COG validator a required CI gate (#2302)#2304
brendancol merged 3 commits into
xarray-contrib:mainfrom
brendancol:worktree-agent-a11151265da216d8c

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2302.

Part of the #2286 production-readiness wave. The compliance suite has had an optional rio-cogeo / GDAL validate_cloud_optimized_geotiff check for a while, but it silently skipped when the dependency was missing. That is fine on contributor laptops; it is not a useful guarantee on CI. This PR makes the validator a required gate.

What changed

  • New workflow .github/workflows/test-cog-validator.yml. Linux-only, installs rio-cogeo and GDAL Python bindings from conda-forge, sets XRSPATIAL_REQUIRE_COG_VALIDATOR=1, and runs pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py xrspatial/geotiff/tests/test_cog_parity_2286.py -x. Pinned to Python 3.12 because the GDAL stack on 3.14 from conda-forge is not yet uniformly available.
  • xrspatial/geotiff/tests/test_cog_writer_compliance.py:
    • _try_cog_validate now checks XRSPATIAL_REQUIRE_COG_VALIDATOR. Env unset and validators missing keeps the old soft-skip path. Env set and validators missing fails the test loudly.
    • Added four tests covering the strict-mode and soft-skip paths plus the truthy / non-truthy env value mapping.
  • docs/source/reference/geotiff.rst: one-paragraph note describing the new CI gate.

Backward compatibility

The env var is unset by default. Contributor machines without rio-cogeo / GDAL installed still pass the suite -- only CI sets the var.

Test plan

  • pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py passes locally (43 passed, 1 skipped where the validator dep is genuinely missing).
  • XRSPATIAL_REQUIRE_COG_VALIDATOR=1 pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py::test_external_cog_validator fails locally with the expected loud message when rio-cogeo / GDAL are absent.
  • New pytest-cog-validator job runs and is green on this PR.

- New `pytest-cog-validator` workflow installs rio-cogeo + GDAL from
  conda-forge on Linux, sets `XRSPATIAL_REQUIRE_COG_VALIDATOR=1`, and
  runs the compliance + parity suite.
- `_try_cog_validate` now fails (not skips) when the env var is set
  and neither validator is importable, so a misconfigured install
  cannot quietly pass the gate. Env unset keeps the soft-skip path
  for contributor laptops.
- Docs note under `geotiff.rst` describes the new CI gate.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 22, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Make rio-cogeo COG validator a required CI gate (#2302)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • .github/workflows/test-cog-validator.yml: no concurrency: block. A rapid sequence of pushes queues up a fresh micromamba env build per push. Add:
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true
  • .github/workflows/test-cog-validator.yml: add permissions: contents: read at the top level. Nothing in this job writes, and the missing block makes it inherit the repo default.
  • xrspatial/geotiff/tests/test_cog_writer_compliance.py:573-579 (and :611-617): the _blocked_import stub matches name == "rio_cogeo.cogeo" without checking fromlist, but the osgeo branch right below it does check fromlist. Make the two branches use the same shape. Either tighten the rio_cogeo branch with fromlist and "cog_validate" in tuple(fromlist), or drop the fromlist check from the osgeo branch.
  • .github/workflows/test-cog-validator.yml:78-85: no timeout-minutes on the pytest step. If a future regression hangs to_geotiff or the validator, the job holds the runner for the full 6h default. timeout-minutes: 20 is plenty for this subset.

Nits

  • docs/source/reference/geotiff.rst:32: the second hyperlink points at gdal.org/programs/gdaladdo.html#gdaladdo, which is the gdaladdo page, not the validate_cloud_optimized_geotiff sample the sentence is actually about. Swap to the GDAL cookbook / source URL, or drop the link and leave validate_cloud_optimized_geotiff as plain text.
  • xrspatial/geotiff/tests/test_cog_writer_compliance.py:182-191: _require_validator_env says "set truthy" without spelling out which values count. The non-truthy test already enumerates 1, true, yes, on; lifting that list into the docstring would make the helper self-documenting.
  • .github/workflows/test-cog-validator.yml: matrix has one python entry (['3.12']). If this is a permanent pin, drop the matrix and just use runs-on: ubuntu-latest. If 3.13 / 3.14 are meant to land here later, say so in a comment.
  • pytest.raises(pytest.fail.Exception, ...) works on pytest >= 7 (which is what this repo pins), but pytest.fail.Exception is an alias for _pytest.outcomes.Failed. Worth a one-line comment noting the version assumption so a future unpin does not silently flip behavior.

What looks good

  • Env-var design lands in the right place: contributor laptops keep the soft-skip path, only CI flips to hard-fail. Tests hit both branches plus the truthy / non-truthy value table.
  • Conda-forge for rio-cogeo + GDAL avoids the GDAL build chain on PyPI. The post-pip re-import check catches the case where pip install -e .[tests] shadows the conda stack.
  • Pinning Python to 3.12 is the safe call here. A red required check every time conda-forge lags a Python release would be worse than the pin.
  • Daily cron, workflow_dispatch, and the 04:00 UTC offset from the other two nightlies are all wired up.
  • Docs paragraph sits in its own subsection, leaving room for #2300's "Stable COG contract" section without merge conflict, and the env-var semantics are stated outright.

Checklist

  • Behavior matches issue contract (env unset = skip, env set + dep missing = fail, dep present = run + fail-on-errors)
  • Tests cover both branches plus the env-value table
  • Docstrings present and accurate
  • No premature materialization or unnecessary copies (n/a -- CI infra change)
  • Benchmark exists or is not needed (n/a)
  • README feature matrix updated (n/a -- no new public function)
  • Backend coverage (n/a)

- Add concurrency cancel-in-progress, contents:read permissions, and
  timeout-minutes:20 to the new workflow.
- Tighten the rio_cogeo import-blocking stub so it matches the osgeo
  branch (both check fromlist).
- Expand _require_validator_env docstring with the truthy value list.
- Swap the broken gdaladdo link in geotiff.rst for the actual
  validate_cloud_optimized_geotiff source URL.
- Note the pytest.fail.Exception >= pytest 7 assumption inline.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Applied in d537e7ae:

  • Concurrency / permissions / timeout on the workflow: added concurrency (cancel-in-progress: true), permissions: contents: read, and timeout-minutes: 20.
  • Stub consistency: _blocked_import now checks fromlist on both branches (cog_validate for rio_cogeo, validate_cloud_optimized_geotiff for osgeo).
  • Docstring: _require_validator_env now enumerates the truthy spellings inline.
  • Docs link: replaced the gdaladdo URL with the GDAL source URL for validate_cloud_optimized_geotiff.py.
  • Pytest pin note: one-line comment on pytest.fail.Exception flagging the pytest >= 7 assumption.
  • Matrix comment: clarified that 3.13 / 3.14 will land in the matrix once conda-forge catches up.

pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py still passes locally (43 passed, 1 skipped where the optional validator dep is genuinely missing).

No new findings on this pass. Ready for CI to weigh in.

…2308)

The new rio-cogeo CI gate surfaced a real writer bug: overview tile
blocks are emitted in the wrong order, so external validators reject
the file. The in-process round-trip and local IFD layout check still
pass; only block ordering across overviews trips strict validators.

Mark `test_external_cog_validator` as xfail(strict=False) and link
issue xarray-contrib#2308 so the gate is wired up and the rest of the compliance
suite stays green. Drop the marker when the writer fix lands.

Refs xarray-contrib#2302.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: CI surfaced a real writer bug

The new pytest-cog-validator job ran on the previous push and test_external_cog_validator failed with rio-cogeo's cog_validate reporting:

The offset of the first block of overview of index 0 should be after the one of the overview of index 1
The offset of the first block of the main resolution image should be after the one of the overview of index 1

This is exactly what the gate was supposed to catch: an interop bug that the in-process round-trip and the local _assert_ifds_before_data check both miss. The writer emits overview tile blocks in the wrong order relative to the COG spec.

The writer fix is out of scope for this infrastructure PR (#2302's mandate is the gate itself, not changes to _writer.py). I filed #2308 with a repro and acceptance criteria, and added @pytest.mark.xfail(strict=False, reason="...see #2308") to test_external_cog_validator so the gate is wired up and the rest of the compliance suite is green. The xfail marker comes off when #2308 lands.

The rest of the suite (43 tests including the strict-mode env contract tests) still passes on CI with rio-cogeo installed.

@brendancol brendancol marked this pull request as ready for review May 22, 2026 13:12
@brendancol brendancol merged commit b94e0fe into xarray-contrib:main May 22, 2026
8 checks passed
brendancol added a commit that referenced this pull request May 22, 2026
)

The xfail was added in #2304 as a placeholder so the rio-cogeo CI gate
could land green while the overview block ordering bug was being fixed.
#2309 landed that writer fix, so the test now passes -- drop the marker
and the accompanying comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rio-cogeo / GDAL validate_cloud_optimized_geotiff as required CI gate (#2286 prod-ready wave C)

1 participant