Make rio-cogeo COG validator a required CI gate (#2302)#2304
Conversation
- New `pytest-cog-validator` workflow installs rio-cogeo + GDAL from conda-forge on Linux, sets `XRSPATIAL_REQUIRE_COG_VALIDATOR=1`, and runs the compliance + parity suite. - `_try_cog_validate` now fails (not skips) when the env var is set and neither validator is importable, so a misconfigured install cannot quietly pass the gate. Env unset keeps the soft-skip path for contributor laptops. - Docs note under `geotiff.rst` describes the new CI gate.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Make rio-cogeo COG validator a required CI gate (#2302)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
.github/workflows/test-cog-validator.yml: noconcurrency:block. A rapid sequence of pushes queues up a fresh micromamba env build per push. Add:concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true
.github/workflows/test-cog-validator.yml: addpermissions: contents: readat the top level. Nothing in this job writes, and the missing block makes it inherit the repo default.xrspatial/geotiff/tests/test_cog_writer_compliance.py:573-579(and:611-617): the_blocked_importstub matchesname == "rio_cogeo.cogeo"without checkingfromlist, but the osgeo branch right below it does checkfromlist. Make the two branches use the same shape. Either tighten the rio_cogeo branch withfromlist and "cog_validate" in tuple(fromlist), or drop the fromlist check from the osgeo branch..github/workflows/test-cog-validator.yml:78-85: notimeout-minuteson the pytest step. If a future regression hangsto_geotiffor the validator, the job holds the runner for the full 6h default.timeout-minutes: 20is plenty for this subset.
Nits
docs/source/reference/geotiff.rst:32: the second hyperlink points atgdal.org/programs/gdaladdo.html#gdaladdo, which is thegdaladdopage, not thevalidate_cloud_optimized_geotiffsample the sentence is actually about. Swap to the GDAL cookbook / source URL, or drop the link and leavevalidate_cloud_optimized_geotiffas plain text.xrspatial/geotiff/tests/test_cog_writer_compliance.py:182-191:_require_validator_envsays "set truthy" without spelling out which values count. The non-truthy test already enumerates1, true, yes, on; lifting that list into the docstring would make the helper self-documenting..github/workflows/test-cog-validator.yml: matrix has one python entry (['3.12']). If this is a permanent pin, drop the matrix and just useruns-on: ubuntu-latest. If 3.13 / 3.14 are meant to land here later, say so in a comment.pytest.raises(pytest.fail.Exception, ...)works on pytest >= 7 (which is what this repo pins), butpytest.fail.Exceptionis an alias for_pytest.outcomes.Failed. Worth a one-line comment noting the version assumption so a future unpin does not silently flip behavior.
What looks good
- Env-var design lands in the right place: contributor laptops keep the soft-skip path, only CI flips to hard-fail. Tests hit both branches plus the truthy / non-truthy value table.
- Conda-forge for rio-cogeo + GDAL avoids the GDAL build chain on PyPI. The post-pip re-import check catches the case where
pip install -e .[tests]shadows the conda stack. - Pinning Python to 3.12 is the safe call here. A red required check every time conda-forge lags a Python release would be worse than the pin.
- Daily cron,
workflow_dispatch, and the 04:00 UTC offset from the other two nightlies are all wired up. - Docs paragraph sits in its own subsection, leaving room for #2300's "Stable COG contract" section without merge conflict, and the env-var semantics are stated outright.
Checklist
- Behavior matches issue contract (env unset = skip, env set + dep missing = fail, dep present = run + fail-on-errors)
- Tests cover both branches plus the env-value table
- Docstrings present and accurate
- No premature materialization or unnecessary copies (n/a -- CI infra change)
- Benchmark exists or is not needed (n/a)
- README feature matrix updated (n/a -- no new public function)
- Backend coverage (n/a)
- Add concurrency cancel-in-progress, contents:read permissions, and timeout-minutes:20 to the new workflow. - Tighten the rio_cogeo import-blocking stub so it matches the osgeo branch (both check fromlist). - Expand _require_validator_env docstring with the truthy value list. - Swap the broken gdaladdo link in geotiff.rst for the actual validate_cloud_optimized_geotiff source URL. - Note the pytest.fail.Exception >= pytest 7 assumption inline.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
Applied in d537e7ae:
- Concurrency / permissions / timeout on the workflow: added
concurrency(cancel-in-progress: true),permissions: contents: read, andtimeout-minutes: 20. - Stub consistency:
_blocked_importnow checksfromliston both branches (cog_validatefor rio_cogeo,validate_cloud_optimized_geotifffor osgeo). - Docstring:
_require_validator_envnow enumerates the truthy spellings inline. - Docs link: replaced the
gdaladdoURL with the GDAL source URL forvalidate_cloud_optimized_geotiff.py. - Pytest pin note: one-line comment on
pytest.fail.Exceptionflagging the pytest >= 7 assumption. - Matrix comment: clarified that 3.13 / 3.14 will land in the matrix once conda-forge catches up.
pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py still passes locally (43 passed, 1 skipped where the optional validator dep is genuinely missing).
No new findings on this pass. Ready for CI to weigh in.
…2308) The new rio-cogeo CI gate surfaced a real writer bug: overview tile blocks are emitted in the wrong order, so external validators reject the file. The in-process round-trip and local IFD layout check still pass; only block ordering across overviews trips strict validators. Mark `test_external_cog_validator` as xfail(strict=False) and link issue xarray-contrib#2308 so the gate is wired up and the rest of the compliance suite stays green. Drop the marker when the writer fix lands. Refs xarray-contrib#2302.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up: CI surfaced a real writer bug
The new pytest-cog-validator job ran on the previous push and test_external_cog_validator failed with rio-cogeo's cog_validate reporting:
The offset of the first block of overview of index 0 should be after the one of the overview of index 1
The offset of the first block of the main resolution image should be after the one of the overview of index 1
This is exactly what the gate was supposed to catch: an interop bug that the in-process round-trip and the local _assert_ifds_before_data check both miss. The writer emits overview tile blocks in the wrong order relative to the COG spec.
The writer fix is out of scope for this infrastructure PR (#2302's mandate is the gate itself, not changes to _writer.py). I filed #2308 with a repro and acceptance criteria, and added @pytest.mark.xfail(strict=False, reason="...see #2308") to test_external_cog_validator so the gate is wired up and the rest of the compliance suite is green. The xfail marker comes off when #2308 lands.
The rest of the suite (43 tests including the strict-mode env contract tests) still passes on CI with rio-cogeo installed.
Closes #2302.
Part of the #2286 production-readiness wave. The compliance suite has had an optional
rio-cogeo/ GDALvalidate_cloud_optimized_geotiffcheck for a while, but it silently skipped when the dependency was missing. That is fine on contributor laptops; it is not a useful guarantee on CI. This PR makes the validator a required gate.What changed
.github/workflows/test-cog-validator.yml. Linux-only, installsrio-cogeoand GDAL Python bindings from conda-forge, setsXRSPATIAL_REQUIRE_COG_VALIDATOR=1, and runspytest xrspatial/geotiff/tests/test_cog_writer_compliance.py xrspatial/geotiff/tests/test_cog_parity_2286.py -x. Pinned to Python 3.12 because the GDAL stack on 3.14 from conda-forge is not yet uniformly available.xrspatial/geotiff/tests/test_cog_writer_compliance.py:_try_cog_validatenow checksXRSPATIAL_REQUIRE_COG_VALIDATOR. Env unset and validators missing keeps the old soft-skip path. Env set and validators missing fails the test loudly.docs/source/reference/geotiff.rst: one-paragraph note describing the new CI gate.Backward compatibility
The env var is unset by default. Contributor machines without rio-cogeo / GDAL installed still pass the suite -- only CI sets the var.
Test plan
pytest xrspatial/geotiff/tests/test_cog_writer_compliance.pypasses locally (43 passed, 1 skipped where the validator dep is genuinely missing).XRSPATIAL_REQUIRE_COG_VALIDATOR=1 pytest xrspatial/geotiff/tests/test_cog_writer_compliance.py::test_external_cog_validatorfails locally with the expected loud message when rio-cogeo / GDAL are absent.pytest-cog-validatorjob runs and is green on this PR.