Move TRT-RTX runtime controls to runtime context managers (v3, for review) #3
tp5uiuc wants to merge 14 commits into
…anagers

Replaces the v2 design that packed three runtime-mode controls (``cuda_graph_strategy``, ``dynamic_shapes_kernel_specialization_strategy``, ``runtime_cache``) into ``CompilationSettings`` and the serialized engine tuple. Per pytorch#4310, these are runtime mode controls -- not engine properties -- and shouldn't pin at compile time or round-trip through serialization.

Highlights:

* New ``RuntimeSettings`` dataclass on both Python and C++ sides (``py/torch_tensorrt/runtime/_runtime_settings.py``, ``core/runtime/RuntimeSettings.h``). Three fields: ``dynamic_shapes_kernel_specialization_strategy``, ``cuda_graph_strategy``, ``runtime_cache``. The cache field accepts ``None``, a path string (engine creates an implicit handle, saves on ``__del__``, mirrors old ``runtime_cache_path=`` behavior), or a ``RuntimeCacheHandle`` (shared cache, lifecycle owned by the ``runtime_cache()`` CM).
* New ``RuntimeCacheHandle`` registered as a torchbind class (``torch.classes.tensorrt.RuntimeCacheHandle``) so the same C++ ``IRuntimeCache`` shared_ptr crosses the Python/C++ boundary.
* New per-engine ``update_runtime_settings`` API on both ``TRTEngine`` flavors. Fast-paths on settings equality; eagerly rebuilds ``IRuntimeConfig`` + recreates execution context on diff.
* Three new context managers in ``torch_tensorrt.runtime``: ``runtime_config(target_or_targets, **kw)`` (the pool API; also yields the target so ``with runtime_config(model, ...) as m:`` works), ``runtime_cache(target, path)`` (shared cache CM), and the per-knob sugars ``set_cuda_graph_strategy`` / ``set_dynamic_shapes_kernel_strategy``. All three accept a list of modules for multi-target use; the cache CM yields the ``RuntimeCacheHandle`` for inspection or explicit ``save()``.
* New ``runtime_settings=`` kwarg on ``compile()``, ``cross_compile_for_windows()``, and ``convert_module()`` so callers can prime the engine with the right values up front. Compile-time hint avoids the enter/exit recreate cost.
* ``CompilationSettings`` loses the three fields; the compiler entry points drop the three kwargs. ``SerializedInfoIndex`` drops the four RTX-related slots; ``SERIALIZATION_LEN`` returns to 12. Engines saved with the old 16-slot layout will raise the existing layout-mismatch error on load.
* Three existing test files migrated to the new API; new ``tests/py/dynamo/runtime/test_004_runtime_settings.py`` covers the data model, compile-time hint, runtime CM restore semantics, multi-target form, and dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
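For reviewers who want the "CM restore semantics" in concrete form, here is a minimal sketch of the enter/override/exit/restore pattern. It is not the code in this PR: ``ToyRuntimeSettings``, ``ToyModule``, and ``toy_runtime_config`` are hypothetical stand-ins, and the strategy strings are placeholder values.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class ToyRuntimeSettings:
    # Hypothetical stand-in for RuntimeSettings; values are placeholders.
    cuda_graph_strategy: str = "disabled"
    dynamic_shapes_kernel_specialization_strategy: str = "lazy"
    runtime_cache: Optional[str] = None


class ToyModule:
    # Hypothetical stand-in for a compiled module owning one engine.
    def __init__(self) -> None:
        self.runtime_settings = ToyRuntimeSettings()


@contextmanager
def toy_runtime_config(target: ToyModule, **overrides: object) -> Iterator[ToyModule]:
    # Remember the settings in force on entry so they can be restored.
    previous = target.runtime_settings
    # Apply only the overridden fields; the others keep their current value.
    target.runtime_settings = replace(previous, **overrides)
    try:
        # Yield the target so `with ... as m:` works.
        yield target
    finally:
        # Restore on exit, including when the body raises.
        target.runtime_settings = previous
```

Usage would look like ``with toy_runtime_config(mod, cuda_graph_strategy="whole_graph_capture") as m: ...``; after the block, ``mod.runtime_settings`` is back to whatever it was before entry.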
Two follow-up bugs exposed by the cross-runtime test parameterization on
the C++ engine path:
1. ``torch.classes.tensorrt.Engine.update_runtime_settings(...)`` rejected
Python ``None`` for the ``RuntimeCacheHandle`` argument because TorchBind
does not auto-convert ``None`` to a null ``c10::intrusive_ptr``. Switch
the signature to ``c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>>``
so the default ``runtime_cache=None`` case round-trips cleanly.
2. ``RuntimeSettings(runtime_cache="/some/path")`` only auto-saved to disk
on engine destruction for the Python runtime (via ``_TRTEngine.__del__``).
The C++ engine had no equivalent saver and the IRuntimeCache it
materialized internally wasn't accessible from Python.
Make the cpp path symmetric:
- Expose ``serialize() -> at::Tensor`` / ``deserialize(at::Tensor)`` /
``has_cache()`` on the torchbind ``RuntimeCacheHandle`` class. ``at::Tensor``
of uint8 is used instead of ``std::string`` because TorchBind forces
``std::string`` through Python ``str`` (UTF-8) and serialized cache bytes
are not valid UTF-8.
- In ``TorchTensorRTModule.setup_engine`` (cpp branch), pre-materialize a
torchbind handle when ``runtime_cache`` is a path string, store it on
the module, and substitute it into ``_runtime_settings`` so the dispatch
passes the same handle through.
- Add ``_load_cpp_implicit_cache`` / ``_save_cpp_implicit_cache`` and a
module ``__del__`` that mirrors the Python ``_TRTEngine`` saver, with
``filelock`` + atomic-rename semantics.
- Teach ``_to_torchbind_handle`` to pass an already-torchbind
``torch.ScriptObject`` through unchanged.
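The atomic-rename half of the saver can be sketched with the standard library alone. This is an illustration of the general technique, not the helper added in this commit: ``atomic_write_bytes`` is a hypothetical name, and the ``filelock`` guard mentioned above is left out to keep the example dependency-free.

```python
import os
import tempfile


def atomic_write_bytes(path: str, payload: bytes) -> None:
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the destination directory so that the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-cache-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The payload is handled as raw ``bytes`` throughout, which matches the point above that serialized cache contents are not valid UTF-8.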
All cpp + python runtime tests pass on TRT-RTX 1.5: test_004 (12/12),
test_000 (10/10), test_001 dynamic_shapes (14/14), test_001 cuda_graph
(13/13).
…timeCacheHandle lifecycle

Structural cleanup on top of the v3 work (no observable behavior change).

C++ side
--------
``RuntimeSettings`` migrates from a ``TRTEngine`` member to a ``TRTRuntimeConfig`` member -- the value-type now lives with its primary consumer (the IRuntimeConfig builder). ``TRTRuntimeConfig`` gains ``set_settings()`` (the diff-and-invalidate primitive) and turns the static ``uses_internal_capture`` / ``is_monolithic_capturable`` helpers into instance methods so callers do not need to pass settings around. ``TRTEngine::runtime_settings()`` forwards through.

Python side
-----------
Introduces a Python ``TRTRuntimeConfig`` class mirroring the C++ struct. ``_TRTEngine`` drops its three legacy fields (``runtime_config``, ``runtime_settings``, ``_implicit_cache_handle``) for a single ``self._trt_runtime_config`` member; ``_create_execution_context`` / ``update_runtime_settings`` / ``_is_monolithic_capturable`` / ``_enable_rtx_native_cudagraphs`` all delegate. Every ``ENABLED_FEATURES.tensorrt_rtx`` branch related to runtime-mode controls is absorbed into the shim, so engine and module call sites stay uniform across TRT and TRT-RTX builds.

Following the project's grouping convention, ``py/torch_tensorrt/runtime/_runtime_settings.py`` is merged into ``_runtime_config.py``; that file now holds ``RuntimeSettings``, the new ``TRTRuntimeConfig``, the existing ``runtime_config()`` CM, and its factory. Imports across the tree are repointed.

RuntimeCacheHandle ownership model
----------------------------------
Save-on-destruction moves from the two engine-side ``__del__`` paths (``_TRTEngine.close()`` for Python runtime, ``TorchTensorRTModule.__del__`` for cpp runtime) onto ``RuntimeCacheHandle.__del__`` itself, gated by a new ``autosave_on_del`` flag. The flag is set by ownership context:

* Engine-implicit handles (created from a path-string compile-time hint) get ``autosave_on_del=True`` -- no other Python object holds them, so the destructor is the only save opportunity.
* The ``runtime_cache(target, path)`` CM uses ``autosave_on_del=False`` on the handle it constructs; its ``__exit__`` saves explicitly.
* Hand-built handles default to ``autosave_on_del=False`` so save timing stays under the user's control.

The handle additionally accepts a ``torchbind_handle`` sibling so the same Python object can wrap either a ``trt.IRuntimeCache`` (Python rt) or a ``torch.classes.tensorrt.RuntimeCacheHandle`` (cpp rt); ``save`` / ``load`` source bytes from whichever is populated.

The cpp-runtime helpers on ``TorchTensorRTModule`` (``_load_cpp_implicit_cache``, ``_save_cpp_implicit_cache``, ``__del__``) and the duplicate save logic in ``_TRTEngine.close()`` are removed; both runtimes funnel through the single ``RuntimeCacheHandle.__del__`` path.

Tests
-----
test_000 grows two new tests asserting the new contract:

* ``test_cm_does_not_double_save_on_rc_gc`` -- only one save fires per CM block even after ``rc`` is GC'd.
* ``test_user_built_handle_no_autosave_by_default`` -- hand-built handles do not autosave on GC.

All 51 runtime tests pass on the refactored design (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
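The ownership rule can be shown with a toy handle whose destructor saves only when the flag is set. ``ToyCacheHandle`` and its ``sink`` list are hypothetical; the real ``RuntimeCacheHandle`` writes cache bytes to disk, whereas this sketch only records that a save would have happened. It also assumes CPython, where dropping the last reference runs ``__del__`` promptly.

```python
import gc
from typing import List


class ToyCacheHandle:
    # Hypothetical stand-in for RuntimeCacheHandle; records saves in a list.
    def __init__(self, path: str, sink: List[str], autosave_on_del: bool = False) -> None:
        self.path = path
        self.autosave_on_del = autosave_on_del
        self._sink = sink

    def save(self) -> None:
        # A real handle would serialize the cache to self.path here.
        self._sink.append(self.path)

    def __del__(self) -> None:
        # Only engine-implicit handles opt in to saving on destruction.
        if self.autosave_on_del:
            self.save()


saved: List[str] = []
implicit = ToyCacheHandle("implicit.bin", saved, autosave_on_del=True)
user_built = ToyCacheHandle("user.bin", saved)  # defaults to no autosave
del implicit, user_built
gc.collect()
print(saved)
```

Only the implicit handle's path ends up in ``saved``; the hand-built handle leaves save timing to its owner, which is the contract ``test_user_built_handle_no_autosave_by_default`` asserts.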
Five follow-up changes responding to PR review comments:

* **Fold strategy sugar into ``_runtime_config.py``.** Delete ``_dynamic_shapes_kernel_strategy.py`` and ``_cuda_graph_strategy.py``; ``set_dynamic_shapes_kernel_strategy`` / ``set_cuda_graph_strategy`` now live alongside the ``runtime_config`` CM they delegate to. ``torch_tensorrt/runtime/__init__.py`` re-exports them from the consolidated module.
* **Hoist ``RuntimeSettings`` defaults into ``_defaults.py``.** Three new constants (``DYNAMIC_SHAPES_KERNEL_SPECIALIZATION_STRATEGY``, ``CUDA_GRAPH_STRATEGY``, ``RUNTIME_CACHE_PATH``) mirror the compilation-settings pattern. ``RUNTIME_CACHE_PATH`` defaults to a per-user temp file similar to ``ENGINE_CACHE_DIR``, so users get a disk-backed runtime cache without explicit opt-in; override via ``RuntimeSettings(runtime_cache="/path")`` or the ``runtime_cache`` CM. Test_000 and test_004 updated to reflect the new default.
* **Warn on non-RTX ``RuntimeSettings`` construction.** ``__post_init__`` now emits a one-shot ``UserWarning`` on regular TRT builds (gated by ``ENABLED_FEATURES.tensorrt_rtx``) so users see that the settings have no effect.
* **Drop ``TYPE_CHECKING`` string forward-refs for ``RuntimeSettings``.** Direct top-level imports across ``_compiler.py``, ``_conversion.py``, ``_TRTEngine.py`` and ``_TorchTensorRTModule.py``; bare ``Optional[RuntimeSettings]`` annotations everywhere. Deferred imports inside ``__init__`` / ``__setstate__`` removed.

All 51 runtime tests pass (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
[](const c10::intrusive_ptr<TRTEngine>& self,
   std::string const& dynamic_shapes_kernel_specialization_strategy,
   std::string const& cuda_graph_strategy,
   c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache) -> void {
Is it not possible to implement this as a property with getter and setter because of this c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache?
Possible — the c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> signature is fine for torchbind def_property (device_memory_budget immediately below in this same registration is a property on TRTEngine, for a comparable point).
The reason update_runtime_settings is a single bundled setter is that RuntimeSettings is the unit of context invalidation: changing any one of the three fields ends up calling recreate_execution_context once. Splitting into three individual properties would cause three sequential context-recreates on the engine-setup path (where all three are set together via _dispatch_runtime_settings_to_engine). The diff-check inside TRTRuntimeConfig::set_settings would catch no-op repeats, but consecutive changing writes would each trigger a recreate.
If you would rather have property syntax I can split it, but the bundled form keeps setup tight. WDYT?
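To make the recreate-count argument concrete, here is a toy model of the two shapes. It is a sketch under assumptions, not the engine code: ``ToyEngine`` and ``ToyEngineSettings`` are hypothetical, and incrementing a counter stands in for ``recreate_execution_context``.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyEngineSettings:
    # Hypothetical three-field bundle; values are placeholders.
    ds_strategy: str = "lazy"
    cg_strategy: str = "disabled"
    cache_path: str = ""


class ToyEngine:
    def __init__(self) -> None:
        self.settings = ToyEngineSettings()
        self.recreate_count = 0

    def update_runtime_settings(self, new: ToyEngineSettings) -> bool:
        # Fast path: identical settings leave the context alone.
        if new == self.settings:
            return False
        self.settings = new
        self.recreate_count += 1  # stands in for a context recreate
        return True


def set_fields_one_by_one(engine: ToyEngine, **fields: str) -> None:
    # Models three independent property setters: each changing write
    # goes through the diff check on its own.
    for name, value in fields.items():
        engine.update_runtime_settings(replace(engine.settings, **{name: value}))


bundled = ToyEngine()
bundled.update_runtime_settings(ToyEngineSettings("eager", "whole_graph", "/tmp/c.bin"))

split = ToyEngine()
set_fields_one_by_one(split, ds_strategy="eager", cg_strategy="whole_graph", cache_path="/tmp/c.bin")

print(bundled.recreate_count, split.recreate_count)
```

In this model the bundled call recreates once and the split writes recreate three times, while a repeated write of the same value is caught by the equality check in both shapes.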
Maybe a compromise here would be to have a tuple(...) as a setter in both python and C++ and pass the data back and forth, so that .settings = would call the update settings method? But that would mean python and C++ code within TRTEngine.py needs to be handled differently (since RuntimeSettings is not available in C++ API, and nor should it be since we only use the python API). Then internally (in this function) we can unpack the tuple (or even use std::apply()) to convert to runtime settings and move it internally to update_runtime_settings.
Discussion-only: the tuple-as-property idea on torchbind is doable but I want to flag the cost before going down that road.
To make engine.settings = ... work as a Python-side property on torch.classes.tensorrt.Engine we would need to:
- Define a torchbind def_property("settings", getter, setter) whose setter accepts a tuple-of-primitives (since TorchBind cannot carry the RuntimeSettings value type natively -- only scalars, strings, tensors, and registered torchbind classes).
- The tuple shape would have to mirror our struct: (int64_t ds_strategy, int64_t cg_strategy, optional<intrusive_ptr<RuntimeCacheHandle>>). Same data as update_runtime_settings today, just packaged.
- On the Python _TRTEngine side, mirror the same property: engine.settings returns a RuntimeSettings dataclass; engine.settings = rs does the dispatch.
The asymmetry you flagged is real: _TRTEngine.py (Python runtime) has access to the RuntimeSettings dataclass directly, but the cpp-torchbind Engine only sees the tuple form. Python module code that talks to self.engine has to branch on isinstance(self.engine, TRTEngine) -- exactly the pattern we already have in _dispatch_runtime_settings_to_engine, except now it would also be true for the property read path (not just write).
Net: the current state -- update_runtime_settings method on the C++ torchbind binding + runtime_settings property on the Python TorchTensorRTModule wrapper -- already gives you mod.runtime_settings = rs at the user-facing layer, without forcing the engine-class boundary to also be a property. Going the extra step to make self.engine.settings = ... work has only an internal-API benefit (the dispatch path), at the cost of a more complex tuple-marshaling property.
Happy to do it if you want it for symmetry, but my preference would be to leave the engine binding as a method and treat the module-level property as the API contract. WDYT?
Mirror ``TRTRuntimeConfig.set_settings`` (Python runtime) on the cpp runtime path. Previously the cpp side dropped the C++ engine's intrusive_ptr on settings change but left ``self._implicit_cache_handle`` on the ``TorchTensorRTModule`` pointing at the *old* wrapper -- the new cache had no Python autosave companion and never wrote to disk.

Factor the path-string-to-torchbind-handle materialization into ``TorchTensorRTModule._materialize_cpp_implicit_handle``. Called from ``setup_engine`` and ``_dispatch_runtime_settings_to_engine`` (cpp branch); synchronously saves the prior wrapper before swap, replaces ``self._implicit_cache_handle`` with the new one, then runs ``load()`` after the C++ engine has attached the IRuntimeCache.

Test: ``test_set_runtime_settings_saves_prior_cache_on_swap`` (parametrized over both runtimes). Compiles with path A; swaps to path B; asserts A is written synchronously at swap time and B is written on ``del compiled``. The walk-to-inner-module is wrapped in a helper so the loop variable doesn't outlive the call and keep the inner TRT module alive past ``del compiled`` (which would suppress the post-del autosave).

All 53 tests pass (test_004 12/12, test_000 14/14, test_001 ds 14/14, test_001 cg 13/13).
C++:
- ``RuntimeSettings`` strategy fields are now typed ``enum class : int32_t``
values (``DynamicShapesKernelSpecializationStrategy`` /
``CudaGraphStrategy``) mirroring the nvinfer1 enums. Validation moves
to dedicated boundary helpers ``to_dynamic_shapes_kernel_strategy`` /
``to_cuda_graph_strategy`` called from the torchbind
``update_runtime_settings`` lambda; the rest of the code uses enum
values directly (no more raw ``int32_t`` field reads).
- Reverse-lookup helpers ``ds_strategy_name`` / ``cg_strategy_name`` now
take the enum type and return ``std::string_view``; the lookup tables
switch to ``std::array<std::string_view, N>``.
- ``RuntimeCacheHandle::cache`` renamed to ``trt_handle`` so call sites
read ``runtime_cache->trt_handle`` instead of ``runtime_cache->cache``.
- ``TRTRuntimeConfig::set_settings`` renamed to ``settings(RuntimeSettings)``
(overload of the getter) with ``[[nodiscard]]``. ``TRTEngine``'s
``update_runtime_settings`` similarly renamed to ``runtime_settings(...)``
overload with ``[[nodiscard]] bool`` return. Torchbind binding name
stays ``update_runtime_settings`` for Python contract stability.
- ``TRTRuntimeConfig::is_monolithic_capturable`` drops the unconditional
``noexcept`` (the RTX branch uses ``TORCHTRT_ASSERT`` which can
throw).
- ``TRTEngine::num_execution_contexts_created`` regains ``noexcept`` --
bound via a torchbind lambda to sidestep the lack of a
``const noexcept`` ``def`` specialization.
- ``TRTEngine::has_dynamic_inputs`` default changed to ``false``.
- ``TRTRuntimeConfig::ensure_initialized`` introduces an
``auto& rt_cache = settings_.runtime_cache`` alias for the cache
attachment block.
- ``RuntimeSettings::to_str`` wraps its output in ``RuntimeSettings{...}``.
- ``RuntimeCacheHandle::serialize`` collapses the three early
``at::empty({0}, opts)`` returns into a single ``empty`` lambda.
Python:
- ``TorchTensorRTModule.set_runtime_settings(rs)`` becomes a
``runtime_settings`` property setter so callers write
``mod.runtime_settings = rs``. Operates on ``self``; outer callers
walk ``named_modules()`` themselves (the ``runtime_config`` CM and
tests already do).
- Docstrings + the prior caller in ``runtime_config`` CM updated to use
the setter syntax.
All 61 runtime tests pass on TRT-RTX 1.5.0.103.
Round 4 review feedback addressed in 38b7033 (full build + 61/61 runtime tests pass on TRT-RTX 1.5.0.103).
C++ changes
Python changes
Discussion-only replies posted on:
Layered: stream <-> bytes is the new primitive, path-mode opens the file and delegates. The CM accepts ``str`` / ``os.PathLike`` / file-like in the same positional slot; a stream is read once on enter and written once on exit, with open/close ownership staying with the caller's ``with open(...)`` block. ``io.BytesIO``, gzip streams, and HTTP buffers all "just work" through the same code path.

* ``RuntimeCacheHandle.load_from_stream`` / ``save_to_stream`` are the byte-bridge primitives; ``load`` / ``save`` now delegate (atomic tmp+rename + filelock stays in path-mode where it belongs).
* ``_RuntimeCacheContextManager`` duck-types the IO arg, raises TypeError on anything that's not a path, PathLike, or file-like.
* Read-only / write-only streams degrade silently (OSError / UnsupportedOperation are caught), matching the early-return path for ``path=""``.

Tests: 4 new cases for BytesIO round-trip, real file handle, the handle-level stream primitives, and the TypeError contract. Existing 65 runtime tests (incl. all CM + persistence + autosave tests) stay green on TRT-RTX 1.5.0.103.
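The layering can be sketched as a stream primitive with a thin path-mode wrapper on top. ``ToyStreamCache`` is a hypothetical class holding an in-memory blob; the method names follow the commit text, but the bodies are an illustration only. The sketch omits the tmp+rename and filelock steps of the real path mode and does no error handling of its own.

```python
import io
import os
from typing import BinaryIO


class ToyStreamCache:
    # Hypothetical stand-in: the "cache" is just a bytes blob.
    def __init__(self) -> None:
        self._blob = b""

    def load_from_stream(self, stream: BinaryIO) -> int:
        # Primitive: read everything the caller's stream offers.
        data = stream.read()
        if data:
            self._blob = data
        return len(data)

    def save_to_stream(self, stream: BinaryIO) -> int:
        # Primitive: write the blob; the caller owns open/close.
        if not self._blob:
            return 0
        stream.write(self._blob)
        return len(self._blob)

    def load(self, path: str) -> int:
        # Path mode opens the file and delegates to the primitive.
        if not os.path.exists(path):
            return 0
        with open(path, "rb") as f:
            return self.load_from_stream(f)

    def save(self, path: str) -> int:
        # Path mode opens the file and delegates to the primitive.
        with open(path, "wb") as f:
            return self.save_to_stream(f)


cache = ToyStreamCache()
cache.load_from_stream(io.BytesIO(b"\x00\xff"))
out = io.BytesIO()
cache.save_to_stream(out)
```

Because the primitives only call ``read`` and ``write``, any binary file-like object can be passed in, which is the property the ``io.BytesIO`` and gzip cases rely on.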
…rs propagate
Two follow-up cleanups after review.
* ``RuntimeCacheHandle.__eq__`` / ``__hash__`` were spelling out the default
``object`` semantics (identity comparison, id-derived hash). Deleted both;
moved the rationale ("handles wrap distinct ``IRuntimeCache`` instances
even when paths match -- separate slots in ``IRuntimeConfig``, no
kernel-specialization sharing") into the class docstring under
"Equality is identity-based".
* ``load_from_stream`` / ``save_to_stream`` were swallowing
``(AttributeError, OSError)`` and conflating it with the legitimate
"nothing to load / nothing to save" cases (both returned ``0``). That
hid programmer bugs: passing a write-only sink to load, or a closed
handle to save, looked identical to first-run empty. Path-mode
``load`` / ``save`` already let IO errors propagate; the stream
primitives now do the same, so ``0`` unambiguously means "the buffer
was empty". Callers that genuinely want a tolerant variant can wrap
the call themselves.
All 65 runtime tests stay green on TRT-RTX 1.5.0.103.
…ensorRTModule

Before: the Python-rt implicit ``RuntimeCacheHandle`` lived on ``TRTRuntimeConfig._implicit_cache_handle`` (exposed via a forwarding property on ``_TRTEngine``), while the cpp-rt one already lived on ``TorchTensorRTModule._implicit_cache_handle``. Two locations, two construct/swap/save code paths, mostly mirrored. The save-unification landed earlier via ``RuntimeCacheHandle.__del__`` -- this commit closes the loop by giving both runtimes one storage slot.

* ``TorchTensorRTModule._implicit_cache_handle`` is now the canonical single owner. ``_materialize_cpp_implicit_handle`` renamed to ``_materialize_implicit_handle`` and branches on ``self._use_python_runtime``; the helper builds a wrapper, swap-saves the prior on change, and returns the dispatch-flavored settings.
* ``setup_engine`` pre-wraps string ``runtime_cache`` paths for BOTH runtimes before the engine is constructed, so ``TRTRuntimeConfig`` only ever sees ``None`` or an external ``RuntimeCacheHandle`` -- the ``isinstance(rc, str)`` branch in ``_apply_settings`` is gone, replaced by an explicit ``TypeError`` to catch new callers.
* ``TRTRuntimeConfig`` shrinks: no ``_implicit_cache_handle`` field, no ``implicit_cache_handle`` property, no save-on-swap inside ``set_settings``. The class is now a pure-execution shim. To keep the Python-rt's lazy ``IExecutionContext`` semantics (the handle's pybind ``IRuntimeCache`` doesn't exist until ``ensure_cache`` fires inside ``_apply_settings``), ``_apply_settings`` auto-loads when an attached handle has ``autosave_on_del=True and path``. The module's ``needs_load`` after dispatch still drives the cpp-rt load (the torchbind sibling materializes eagerly in C++).
* ``_to_torchbind_handle`` learns to pull ``_torchbind`` from a Python ``RuntimeCacheHandle`` wrapper rather than constructing a fresh torchbind sibling -- otherwise CM enter/exit would orphan the cpp-rt cache pointer on every re-dispatch.
* ``_materialize_implicit_handle`` gains a no-op fast path for "incoming ``rc`` is the wrapper we already own", which is exactly the shape of ``mod.runtime_settings = current`` (CM enter with no override on ``runtime_cache``). Without it the helper would relinquish ownership and save-then-rebuild on every CM step.
* Tests: 2 introspection tests now reach the handle through ``module._implicit_cache_handle`` instead of ``engine._implicit_cache_handle``; new ``_find_python_trt_module`` helper.

The 65-test runtime suite stays green on TRT-RTX 1.5.0.103.
Five small clean-ups from the latest review pass; no behavior change.

* ``TorchTensorRTModule.runtime_settings`` setter: drop the ``if self.engine is None`` early return -- ``_dispatch_runtime_settings_to_engine`` already no-ops on a None engine, so the setter collapses to two lines.
* ``_materialize_implicit_handle``: ``getattr(old, "path", None) == rc`` was defensive paranoia. ``old`` is always a ``RuntimeCacheHandle`` here, so ``old.path == rc`` is enough.
* ``TRTEngine._num_execution_contexts_created``: initialized to ``0`` in ``__init__`` and ``__setstate__`` instead of being lazily summoned by ``getattr``. Increment is now ``+= 1`` and the getter returns the attribute directly.
* ``TRTRuntimeConfig.set_settings``: ``self._live = None`` becomes ``self.reset()`` -- one fewer place to remember which fields ``reset()`` clears.

All 65 runtime tests stay green on TRT-RTX 1.5.0.103.
…ross-compile
* ``RuntimeSettings.cpp::{ds,cg}_strategy_name``: a ``size_t`` cast wraps
negative underlying values to a huge unsigned, so a single ``i < size``
check covers both ends. No ``std::clamp``, no separate ``i < 0`` arm,
fewer casts at the call site.
* ``dynamo._compiler.cross_compile_for_windows``: dropped the
``runtime_settings`` keyword. Runtime settings are runtime-only knobs
applied at ``IExecutionContext`` creation; the cross-compiled engine is
consumed on Windows where the caller controls them via
``mod.runtime_settings = ...`` or the ``runtime_config`` CM. Passing
them at cross-compile time was a no-op signal. ``compile`` and
``compile_module`` still accept the kwarg for the same-platform flow.
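The unsigned-cast bounds check from the first bullet can be mimicked in Python with ``ctypes.c_size_t``, which wraps a negative integer to a large unsigned value the way a C++ ``static_cast<size_t>`` would. This is only an analogy for the C++ helper: ``strategy_name``, the ``NAMES`` table, and its entries are hypothetical placeholders.

```python
import ctypes

# Placeholder lookup table; the real tables live in RuntimeSettings.cpp.
NAMES = ("lazy", "eager", "none")


def strategy_name(value: int) -> str:
    # c_size_t does no overflow checking, so -1 becomes the maximum
    # unsigned value and fails the single upper-bound comparison.
    index = ctypes.c_size_t(value).value
    if index < len(NAMES):
        return NAMES[index]
    return "unknown"
```

A negative input and an input past the end of the table both take the same ``"unknown"`` branch, which is why no separate ``i < 0`` arm is needed.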
All 65 runtime tests stay green on TRT-RTX 1.5.0.103.
Rewrites the v2 design (PR #2 base branch) to move cuda_graph_strategy, dynamic_shapes_kernel_specialization_strategy, runtime_cache from CompilationSettings / serialized engine slots to runtime context managers per pytorch#4310.

Summary
- RuntimeSettings dataclass on both Python and C++ sides; RuntimeCacheHandle registered as a torchbind class for shared-cache semantics.
- torch_tensorrt.runtime: runtime_config (pool API), runtime_cache (shared cache), plus per-knob sugars. All accept a list of modules.
- runtime_settings= kwarg on compile() / cross_compile_for_windows() / convert_module() for compile-time hints (1 context-create cost, no enter/exit recreate).
- update_runtime_settings(rs) with fast-path equality check; rebuilds IRuntimeConfig + recreates execution context on diff.
- SerializedInfoIndex drops 4 RTX slots; SERIALIZATION_LEN back to 12.

Tests
- test_004_runtime_settings.py (12 tests) covering data model, compile-time hint, CM restore, multi-target, dispatch.
- test_000_runtime_cache.py, test_001_dynamic_shapes_kernel_strategy.py, test_001_cuda_graph_strategy.py migrated to the new API.

Status
- SKIP=mypy for the pre-existing _TRTEngine.py errors tracked separately).
- test_004 all 12 pass; Python-runtime half of the three other test files passes.
- libtensorrt_rtx.so.1 at cuda_engine->getStreamableWeightsSize() -- I confirmed this is a pre-existing environmental issue on the test node (the same crash occurs with a known-good pre-built v2 wheel installed in the same env), not a regression from this refactor.