Merge forward 3006.x into 3007.x#69401
Merged
Merged
Conversation
doc/conf.py hard-coded "version_match": "master", so every published
build (including the 3008.0 release at /en/latest/) emitted
DOCUMENTATION_OPTIONS.theme_switcher_version_match = 'master' and the
dropdown always highlighted "Master (Dev)" as the current version.
Make version_match dynamic so it matches a versions.json entry by
BUILD_TYPE:
- master / next -> "master"
- latest -> "latest"
- previous / other -> release.split(".", 1)[0] (e.g. "3006")
Also fix the dropdown URLs in doc/_static/versions.json. The 3006.24
entry pointed at /en/3006.24/ which returns 404; the served URL pattern
is /en/<major>/ (e.g. /en/3006/, /en/3008/). Rewrite the entries to use
major-only paths and add a 3008 entry, since the latest release at
docs.saltproject.io is now 3008.0.
Note: doc/conf.py points every build at the absolute json_url
https://docs.saltproject.io/en/latest/_static/versions.json, so the
served versions.json is whichever branch publishes /en/latest/. Updating
versions.json here is harmless on 3006.x but only takes effect site-wide
when the same change reaches the branch that builds /en/latest/.
Verified locally with the CI invocation under a python3.14 venv built
from requirements/static/ci/py3.14/docs.lock:
cd doc && make clean && make html SPHINXOPTS='-W -j auto --keep-going --color'
cd doc && make clean && BUILD_TYPE=latest make html SPHINXOPTS='-W -j auto --keep-going --color'
Rendered contents.html emits:
theme_switcher_version_match = 'master' (default)
theme_switcher_version_match = 'latest' (BUILD_TYPE=latest)
pkg.list_patches in yumpkg.py parses tdnf output on Photon OS
Fix docs version switcher selecting the wrong active version
After the requirements .txt -> .in rename, MANIFEST.in still only included requirements/*.txt, so the PyPI sdist no longer shipped the files setup.py reads to populate install_requires. The build backend silently returned an empty list, and `pip install salt` from a sdist built off this branch would install salt with zero dependencies. Add *.in and *.lock to the recursive-include so the dependency manifests (and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in the sdist again. Backport of the 3008.x fix; changelog lives on the forward branch.
After the requirements .txt -> .in rename, MANIFEST.in still only included requirements/*.txt, so the PyPI sdist no longer shipped the files setup.py reads to populate install_requires. The build backend silently returned an empty list, and `pip install salt` from a sdist built off this branch would install salt with zero dependencies. Add *.in and *.lock to the recursive-include so the dependency manifests (and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in the sdist again. Backport of the 3008.x fix; changelog lives on the forward branch.
After the requirements .txt -> .in rename, MANIFEST.in still only included requirements/*.txt, so the PyPI sdist no longer shipped the files setup.py reads to populate install_requires. The build backend silently returned an empty list, and `pip install salt` from a sdist built off this branch would install salt with zero dependencies. Add *.in and *.lock to the recursive-include so the dependency manifests (and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in the sdist again. Backport of the 3008.x fix; changelog lives on the forward branch. # Conflicts: # requirements/static/ci/py3.10/cloud.lock # requirements/static/ci/py3.10/darwin.lock # requirements/static/ci/py3.10/freebsd.lock # requirements/static/ci/py3.10/lint.lock # requirements/static/ci/py3.10/linux.lock # requirements/static/ci/py3.10/windows.lock # requirements/static/ci/py3.11/cloud.lock # requirements/static/ci/py3.11/darwin.lock # requirements/static/ci/py3.11/freebsd.lock # requirements/static/ci/py3.11/lint.lock # requirements/static/ci/py3.11/linux.lock # requirements/static/ci/py3.11/windows.lock # requirements/static/ci/py3.12/cloud.lock # requirements/static/ci/py3.12/darwin.lock # requirements/static/ci/py3.12/freebsd.lock # requirements/static/ci/py3.12/lint.lock # requirements/static/ci/py3.12/linux.lock # requirements/static/ci/py3.12/windows.lock # requirements/static/ci/py3.13/cloud.lock # requirements/static/ci/py3.13/darwin.lock # requirements/static/ci/py3.13/freebsd.lock # requirements/static/ci/py3.13/lint.lock # requirements/static/ci/py3.13/linux.lock # requirements/static/ci/py3.13/windows.lock # requirements/static/ci/py3.14/cloud.lock # requirements/static/ci/py3.14/darwin.lock # requirements/static/ci/py3.14/freebsd.lock # requirements/static/ci/py3.14/lint.lock # requirements/static/ci/py3.14/linux.lock # requirements/static/ci/py3.14/windows.lock # requirements/static/ci/py3.9/cloud.lock # requirements/static/ci/py3.9/darwin.lock # requirements/static/ci/py3.9/freebsd.lock # requirements/static/ci/py3.9/lint.lock # requirements/static/ci/py3.9/linux.lock # requirements/static/ci/py3.9/windows.lock
) When the master returns a bare string error ("bad load", "Some exception handling minion payload", "Server-side exception handling payload") for a pillar request, the minion's crypted_transfer_decode_dictentry crashed with "TypeError: string indices must be integers" at ret["key"]. The "key" not in ret guard accidentally passed for strings (substring check), the post-reauth retry returned the same string, and the indexing blew up. Raise a clean AuthenticationError when the post-reauth response is still not a dict containing "key", so the caller can fail or retry instead of crashing. Matches the cluster_retry-decorated guard already on 3008.x / master (commit 2efc32e). Add a parametrized regression test covering all three server-side string error returns.
Fixes #68181 The minion was signing before AsyncReqChannel attached nonce/ts/tok, so master verification re-serialized different bytes and every signed return was dropped. Sign inside _package_load after the metadata is attached; remove the now-redundant signing from the five _send_req_* methods.
DeleteConfig_DECAC unconditionally deleted ROOTDIR\var during RemoveExistingProducts, destroying the MSI that pkg.install had cached to ROOTDIR\var\cache before launching the upgrade. Users with REMOVE_CONFIG=1 persisted in the registry hit a worse variant where the entire ROOTDIR was deleted. Fix: check UPGRADINGPRODUCTCODE in DeleteConfig_DECAC — set by Windows Installer whenever RemoveExistingProducts fires — and skip all ROOTDIR deletion during upgrades. Also remove the default var/srv deletion on standalone uninstall, aligning the MSI with the NSIS installer which has always preserved ROOTDIR during upgrades. Forward UPGRADINGPRODUCTCODE via the CADH entry in Product.wxs and add upgrade tests that plant a sentinel in ROOTDIR\var\cache (with and without REMOVE_CONFIG=1 in the registry) to catch regressions. Fixes #69219
The old 3006.25 MSI always deletes ROOTDIR\var unconditionally in its DeleteConfig_DECAC (with REMOVE_CONFIG it deletes the entire ROOTDIR; without it, it deletes var\ and srv\). Since RemoveExistingProducts in the new MSI triggers the old product's full uninstall sequence, cached MSI sources and user data under var\cache were destroyed mid-upgrade, causing Error 1603 and breaking the test_salt_upgrade sentinel check. Two complementary fixes: Backward-compat (3006.25 → this version): Add two deferred CAs to the new MSI — PreserveRootDirVarCache_DECAC moves ROOTDIR\var\cache to a temp location before RemoveExistingProducts runs; RestoreRootDirVarCache_DECAC moves it back after CreateFolders re-creates the directory skeleton. The old MSI can no longer reach the cache dir. Forward-compat (this version → future): Change the DeleteConfig_CADH/DECAC condition from `REMOVE ~= "ALL"` to `(REMOVE ~= "ALL") AND NOT UPGRADINGPRODUCTCODE` so the action is never even scheduled during a MajorUpgrade. Move the existing C# UPGRADINGPRODUCTCODE guard to be the first check in DeleteConfig_DECAC (before CLEAN_INSTALL and bytecode cache deletion) as defense-in-depth.
UTF-16 and UTF-32 files contain null bytes which cause is_text() to classify them as binary, making file.replace raise SaltInvocationError. Add an encoding parameter to both the execution module and state. When set, the binary check is bypassed and the file is read/written in text mode with the specified encoding, preserving line endings and the BOM. Fixes #52793
Document the correct conditions under which a password is required when using runas on Windows: only when the salt-minion is not running as SYSTEM or as an elevated Administrator. Remove the inaccurate claim that the target user account must be in the Administrators group. Add password=None as an explicit parameter to cmd.run state so it is clearly documented and reliably forwarded to cmd.run_all. Update all runas/password docstrings across salt.modules.cmdmod (run, shell, run_stdout, run_stderr, run_all, retcode, script, script_retcode, run_chroot, powershell, powershell_all, run_bg). Replace the hard fail in cmd.script when runas is given without a password on Windows with a log.warning, since the operation succeeds without a password when Salt has sufficient privileges. Update and add unit tests to cover the warning behavior and password forwarding. Fixes #57951
In hot/hot multimaster, one Minion per master shares a cachedir. The shared minion_queue.lock and queue dirs caused two bugs: Minion.__init__'s stale-lock cleanup deleted a sibling Minion's live lock (FileNotFoundError on release), and either Minion could drain any queued file (wrong-master returns). Move all queue state under cachedir/queues/<sanitized-master>/ via helpers in salt.utils.state. Each Minion only touches its own subtree. No migration: jobs queued under legacy paths at upgrade time time out on the originating master.
The cached pyVmomi service instance health check in get_service_instance only reconnected on vim.fault.NotAuthenticated. When the cached connection is corrupted at the socket level - notably when salt-cloud's map_providers_parallel inherits the cached service instance across an os.fork() and the shared TLS socket is used from more than one process - CurrentTime() raises ssl.SSLError, BrokenPipeError or ConnectionResetError instead. Those propagated to the caller and failed the operation, which is the SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC seen on alternating cloud.present state IDs in issue #61983. Treat those socket-level errors the same as an expired session: drop the dead connection and reconnect. Disconnect is guarded because logging out over an already broken socket can raise as well.
get_master_ip forwarded its arguments to _connect positionally:
server = _connect(host, port, password)
but _connect's positional signature is (host, port, db, password), so
the caller-supplied password landed in the db slot, while the actual
password parameter of _connect fell through to
config.option("redis.password"). When the configured Redis required
a password, this produced "ERR invalid DB index"; when the server
accepted unauthenticated connections, the password was silently
ignored - a security concern because the operator has no signal that
their authentication never reached the server.
Pass the arguments to _connect by keyword.
Adds six regression tests in tests/pytests/unit/modules/test_redismod.py;
five of them fail before this fix.
Refs: #69029
salt.modules.redismod._connect used the truthy check ``if not db`` to
decide whether to fall back to ``config.option("redis.db")``. Since
``not 0`` is True, a caller that explicitly passed ``db=0`` -- the
default Redis database index, and a perfectly valid value -- had the
argument silently replaced by the configured value.
This is inconsistent with the sibling helper ``_sconnect`` in the
same module, which already uses ``is None`` for the same purpose.
Use ``if db is None`` so that only the absent-argument case triggers
the fall-back. The other arguments (host, port, password) keep their
truthy-check semantics intentionally: empty string and 0 are not
legitimate values for a hostname or port, and "" is not a meaningful
password override.
Adds four regression tests in tests/pytests/unit/modules/test_redismod.py:
the headline ``test_connect_passes_db_zero_through`` fails before this
fix; the other three pin scope and default behaviour.
Refs: #69030
… AUTH Two distinct bugs prevented the salt.engines.redis_sentinel engine from being usable on any modern Salt installation: 1. ``start()`` raised ``AttributeError: 'dict_values' object has no attribute 'pop'`` on its very first call. ``dict.values()`` returns a Python 3 ``dict_values`` view that has no ``.pop()`` method, so ``local.cmd(...).values().pop()`` crashed before the Listener was ever constructed. The engine has therefore never actually run on Python 3. The fix wraps the result in ``list(...)`` so ``.pop()`` is available. 2. ``Listener`` built the redis client without ever passing a password, so the engine could not authenticate against a Sentinel that required AUTH. The Sentinel ``psubscribe`` then failed with ``NOAUTH Authentication required`` and the engine silently did nothing. The fix adds an optional ``password`` parameter to ``Listener.__init__`` and ``start()``, defaulting to ``None`` for backwards compatibility, and forwards it to ``redis.StrictRedis``. Operators opt in by adding ``password: '...'`` under the engine's YAML config; Salt's engine loader forwards YAML keys to ``start()`` as kwargs. This commit covers the engine's pubsub-monitoring use case (subscribing to the ``+switch-master`` / ``+odown`` / ``-odown`` admin channels of a single Sentinel node). Because the engine listens to broadcast channels, no ``service_name`` parameter is required at this level -- the master name is part of each event's payload. A proper Sentinel client with multi-host discovery and ``service_name`` is a larger piece of work scheduled for a follow-up commit in salt/modules/redismod.py. Adds five regression tests in tests/pytests/unit/engines/test_redis_sentinel.py (new file). Four of them fail before this fix. Refs: #69031
The salt.returners.redis_return returner reads host, port, db,
unix_socket_path, cluster_mode and the cluster startup-nodes from
config, but did not read or use redis.password -- even though the
docstring at the top of the module documents it as a configuration
option. Operators who set ``redis.password: 'secret'`` in master
config got ``NOAUTH Authentication required`` from Redis and the
returner silently lost every job return; jobs ran, returns were
produced, but the returner could not persist them, with no signal
in master logs that the documented option was ignored.
Three changes in salt/returners/redis_return.py, all in
``_get_options`` and ``_get_serv``:
- Add ``"password": "password"`` to the ``attrs`` mapping so the
shared ``salt.returners.get_returner_options`` helper surfaces
the configured value.
- Add ``"password": __opts__.get("redis.password", None)`` to the
proxy-mode hand-built return dict.
- Pass ``password=_options.get("password")`` to both the
single-server (``redis.StrictRedis``) and cluster
(``StrictRedisCluster``) constructors.
Deployments without a password are unaffected: the resolved value
is ``None`` and the redis client constructors handle that as the
"no AUTH" path.
Adds seven regression tests in
tests/pytests/unit/returners/test_redis_return.py (new file). Five
of them fail before this fix; two pin the unauthenticated default
to guarantee backwards compatibility.
Refs: #69032
Three closely-related bugs in salt.cache.redis_cache together broke
hierarchical-bank semantics for any deployment that uses the redis
cache backend, most visibly causing salt-run manage.present and
salt-run manage.up to return empty:
1. _build_bank_hier never registered child bank names in the
parent's index sets, only a "." placeholder. So $BANKEYS_minions
stayed empty even when minions/<id>/data was stored correctly,
and cache.list("minions") returned []. The fix registers each
non-root segment in both the parent's $BANK_ set (consumed by
_get_banks_to_remove during flush) and the parent's $BANKEYS_
set (consumed by list_).
2. _get_banks_to_remove did not decode the bytes returned by
smembers (the redis client is not configured with
decode_responses=True). Today the bug is masked because the parent
set always contained only b"."; once bug 1 is fixed and real
child names appear, every recursive step would build a corrupted
path like "minions/b'foo'" and silently fail to descend. The fix
decodes child names and skips the "." placeholder explicitly.
3. flush(bank, key=None) of a sub-bank deleted the bank's own data
and indices but never removed the bank's own reference from its
parent's index sets. After bug 1 is fixed those parent sets
contain real names, so cache.list(parent) would still report a
freshly-flushed bank as present. The fix issues SREM against
both parent index sets when the flushed bank has a parent.
Also removes the now-unused ``import itertools`` left over from the
old _build_bank_hier implementation.
Existing data stored before this commit will not magically appear in
the new parent indices; in normal operation the indices repopulate
within minutes (every minion auth, every mine.update, every
pillar.refresh triggers a fresh store("minions/<id>", "data", ...)).
Adds 13 regression tests in tests/pytests/unit/cache/test_redis_cache.py
(new file) covering each of the three fixes plus two end-to-end
store -> list -> flush(child) -> list integration scenarios. Nine of
the tests fail before this commit; the remaining four pin the
backwards-compatible sanity paths.
Refs: #69033
…en reads
The cluster client was created with ``decode_responses=True``, which
caused both ``redis_client.get()`` and ``redis_client.keys()`` to
return ``str`` instead of ``bytes``. Two callers in the same module
were then broken:
- ``get_token`` deserialises with ``salt.payload.loads(...)``, which
calls ``msgpack.unpackb`` under the hood; ``msgpack`` requires
``bytes`` and raises ``TypeError`` for ``str``.
- ``list_tokens`` does ``[k.decode("utf8") for k in
redis_client.keys()]``; ``str`` has no ``.decode``.
Both errors were swallowed by broad ``except Exception`` handlers
that returned ``{}`` / ``[]`` and merely logged a warning. From the
outside, eauth-via-cluster looked like it was rejecting every
authenticated session for unknown reasons; in fact the read path
inside Salt was returning empty results before the cluster was ever
consulted.
The fix removes ``decode_responses=True`` from ``_redis_client``.
Values now round-trip as bytes through msgpack as the rest of the
module already expected (``mk_token`` always wrote
``salt.payload.dumps(tdata)`` bytes; only the read path was broken).
Adds six regression tests in tests/pytests/unit/tokens/test_rediscluster.py
(new file). The headline test asserts ``decode_responses=True`` is no
longer passed to ``StrictRedisCluster``; the remaining tests pin the
post-fix read-path contract (get_token round-trip, list_tokens
returns ``str`` keys, mk_token + get_token round-trip).
Refs: #69035
…d_jobs The salt.returners.redis_return returner used Redis's ``KEYS pattern`` in two hot paths: - ``get_jids`` issued ``KEYS load:*`` on every ``salt-run jobs.list_jobs`` and on every UI listing of jobs. - ``clean_old_jobs`` issued ``KEYS ret:*`` and ``KEYS load:*`` from the master scheduler, every few minutes per ``keep_jobs_seconds``. Per the Redis documentation, ``KEYS`` is intended only for "debugging and special operations". It walks the entire keyspace synchronously and blocks the server thread for the duration; on a master with hundreds of thousands of jobs (each producing ``ret:<jid>``, ``load:<jid>`` and ``<minion>:<fun>`` keys), one ``KEYS`` call can stall the Redis server -- and therefore every other client of that Redis instance, not just Salt -- for seconds. Replace every ``serv.keys(pattern)`` call with ``serv.scan_iter(match=pattern)``. ``SCAN`` walks the keyspace incrementally in chunks and never blocks the server. Notes: - ``SCAN`` does not guarantee insertion order. The returner does not rely on order in either call site (``get_jids`` uses a dict keyed by jid; ``clean_old_jobs`` uses set membership), so this is a non-issue for Salt itself. Operators with custom scripts that read these keys directly may see them in a different order. - ``SCAN`` may return the same key multiple times under heavy concurrent writes. Both call sites accumulate keys into either a ``set`` or a list passed to ``mget``/``delete``; duplicates are harmless (set membership ignores them; redundant deletes are no-ops). - An empty-keyspace guard is added to ``get_jids`` to skip the ``mget`` call when no ``load:*`` keys exist; some Redis clients reject ``mget()`` with no arguments. This is a ``.changed`` rather than ``.fixed`` because it alters the observable load characteristics of the Redis server even though no Salt-visible API changes. Adds five regression tests in tests/pytests/unit/returners/test_redis_return.py that all fail before this commit. The mocked ``serv.keys`` raises ``AssertionError`` so any future regression of the form "fall back to KEYS" is caught at test time. Refs: #69037
… TTL The redis_return returner writes four kinds of keys for every job return: ``ret:<jid>`` (hash), ``<minion>:<fun>`` (string), ``minions`` (set), and applies an explicit ``EXPIRE`` to ``ret:<jid>``. The ``<minion>:<fun>`` last-jid pointer was written with ``pipeline.set(...)`` and no ``ex=`` argument, so it never expired and was never explicitly deleted anywhere in the module. Any (minion, fun) pair that stopped running stuck in Redis indefinitely; for a deployment with O(1k) minions and O(50) distinct functions in the job ladder, the master leaked O(50k) string keys over its lifetime. Pass ``ex=_get_ttl()`` to the ``pipeline.set(...)`` call so the pointer expires on the same schedule (``keep_jobs_seconds``) as the ``ret:<jid>`` hash. ``redis-py`` accepts the ``ex=`` kwarg as part of an atomic SET, so this adds no round-trips. Operators with external scripts that read ``<minion>:<fun>`` keys directly may observe them expiring; the documentation never promised they would not. Adds two regression tests in tests/pytests/unit/returners/test_redis_return.py. The headline ``test_returner_sets_ttl_on_minion_fun_pointer`` fails before this commit; the second sanity test pins that the four canonical pipeline operations are still issued. Refs: #69038
salt.returners.redis_return.get_fun(fun) is documented as returning
"a dict of the last function called for all minions". The
implementation read return data from a ``<minion>:<jid>`` key that
no other code in the module ever writes -- a leftover from an older
storage schema. The function therefore always returned ``{}``
regardless of how much data was actually in Redis.
The current ``returner`` writes return data as fields of a
``ret:<jid>`` hash:
pipeline.hset(f"ret:{jid}", minion, salt.utils.json.dumps(ret))
and ``get_jid(jid)`` already reads from that hash via ``hgetall``.
``get_fun`` now reads the per-minion field via
``serv.hget(f"ret:{jid}", minion)``, matching the canonical storage
layout.
Adds four regression tests in tests/pytests/unit/returners/test_redis_return.py.
The headline ``test_get_fun_reads_data_from_ret_hash`` fails before
this commit; the missing-field edge case also fails before. Two
sanity tests pin behaviour for the empty-pointer and empty-minions
paths, which were already correct.
Refs: #69039
…match Apply review feedback from bdrx312 on PR #69040: replace `list(d.values()).pop()` with `next(reversed(d.values()))`. On a `dict_values` view `reversed()` is O(1) and avoids the temporary list allocation, while preserving the previous "last minion's first IP" semantic. While here, also handle the `local.cmd(...)` empty-result case. The previous code raised `IndexError` from `.pop()` when the `matching` target hit zero minions, giving the operator no hint that the real problem was a misconfigured target pattern. The engine now logs the matching pattern at ERROR and returns cleanly. The upstream "no .pop on dict_values" comment block is no longer relevant and is removed; the new code does not need the workaround. Behavioural test added for the empty-match path: `test_start_logs_and_returns_when_no_minions_match`. The four existing tests still pass unchanged, including the original `test_start_does_not_crash_on_dict_values_pop` regression guard.
When ``salt.utils.timed_subprocess.TimedProc`` raises ``OSError``
(typically ``ENOENT`` when the binary does not exist, but any
spawn-time errno triggers it) ``salt.modules.cmdmod._run`` builds a
``CommandExecutionError`` whose message is meant to help the operator
debug. Previously the entire ``new_kwargs`` dict was interpolated
into that message, leaking two routinely-secret-bearing fields:
* ``env`` -- the run environment, set by callers via
``cmd.run env={'DB_PASSWORD': '...'}`` or by states that pass
credentials through env vars.
* ``stdin`` -- the bytes piped to the command, commonly used to
feed a password to a CLI like ``mysql -p``.
Both leak channels matter: the resulting ``CommandExecutionError``
ends up in master/minion logs *and* in event-bus return data
visible to the API caller. ENOENT is not a rare condition; it
fires on any typo in a binary path. A typo should not exfiltrate
credentials.
Filter ``env`` and ``stdin`` out of the dict before formatting.
The remaining ``cwd``, ``shell``, ``executable``, ``timeout`` etc.
are still useful debugging context and do not contain user-supplied
secrets.
Two behavioural tests guard each leak channel; both fail on the
previous implementation.
Refs: #69075
All public functions in salt/modules/win_pkg.py ignored the minion's configured saltenv, hardcoding "base" as the fallback instead of reading from __opts__["saltenv"]. Affected functions: refresh_db, genrepo, install, remove, list_pkgs, latest_version, upgrade_available, list_upgrades, list_available, version, get_repo_data, get_package_info. Fixes #38551
The state-queue refactor in commit a21bc25 (v3006.21) conflated "the queued payload needs a guaranteed-unique JID" with "the queued payload needs its own JID", and replaced the master's __pub_jid with a freshly-minted one in both the on-disk payload and the queue filename. A drain-side rewrite masked the symptom until commit e002fbc (v3006.24) reverted the queue filename to also use the new JID, removing the mask. A minion must never change a job's JID once one has been assigned. Job-tracking infrastructure (returners, the jobs runner, syndic forwarding) keys on the JID the master published; if the minion executes the queued run under a different JID, the master's view of the run never resolves. Only mint a new JID when none is provided -- that is the salt-call / local case, where the minion is both publisher and executor and no master-side tracking is involved. Filename uniqueness is already provided by the microsecond-precision timestamp prefix, so we do not need a fresh JID for that either. Updates the existing _check_queue unit test that asserted the buggy new-JID behavior and adds two focused regression tests covering both the master path and the salt-call path. Fixes #69386
The #69386 fix targets the state-queue write path (the only place that actually mints a fresh JID for a master-published payload). The job-queue write path in salt.minion.Minion._queue_job already dumps the payload by reference without touching jid, so it is not affected -- but nothing pins that invariant down. Add a focused unit test that: - patches salt.utils.jid.gen_jid to raise (so any future regression that starts minting a new JID here fails loudly), and - asserts both the serialized payload's jid and the on-disk queue filename embed the original master JID. Same regression class as #69386, same fix philosophy: the minion never changes a job's JID once one has been assigned.
Per reviewer request: pair the unit-level _check_queue assertion with an end-to-end integration test that exercises the real minion's state-queue write path. The test fires a slow state job (--async) to occupy the state slot, then publishes a second state job with queue=True (also --async). While the slow job is still running, it reads the file the minion wrote into cachedir/queues/<master>/state_queue/ and asserts that both the JID embedded in the filename and the jid field inside the msgpack-serialized payload equal the master-published JID for the second job. Under the pre-fix code path both would be a freshly- minted JID; the assertion message points at #69386 to keep the intent obvious in future failures. Verified: passes on this branch; fails on origin/3006.x with the expected assertion message ("queued payload jid ... does not match the master-published jid ... -- minion re-stamped the JID").
twangboy
approved these changes
Jun 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.