Skip to content

Merge forward 3006.x into 3007.x#69401

Merged
dwoz merged 45 commits into
3007.xfrom
merge/3006.x/3007.x/25-06-08
Jun 8, 2026
Merged

Merge forward 3006.x into 3007.x#69401
dwoz merged 45 commits into
3007.xfrom
merge/3006.x/3007.x/25-06-08

Conversation

@dwoz

@dwoz dwoz commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

No description provided.

Daniel A. Wozniak and others added 30 commits May 27, 2026 17:43
doc/conf.py hard-coded "version_match": "master", so every published
build (including the 3008.0 release at /en/latest/) emitted
DOCUMENTATION_OPTIONS.theme_switcher_version_match = 'master' and the
dropdown always highlighted "Master (Dev)" as the current version.

Make version_match dynamic so it matches a versions.json entry by
BUILD_TYPE:
  - master / next       -> "master"
  - latest              -> "latest"
  - previous / other    -> release.split(".", 1)[0]  (e.g. "3006")

Also fix the dropdown URLs in doc/_static/versions.json. The 3006.24
entry pointed at /en/3006.24/ which returns 404; the served URL pattern
is /en/<major>/ (e.g. /en/3006/, /en/3008/). Rewrite the entries to use
major-only paths and add a 3008 entry, since the latest release at
docs.saltproject.io is now 3008.0.

Note: doc/conf.py points every build at the absolute json_url
https://docs.saltproject.io/en/latest/_static/versions.json, so the
served versions.json is whichever branch publishes /en/latest/. Updating
versions.json here is harmless on 3006.x but only takes effect site-wide
when the same change reaches the branch that builds /en/latest/.

Verified locally with the CI invocation under a python3.14 venv built
from requirements/static/ci/py3.14/docs.lock:
  cd doc && make clean && make html SPHINXOPTS='-W -j auto --keep-going --color'
  cd doc && make clean && BUILD_TYPE=latest make html SPHINXOPTS='-W -j auto --keep-going --color'
Rendered contents.html emits:
  theme_switcher_version_match = 'master'   (default)
  theme_switcher_version_match = 'latest'   (BUILD_TYPE=latest)
pkg.list_patches in yumpkg.py  parses tdnf output on Photon OS
Fix docs version switcher selecting the wrong active version
After the requirements .txt -> .in rename, MANIFEST.in still only
included requirements/*.txt, so the PyPI sdist no longer shipped the
files setup.py reads to populate install_requires. The build backend
silently returned an empty list, and `pip install salt` from a sdist
built off this branch would install salt with zero dependencies.

Add *.in and *.lock to the recursive-include so the dependency manifests
(and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in
the sdist again. Backport of the 3008.x fix; changelog lives on the
forward branch.
After the requirements .txt -> .in rename, MANIFEST.in still only
included requirements/*.txt, so the PyPI sdist no longer shipped the
files setup.py reads to populate install_requires. The build backend
silently returned an empty list, and `pip install salt` from a sdist
built off this branch would install salt with zero dependencies.

Add *.in and *.lock to the recursive-include so the dependency manifests
(and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in
the sdist again. Backport of the 3008.x fix; changelog lives on the
forward branch.
After the requirements .txt -> .in rename, MANIFEST.in still only
included requirements/*.txt, so the PyPI sdist no longer shipped the
files setup.py reads to populate install_requires. The build backend
silently returned an empty list, and `pip install salt` from a sdist
built off this branch would install salt with zero dependencies.

Add *.in and *.lock to the recursive-include so the dependency manifests
(and the locked variants used when USE_STATIC_REQUIREMENTS=1) end up in
the sdist again. Backport of the 3008.x fix; changelog lives on the
forward branch.

# Conflicts:
#	requirements/static/ci/py3.10/cloud.lock
#	requirements/static/ci/py3.10/darwin.lock
#	requirements/static/ci/py3.10/freebsd.lock
#	requirements/static/ci/py3.10/lint.lock
#	requirements/static/ci/py3.10/linux.lock
#	requirements/static/ci/py3.10/windows.lock
#	requirements/static/ci/py3.11/cloud.lock
#	requirements/static/ci/py3.11/darwin.lock
#	requirements/static/ci/py3.11/freebsd.lock
#	requirements/static/ci/py3.11/lint.lock
#	requirements/static/ci/py3.11/linux.lock
#	requirements/static/ci/py3.11/windows.lock
#	requirements/static/ci/py3.12/cloud.lock
#	requirements/static/ci/py3.12/darwin.lock
#	requirements/static/ci/py3.12/freebsd.lock
#	requirements/static/ci/py3.12/lint.lock
#	requirements/static/ci/py3.12/linux.lock
#	requirements/static/ci/py3.12/windows.lock
#	requirements/static/ci/py3.13/cloud.lock
#	requirements/static/ci/py3.13/darwin.lock
#	requirements/static/ci/py3.13/freebsd.lock
#	requirements/static/ci/py3.13/lint.lock
#	requirements/static/ci/py3.13/linux.lock
#	requirements/static/ci/py3.13/windows.lock
#	requirements/static/ci/py3.14/cloud.lock
#	requirements/static/ci/py3.14/darwin.lock
#	requirements/static/ci/py3.14/freebsd.lock
#	requirements/static/ci/py3.14/lint.lock
#	requirements/static/ci/py3.14/linux.lock
#	requirements/static/ci/py3.14/windows.lock
#	requirements/static/ci/py3.9/cloud.lock
#	requirements/static/ci/py3.9/darwin.lock
#	requirements/static/ci/py3.9/freebsd.lock
#	requirements/static/ci/py3.9/lint.lock
#	requirements/static/ci/py3.9/linux.lock
#	requirements/static/ci/py3.9/windows.lock
)

When the master returns a bare string error ("bad load",
"Some exception handling minion payload",
"Server-side exception handling payload") for a pillar request, the
minion's crypted_transfer_decode_dictentry crashed with
"TypeError: string indices must be integers" at ret["key"]. The "key"
not in ret guard accidentally passed for strings (substring check), the
post-reauth retry returned the same string, and the indexing blew up.

Raise a clean AuthenticationError when the post-reauth response is still
not a dict containing "key", so the caller can fail or retry instead of
crashing. Matches the cluster_retry-decorated guard already on 3008.x /
master (commit 2efc32e).

Add a parametrized regression test covering all three server-side string
error returns.
Fixes #68181

The minion was signing before AsyncReqChannel attached nonce/ts/tok,
so master verification re-serialized different bytes and every signed
return was dropped. Sign inside _package_load after the metadata is
attached; remove the now-redundant signing from the five _send_req_*
methods.
DeleteConfig_DECAC unconditionally deleted ROOTDIR\var during
RemoveExistingProducts, destroying the MSI that pkg.install had cached
to ROOTDIR\var\cache before launching the upgrade. Users with
REMOVE_CONFIG=1 persisted in the registry hit a worse variant where
the entire ROOTDIR was deleted.
Fix: check UPGRADINGPRODUCTCODE in DeleteConfig_DECAC — set by Windows
Installer whenever RemoveExistingProducts fires — and skip all ROOTDIR
deletion during upgrades. Also remove the default var/srv deletion on
standalone uninstall, aligning the MSI with the NSIS installer which
has always preserved ROOTDIR during upgrades.
Forward UPGRADINGPRODUCTCODE via the CADH entry in Product.wxs and add
upgrade tests that plant a sentinel in ROOTDIR\var\cache (with and
without REMOVE_CONFIG=1 in the registry) to catch regressions.
Fixes #69219
The old 3006.25 MSI always deletes ROOTDIR\var unconditionally in its
DeleteConfig_DECAC (with REMOVE_CONFIG it deletes the entire ROOTDIR;
without it, it deletes var\ and srv\). Since RemoveExistingProducts in
the new MSI triggers the old product's full uninstall sequence, cached
MSI sources and user data under var\cache were destroyed mid-upgrade,
causing Error 1603 and breaking the test_salt_upgrade sentinel check.
Two complementary fixes:
Backward-compat (3006.25 → this version): Add two deferred CAs to the
new MSI — PreserveRootDirVarCache_DECAC moves ROOTDIR\var\cache to a
temp location before RemoveExistingProducts runs; RestoreRootDirVarCache_DECAC
moves it back after CreateFolders re-creates the directory skeleton.
The old MSI can no longer reach the cache dir.
Forward-compat (this version → future): Change the DeleteConfig_CADH/DECAC
condition from `REMOVE ~= "ALL"` to `(REMOVE ~= "ALL") AND NOT
UPGRADINGPRODUCTCODE` so the action is never even scheduled during a
MajorUpgrade. Move the existing C# UPGRADINGPRODUCTCODE guard to be the
first check in DeleteConfig_DECAC (before CLEAN_INSTALL and bytecode
cache deletion) as defense-in-depth.
UTF-16 and UTF-32 files contain null bytes which cause is_text() to
classify them as binary, making file.replace raise SaltInvocationError.
Add an encoding parameter to both the execution module and state. When
set, the binary check is bypassed and the file is read/written in text
mode with the specified encoding, preserving line endings and the BOM.

Fixes #52793
Document the correct conditions under which a password is required when
using runas on Windows: only when the salt-minion is not running as
SYSTEM or as an elevated Administrator. Remove the inaccurate claim that
the target user account must be in the Administrators group.
Add password=None as an explicit parameter to cmd.run state so it is
clearly documented and reliably forwarded to cmd.run_all. Update all
runas/password docstrings across salt.modules.cmdmod (run, shell,
run_stdout, run_stderr, run_all, retcode, script, script_retcode,
run_chroot, powershell, powershell_all, run_bg).
Replace the hard fail in cmd.script when runas is given without a
password on Windows with a log.warning, since the operation succeeds
without a password when Salt has sufficient privileges.
Update and add unit tests to cover the warning behavior and password
forwarding.

Fixes #57951
In hot/hot multimaster, one Minion per master shares a cachedir. The
shared minion_queue.lock and queue dirs caused two bugs: Minion.__init__'s
stale-lock cleanup deleted a sibling Minion's live lock (FileNotFoundError
on release), and either Minion could drain any queued file (wrong-master
returns).

Move all queue state under cachedir/queues/<sanitized-master>/ via helpers
in salt.utils.state. Each Minion only touches its own subtree.

No migration: jobs queued under legacy paths at upgrade time time out on
 the originating master.
The cached pyVmomi service instance health check in get_service_instance
only reconnected on vim.fault.NotAuthenticated. When the cached connection
is corrupted at the socket level - notably when salt-cloud's
map_providers_parallel inherits the cached service instance across an
os.fork() and the shared TLS socket is used from more than one process -
CurrentTime() raises ssl.SSLError, BrokenPipeError or ConnectionResetError
instead. Those propagated to the caller and failed the operation, which is
the SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC seen on alternating
cloud.present state IDs in issue #61983.

Treat those socket-level errors the same as an expired session: drop the
dead connection and reconnect. Disconnect is guarded because logging out
over an already broken socket can raise as well.
get_master_ip forwarded its arguments to _connect positionally:

    server = _connect(host, port, password)

but _connect's positional signature is (host, port, db, password), so
the caller-supplied password landed in the db slot, while the actual
password parameter of _connect fell through to
config.option("redis.password"). When the configured Redis required
a password, this produced "ERR invalid DB index"; when the server
accepted unauthenticated connections, the password was silently
ignored - a security concern because the operator has no signal that
their authentication never reached the server.

Pass the arguments to _connect by keyword.

Adds six regression tests in tests/pytests/unit/modules/test_redismod.py;
five of them fail before this fix.

Refs: #69029
salt.modules.redismod._connect used the truthy check ``if not db`` to
decide whether to fall back to ``config.option("redis.db")``. Since
``not 0`` is True, a caller that explicitly passed ``db=0`` -- the
default Redis database index, and a perfectly valid value -- had the
argument silently replaced by the configured value.

This is inconsistent with the sibling helper ``_sconnect`` in the
same module, which already uses ``is None`` for the same purpose.

Use ``if db is None`` so that only the absent-argument case triggers
the fall-back. The other arguments (host, port, password) keep their
truthy-check semantics intentionally: empty string and 0 are not
legitimate values for a hostname or port, and "" is not a meaningful
password override.

Adds four regression tests in tests/pytests/unit/modules/test_redismod.py:
the headline ``test_connect_passes_db_zero_through`` fails before this
fix; the other three pin scope and default behaviour.

Refs: #69030
co-cy and others added 15 commits June 6, 2026 17:47
… AUTH

Two distinct bugs prevented the salt.engines.redis_sentinel engine from
being usable on any modern Salt installation:

1. ``start()`` raised ``AttributeError: 'dict_values' object has no
   attribute 'pop'`` on its very first call. ``dict.values()`` returns
   a Python 3 ``dict_values`` view that has no ``.pop()`` method, so
   ``local.cmd(...).values().pop()`` crashed before the Listener was
   ever constructed. The engine has therefore never actually run on
   Python 3. The fix wraps the result in ``list(...)`` so ``.pop()``
   is available.

2. ``Listener`` built the redis client without ever passing a password,
   so the engine could not authenticate against a Sentinel that
   required AUTH. The Sentinel ``psubscribe`` then failed with
   ``NOAUTH Authentication required`` and the engine silently did
   nothing. The fix adds an optional ``password`` parameter to
   ``Listener.__init__`` and ``start()``, defaulting to ``None`` for
   backwards compatibility, and forwards it to ``redis.StrictRedis``.
   Operators opt in by adding ``password: '...'`` under the engine's
   YAML config; Salt's engine loader forwards YAML keys to ``start()``
   as kwargs.

This commit covers the engine's pubsub-monitoring use case (subscribing
to the ``+switch-master`` / ``+odown`` / ``-odown`` admin channels of
a single Sentinel node). Because the engine listens to broadcast
channels, no ``service_name`` parameter is required at this level --
the master name is part of each event's payload. A proper Sentinel
client with multi-host discovery and ``service_name`` is a larger
piece of work scheduled for a follow-up commit in
salt/modules/redismod.py.

Adds five regression tests in tests/pytests/unit/engines/test_redis_sentinel.py
(new file). Four of them fail before this fix.

Refs: #69031
The salt.returners.redis_return returner reads host, port, db,
unix_socket_path, cluster_mode and the cluster startup-nodes from
config, but did not read or use redis.password -- even though the
docstring at the top of the module documents it as a configuration
option. Operators who set ``redis.password: 'secret'`` in master
config got ``NOAUTH Authentication required`` from Redis and the
returner silently lost every job return; jobs ran, returns were
produced, but the returner could not persist them, with no signal
in master logs that the documented option was ignored.

Three changes in salt/returners/redis_return.py, all in
``_get_options`` and ``_get_serv``:

- Add ``"password": "password"`` to the ``attrs`` mapping so the
  shared ``salt.returners.get_returner_options`` helper surfaces
  the configured value.
- Add ``"password": __opts__.get("redis.password", None)`` to the
  proxy-mode hand-built return dict.
- Pass ``password=_options.get("password")`` to both the
  single-server (``redis.StrictRedis``) and cluster
  (``StrictRedisCluster``) constructors.

Deployments without a password are unaffected: the resolved value
is ``None`` and the redis client constructors handle that as the
"no AUTH" path.

Adds seven regression tests in
tests/pytests/unit/returners/test_redis_return.py (new file). Five
of them fail before this fix; two pin the unauthenticated default
to guarantee backwards compatibility.

Refs: #69032
Three closely-related bugs in salt.cache.redis_cache together broke
hierarchical-bank semantics for any deployment that uses the redis
cache backend, most visibly causing salt-run manage.present and
salt-run manage.up to return empty:

1. _build_bank_hier never registered child bank names in the
   parent's index sets, only a "." placeholder. So $BANKEYS_minions
   stayed empty even when minions/<id>/data was stored correctly,
   and cache.list("minions") returned []. The fix registers each
   non-root segment in both the parent's $BANK_ set (consumed by
   _get_banks_to_remove during flush) and the parent's $BANKEYS_
   set (consumed by list_).

2. _get_banks_to_remove did not decode the bytes returned by
   smembers (the redis client is not configured with
   decode_responses=True). Today the bug is masked because the parent
   set always contained only b"."; once bug 1 is fixed and real
   child names appear, every recursive step would build a corrupted
   path like "minions/b'foo'" and silently fail to descend. The fix
   decodes child names and skips the "." placeholder explicitly.

3. flush(bank, key=None) of a sub-bank deleted the bank's own data
   and indices but never removed the bank's own reference from its
   parent's index sets. After bug 1 is fixed those parent sets
   contain real names, so cache.list(parent) would still report a
   freshly-flushed bank as present. The fix issues SREM against
   both parent index sets when the flushed bank has a parent.

Also removes the now-unused ``import itertools`` left over from the
old _build_bank_hier implementation.

Existing data stored before this commit will not magically appear in
the new parent indices; in normal operation the indices repopulate
within minutes (every minion auth, every mine.update, every
pillar.refresh triggers a fresh store("minions/<id>", "data", ...)).

Adds 13 regression tests in tests/pytests/unit/cache/test_redis_cache.py
(new file) covering each of the three fixes plus two end-to-end
store -> list -> flush(child) -> list integration scenarios. Nine of
the tests fail before this commit; the remaining four pin the
backwards-compatible sanity paths.

Refs: #69033
…en reads

The cluster client was created with ``decode_responses=True``, which
caused both ``redis_client.get()`` and ``redis_client.keys()`` to
return ``str`` instead of ``bytes``. Two callers in the same module
were then broken:

- ``get_token`` deserialises with ``salt.payload.loads(...)``, which
  calls ``msgpack.unpackb`` under the hood; ``msgpack`` requires
  ``bytes`` and raises ``TypeError`` for ``str``.
- ``list_tokens`` does ``[k.decode("utf8") for k in
  redis_client.keys()]``; ``str`` has no ``.decode``.

Both errors were swallowed by broad ``except Exception`` handlers
that returned ``{}`` / ``[]`` and merely logged a warning. From the
outside, eauth-via-cluster looked like it was rejecting every
authenticated session for unknown reasons; in fact the read path
inside Salt was returning empty results before the cluster was ever
consulted.

The fix removes ``decode_responses=True`` from ``_redis_client``.
Values now round-trip as bytes through msgpack as the rest of the
module already expected (``mk_token`` always wrote
``salt.payload.dumps(tdata)`` bytes; only the read path was broken).

Adds six regression tests in tests/pytests/unit/tokens/test_rediscluster.py
(new file). The headline test asserts ``decode_responses=True`` is no
longer passed to ``StrictRedisCluster``; the remaining tests pin the
post-fix read-path contract (get_token round-trip, list_tokens
returns ``str`` keys, mk_token + get_token round-trip).

Refs: #69035
…d_jobs

The salt.returners.redis_return returner used Redis's ``KEYS pattern``
in two hot paths:

- ``get_jids`` issued ``KEYS load:*`` on every ``salt-run jobs.list_jobs``
  and on every UI listing of jobs.
- ``clean_old_jobs`` issued ``KEYS ret:*`` and ``KEYS load:*`` from the
  master scheduler, every few minutes per ``keep_jobs_seconds``.

Per the Redis documentation, ``KEYS`` is intended only for "debugging
and special operations". It walks the entire keyspace synchronously
and blocks the server thread for the duration; on a master with
hundreds of thousands of jobs (each producing ``ret:<jid>``,
``load:<jid>`` and ``<minion>:<fun>`` keys), one ``KEYS`` call can
stall the Redis server -- and therefore every other client of that
Redis instance, not just Salt -- for seconds.

Replace every ``serv.keys(pattern)`` call with
``serv.scan_iter(match=pattern)``. ``SCAN`` walks the keyspace
incrementally in chunks and never blocks the server.

Notes:

- ``SCAN`` does not guarantee insertion order. The returner does not
  rely on order in either call site (``get_jids`` uses a dict keyed
  by jid; ``clean_old_jobs`` uses set membership), so this is a
  non-issue for Salt itself. Operators with custom scripts that read
  these keys directly may see them in a different order.
- ``SCAN`` may return the same key multiple times under heavy
  concurrent writes. Both call sites accumulate keys into either a
  ``set`` or a list passed to ``mget``/``delete``; duplicates are
  harmless (set membership ignores them; redundant deletes are
  no-ops).
- An empty-keyspace guard is added to ``get_jids`` to skip the
  ``mget`` call when no ``load:*`` keys exist; some Redis clients
  reject ``mget()`` with no arguments.

This is a ``.changed`` rather than ``.fixed`` because it alters the
observable load characteristics of the Redis server even though no
Salt-visible API changes.

Adds five regression tests in tests/pytests/unit/returners/test_redis_return.py
that all fail before this commit. The mocked ``serv.keys`` raises
``AssertionError`` so any future regression of the form "fall back
to KEYS" is caught at test time.

Refs: #69037
… TTL

The redis_return returner writes four kinds of keys for every job
return: ``ret:<jid>`` (hash), ``<minion>:<fun>`` (string),
``minions`` (set), and applies an explicit ``EXPIRE`` to
``ret:<jid>``. The ``<minion>:<fun>`` last-jid pointer was written
with ``pipeline.set(...)`` and no ``ex=`` argument, so it never
expired and was never explicitly deleted anywhere in the module.
Any (minion, fun) pair that stopped running stuck in Redis
indefinitely; for a deployment with O(1k) minions and O(50)
distinct functions in the job ladder, the master leaked O(50k)
string keys over its lifetime.

Pass ``ex=_get_ttl()`` to the ``pipeline.set(...)`` call so the
pointer expires on the same schedule (``keep_jobs_seconds``) as the
``ret:<jid>`` hash. ``redis-py`` accepts the ``ex=`` kwarg as part
of an atomic SET, so this adds no round-trips.

Operators with external scripts that read ``<minion>:<fun>`` keys
directly may observe them expiring; the documentation never
promised they would not.

Adds two regression tests in tests/pytests/unit/returners/test_redis_return.py.
The headline ``test_returner_sets_ttl_on_minion_fun_pointer`` fails
before this commit; the second sanity test pins that the four
canonical pipeline operations are still issued.

Refs: #69038
salt.returners.redis_return.get_fun(fun) is documented as returning
"a dict of the last function called for all minions". The
implementation read return data from a ``<minion>:<jid>`` key that
no other code in the module ever writes -- a leftover from an older
storage schema. The function therefore always returned ``{}``
regardless of how much data was actually in Redis.

The current ``returner`` writes return data as fields of a
``ret:<jid>`` hash:

    pipeline.hset(f"ret:{jid}", minion, salt.utils.json.dumps(ret))

and ``get_jid(jid)`` already reads from that hash via ``hgetall``.
``get_fun`` now reads the per-minion field via
``serv.hget(f"ret:{jid}", minion)``, matching the canonical storage
layout.

Adds four regression tests in tests/pytests/unit/returners/test_redis_return.py.
The headline ``test_get_fun_reads_data_from_ret_hash`` fails before
this commit; the missing-field edge case also fails before. Two
sanity tests pin behaviour for the empty-pointer and empty-minions
paths, which were already correct.

Refs: #69039
…match

Apply review feedback from bdrx312 on PR #69040: replace
`list(d.values()).pop()` with `next(reversed(d.values()))`. On a
`dict_values` view `reversed()` is O(1) and avoids the temporary
list allocation, while preserving the previous "last minion's first
IP" semantic.

While here, also handle the `local.cmd(...)` empty-result case. The
previous code raised `IndexError` from `.pop()` when the
`matching` target hit zero minions, giving the operator no hint
that the real problem was a misconfigured target pattern. The engine
now logs the matching pattern at ERROR and returns cleanly. The
upstream "no .pop on dict_values" comment block is no longer relevant
and is removed; the new code does not need the workaround.

Behavioural test added for the empty-match path:
`test_start_logs_and_returns_when_no_minions_match`. The four
existing tests still pass unchanged, including the original
`test_start_does_not_crash_on_dict_values_pop` regression guard.
When ``salt.utils.timed_subprocess.TimedProc`` raises ``OSError``
(typically ``ENOENT`` when the binary does not exist, but any
spawn-time errno triggers it) ``salt.modules.cmdmod._run`` builds a
``CommandExecutionError`` whose message is meant to help the operator
debug. Previously the entire ``new_kwargs`` dict was interpolated
into that message, leaking two routinely-secret-bearing fields:

* ``env`` -- the run environment, set by callers via
  ``cmd.run env={'DB_PASSWORD': '...'}`` or by states that pass
  credentials through env vars.
* ``stdin`` -- the bytes piped to the command, commonly used to
  feed a password to a CLI like ``mysql -p``.

Both leak channels matter: the resulting ``CommandExecutionError``
ends up in master/minion logs *and* in event-bus return data
visible to the API caller. ENOENT is not a rare condition; it
fires on any typo in a binary path. A typo should not exfiltrate
credentials.

Filter ``env`` and ``stdin`` out of the dict before formatting.
The remaining ``cwd``, ``shell``, ``executable``, ``timeout`` etc.
are still useful debugging context and do not contain user-supplied
secrets.

Two behavioural tests guard each leak channel; both fail on the
previous implementation.

Refs: #69075
All public functions in salt/modules/win_pkg.py ignored the minion's
configured saltenv, hardcoding "base" as the fallback instead of reading
from __opts__["saltenv"]. Affected functions: refresh_db, genrepo,
install, remove, list_pkgs, latest_version, upgrade_available,
list_upgrades, list_available, version, get_repo_data, get_package_info.

Fixes #38551
The state-queue refactor in commit a21bc25 (v3006.21) conflated
"the queued payload needs a guaranteed-unique JID" with "the queued
payload needs its own JID", and replaced the master's __pub_jid with
a freshly-minted one in both the on-disk payload and the queue
filename. A drain-side rewrite masked the symptom until commit
e002fbc (v3006.24) reverted the queue filename to also use the
new JID, removing the mask.

A minion must never change a job's JID once one has been assigned.
Job-tracking infrastructure (returners, the jobs runner, syndic
forwarding) keys on the JID the master published; if the minion
executes the queued run under a different JID, the master's view of
the run never resolves.

Only mint a new JID when none is provided -- that is the salt-call /
local case, where the minion is both publisher and executor and no
master-side tracking is involved. Filename uniqueness is already
provided by the microsecond-precision timestamp prefix, so we do not
need a fresh JID for that either.

Updates the existing _check_queue unit test that asserted the buggy
new-JID behavior and adds two focused regression tests covering both
the master path and the salt-call path.

Fixes #69386
The #69386 fix targets the state-queue write path (the only place that
actually mints a fresh JID for a master-published payload). The
job-queue write path in salt.minion.Minion._queue_job already dumps the
payload by reference without touching jid, so it is not affected -- but
nothing pins that invariant down.

Add a focused unit test that:

- patches salt.utils.jid.gen_jid to raise (so any future regression that
  starts minting a new JID here fails loudly), and
- asserts both the serialized payload's jid and the on-disk queue
  filename embed the original master JID.

Same regression class as #69386, same fix philosophy: the minion never
changes a job's JID once one has been assigned.
Per reviewer request: pair the unit-level _check_queue assertion with
an end-to-end integration test that exercises the real minion's
state-queue write path.

The test fires a slow state job (--async) to occupy the state slot,
then publishes a second state job with queue=True (also --async).
While the slow job is still running, it reads the file the minion
wrote into cachedir/queues/<master>/state_queue/ and asserts that
both the JID embedded in the filename and the jid field inside the
msgpack-serialized payload equal the master-published JID for the
second job. Under the pre-fix code path both would be a freshly-
minted JID; the assertion message points at #69386 to keep the
intent obvious in future failures.

Verified: passes on this branch; fails on origin/3006.x with the
expected assertion message ("queued payload jid ... does not match
the master-published jid ... -- minion re-stamped the JID").
@dwoz dwoz requested a review from a team as a code owner June 8, 2026 10:43
@dwoz dwoz added the test:full Run the full test suite label Jun 8, 2026
@dwoz dwoz changed the base branch from master to 3007.x June 8, 2026 10:44
@dwoz dwoz merged commit 9fb7574 into 3007.x Jun 8, 2026
1 check passed
@dwoz dwoz deleted the merge/3006.x/3007.x/25-06-08 branch June 8, 2026 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:full Run the full test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants