Add AnyFlow Any-Step Video Diffusion Pipelines (Bidirectional + FAR Causal) by Enderfga · Pull Request #13745 · huggingface/diffusers

Enderfga · 2026-05-14T03:39:15Z

What does this PR do?

This PR adds pipelines for AnyFlow (paper, project page, official code, model weights), an any-step video diffusion framework built on flow maps. A single distilled checkpoint can be evaluated at 1, 2, 4, 8, 16, 32 NFE without retraining, and quality scales monotonically with steps — unlike consistency-based distillation, which often degrades as NFE grows.

Two new pipelines are added, both on top of a new FlowMapEulerDiscreteScheduler and reusing WanLoraLoaderMixin:

AnyFlowPipeline → AnyFlowTransformer3DModel: bidirectional text-to-video built on the Wan2.1 backbone with an AnyFlowDualTimestepTextImageEmbedding conditioning on the source/target timestep pair (t, r).
AnyFlowFARPipeline → AnyFlowFARTransformer3DModel: frame-level autoregressive variant (block-sparse causal flex_attention + KV cache + compressed-frame patch embedding) jointly handling T2V / I2V / V2V through one context_sequence argument.

Four checkpoints are released under the nvidia/anyflow collection (Wan2.1-T2V-{1.3B,14B} bidi + FAR-Wan2.1-{1.3B,14B} causal). All four have been validated bit-exact against the official NVlabs/AnyFlow reference on H200: forward L2 = 0.00e+00 for scheduler / transformer / bidi pipeline / FAR pipeline; backward grad delta is 4.88e-04, attributable to bf16 kernel non-determinism only (PR-vs-PR = PR-vs-reference, ratio 1.000); inference latency matches the reference at ±0.0% on both pipelines.

T2V inference example:

import torch
from diffusers import AnyFlowPipeline
from diffusers.utils import export_to_video

pipe = AnyFlowPipeline.from_pretrained(
    "nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A red panda eating bamboo in a forest, cinematic lighting"
video = pipe(prompt, num_inference_steps=4, num_frames=33).frames[0]
export_to_video(video, "anyflow_t2v.mp4", fps=16)

I2V inference example with the FAR pipeline (single conditioning frame → autoregressive rollout):

import numpy as np
import torch
from diffusers import AnyFlowFARPipeline
from diffusers.utils import export_to_video, load_image

pipe = AnyFlowFARPipeline.from_pretrained(
    "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

first_frame = load_image("path/to/first_frame.png").resize((832, 480))
arr = np.asarray(first_frame).astype("float32") / 255.0
context = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).unsqueeze(2).to("cuda")

video = pipe(
    prompt="a cat walks across a sunlit lawn",
    context_sequence={"raw": context},
    num_inference_steps=4,
    num_frames=81,
).frames[0]
export_to_video(video, "anyflow_i2v.mp4", fps=16)

Documentation: EN tutorial at docs/source/en/using-diffusers/anyflow.md, ZH tutorial at docs/source/zh/using-diffusers/anyflow.md, and three API pages (pipelines + two transformer model pages). Tests: 22 fast tests (transformer + scheduler, CPU) plus four pipeline test files, with slow integration tests for the released checkpoints gated on RUN_SLOW=1 and @require_torch_accelerator.


Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@yiyixuxu @asomoza

…vel imports
This is the lazy-loader scaffolding only. Body files (pipeline_anyflow.py, pipeline_anyflow_causal.py, transformer_anyflow.py, scheduling_flow_map_euler_discrete.py) come in subsequent commits.

The flow-map scheduler advances samples from timestep t to caller-provided target r in a single Euler step, supporting any-step sampling on flow-map-distilled checkpoints. It is a general-purpose scheduler, not specific to the AnyFlow checkpoints.
Tests: 12 standalone tests covering instantiation, set_timesteps endpoints, shift identity/monotonicity, step shape preservation, zero-interval identity, one-shot sampling, train weight schemes, scale_noise endpoints.
Docs: api/schedulers/flow_map_euler_discrete.md
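As a rough illustration of the "t to r in a single Euler step" idea in this commit message, here is a minimal sketch in plain Python. It assumes the convention that t = 1 is pure noise and t = 0 is data, and that the model predicts the mean velocity over the interval; the function name `flow_map_euler_step` and the toy data are hypothetical and are not the scheduler's actual API.

```python
from typing import List


def flow_map_euler_step(sample: List[float], mean_velocity: List[float], t: float, r: float) -> List[float]:
    """Jump from timestep t to target timestep r in one Euler step.

    Assumption: t=1 is noise, t=0 is data, and `mean_velocity` is the average
    velocity over [r, t]. The real scheduler may use a different convention.
    """
    if not 0.0 <= r <= t <= 1.0:
        raise ValueError(f"expected 0 <= r <= t <= 1, got t={t}, r={r}")
    return [x + (r - t) * v for x, v in zip(sample, mean_velocity)]


# Toy linear path x_t = (1 - t) * data + t * noise, whose velocity is noise - data.
data, noise = [1.0, -2.0], [0.5, 0.25]
velocity = [n - d for n, d in zip(noise, data)]

# One-shot sampling (1 NFE): jump straight from noise to data.
one_shot = flow_map_euler_step(noise, velocity, t=1.0, r=0.0)

# Two-step sampling (2 NFE): the same endpoints reached through an intermediate r.
halfway = flow_map_euler_step(noise, velocity, t=1.0, r=0.5)
two_step = flow_map_euler_step(halfway, velocity, t=0.5, r=0.0)

# Zero-interval identity: t == r leaves the sample unchanged.
unchanged = flow_map_euler_step(noise, velocity, t=0.7, r=0.7)
```

On this toy path the one-step and two-step results coincide because the velocity is constant; with a learned model the step count trades compute for quality.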

A 3D DiT extending the v0.35.1 Wan2.1 backbone with two config-toggled modules:
* FAR causal blocks (init_far_model=True): block-sparse causal attention via flex_attention + compressed-frame patch embedding for frame-level autoregressive generation (Gu et al., 2025, arXiv:2503.19325).
* Dual-timestep flow-map embedding (init_flowmap_model=True): adds a delta timestep embedder enabling flow-map sampling z_t -> z_r over arbitrary intervals (AnyFlow).
With both flags off, the model reduces to stock Wan2.1.
The class is intentionally self-contained rather than annotated with '# Copied from diffusers.models.transformers.transformer_wan' because upstream Wan has been refactored extensively since v0.35.1 (new WanAttention class, different processor architecture).
Tests: 9 unit tests covering construction in 3 modes, bidi forward shape and determinism, return_dict variants, save/load round-trip with and without init_far_model, gradient checkpointing toggle.
Docs: api/models/anyflow_transformer3d.md

* AnyFlowPipeline (pipeline_anyflow.py, ~590 LOC): bidirectional T2V using flow-map sampling. Loads checkpoints from nvidia/AnyFlow-Wan2.1-T2V-{1.3B,14B}.
* AnyFlowCausalPipeline (pipeline_anyflow_causal.py, ~700 LOC): FAR-based causal pipeline supporting T2V/I2V/TV2V via task_type kwarg. Loads checkpoints from nvidia/AnyFlow-FAR-Wan2.1-{1.3B,14B}-Diffusers.
Both pipelines reuse stock WanLoraLoaderMixin, AutoencoderKLWan, UMT5EncoderModel, and AutoTokenizer from upstream. The transformer is the AnyFlowTransformer3DModel introduced in the previous commit. The scheduler is FlowMapEulerDiscreteScheduler.
Tests:
* tests/pipelines/anyflow/test_anyflow.py: PipelineTesterMixin fast tests + slow integration test against nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers.
* tests/pipelines/anyflow/test_anyflow_causal.py: same structure for FAR variant.
Reference slices for slow integration tests are deferred to Phase 7 (Final quality pass) where the user runs them on a real GPU.

Modeled on the Helios pipeline doc (PR huggingface#13208). Sections: paper link + abstract, supported checkpoints table, memory/speed optimization tabs, T2V/I2V/TV2V examples for both bidirectional and causal variants, autodoc trailers.

…ersion script
* Register AnyFlowPipeline in AUTO_TEXT2VIDEO_PIPELINES_MAPPING.
* AnyFlowCausalPipeline is intentionally NOT registered for AutoPipeline because its task switch (t2v / i2v / tv2v) is too rich for a single auto-resolve key.
* scripts/convert_anyflow_to_diffusers.py: convert .pt training checkpoints (with 'ema' state dict) into a diffusers save_pretrained layout. Supports all 4 released NVIDIA AnyFlow variants. Replaces the omegaconf-based config in the upstream repo with argparse to match other diffusers conversion scripts.

* ruff format pass on all 5 source files (long lines + trailing comma fixes)
* check_dummies.py --fix_and_overwrite regenerated:
  - dummy_pt_objects.py: AnyFlowTransformer3DModel + FlowMapEulerDiscreteScheduler
  - dummy_torch_and_transformers_objects.py: AnyFlowPipeline + AnyFlowCausalPipeline
Local fast tests: 21/21 passed
  - 12 scheduler tests (FlowMapEulerDiscreteScheduler)
  - 9 transformer tests (AnyFlowTransformer3DModel construction + bidi forward + save/load)
The pipeline fast tests in tests/pipelines/anyflow/ require a local dev install that matches the diffusers main branch's transformers >= compatibility floor. The reference slices for slow integration tests (real GPU + 1.3B/14B checkpoints) are intentionally left as TODO stubs to be captured by the user on a real GPU machine before opening the PR.

…torials
Critical bug fixes (verified against precision-validation review):
* pipeline_anyflow.py / pipeline_anyflow_causal.py: replace hardcoded transformer_dtype = torch.bfloat16 with self.transformer.dtype, so pipe.to("cpu") and PipelineTesterMixin save/load tests do not crash on a dtype mismatch in the patch_embedding conv3d.
* transformer_anyflow.py: drop the duplicate `base = base = ...` assignment in _build_causal_mask (was a copy-paste typo carried over from FAR-Dev).
* transformer_anyflow.py: drop unused `q_is_context` / `k_is_context` locals and the `# noqa: F841` markers that were silencing the dead-store warning.
* transformer_anyflow.py: remove `CacheMixin` from the inheritance list; the pipeline manages KV cache directly, so the mixin's interface is unused.
* transformer_anyflow.py: guard the module-level `torch.compile(flex_attention)` with try/except so the file imports cleanly on CPU CI / no-Triton machines.
* convert_anyflow_to_diffusers.py: replace ad-hoc print warnings with the stdlib logger (warning_once-style) and a module-level basicConfig.
Documentation accuracy:
* AnyFlowCausalPipeline class docstring + main pipeline doc + EN/ZH tutorial: drop the fictitious `task_type` / `image` / `video` arguments and document the real API: pass `context_sequence={"raw": tensor}` (or `{"latent": ...}`) to switch between T2V (None) / I2V (1-frame) / TV2V (4n+1-frame) modes.
* Pipeline class docstrings + main doc: explicitly describe AnyFlow's two-stage LoRA distillation including DMD reverse-divergence supervision with Flow-Map backward simulation in stage 2 (was previously implicit).
* training_rollout: add detailed docstring explaining its role as the 3-segment Flow-Map backward simulation entry point used during DMD training.
* Long-form tutorial doc `using-diffusers/anyflow.md` (EN, 239 LOC) and Chinese mirror `docs/source/zh/using-diffusers/anyflow.md` (224 LOC) added and registered in both `_toctree.yml` files.
Tests:
* Skip `test_attention_slicing_forward_pass` in both pipeline test classes with a clear rationale (custom attention processor does not support slicing).
* All 21 standalone tests still pass (12 scheduler + 9 transformer).
Quality gates:
* `ruff check` clean across all AnyFlow files.
* `ruff format --check` reports 6 files already formatted.
* `python utils/check_copies.py` reports no diff.
Out of scope for this commit (deferred until reviewer feedback):
* Splitting AnyFlowTransformer3DModel into bidi + causal subclasses
* Unifying _forward_inference / _forward_cache return types
* Migrating model tests from plain unittest to BaseModelTesterConfig + mixins
* HF model card / config.json metadata updates on the nvidia/* repos (push to Hub manually before opening the PR)

… output
Round 2 of review feedback. Three groups of changes; transformer state-dict keys, module hierarchy, and tensor flow are unchanged so the H200 bit-exact validation remains valid.
A. Pipeline rename (mechanical, no behavior change):
* Class: AnyFlowCausalPipeline -> AnyFlowFARPipeline (Causal in diffusers usually means an attention mask; AnyFlow's variant is FAR autoregressive, so the FAR name is more specific and matches the paper).
* File: pipeline_anyflow_causal.py -> pipeline_anyflow_far.py (git mv).
* Test file: test_anyflow_causal.py -> test_anyflow_far.py (git mv).
* All references updated in src/, tests/, docs/, scripts/, plus stale anyflowcausalpipeline anchor links in tutorial markdown.
B. Pipeline test bug fixes (closes 19 fast-test failures reported by precision-validation reviewer):
* pipeline_anyflow.py / pipeline_anyflow_far.py: __call__ now sets self._num_timesteps = num_inference_steps before the rollout, so the PipelineTesterMixin callback tests can read pipe.num_timesteps.
* tests/pipelines/anyflow/test_anyflow_far.py: drop the fictitious task_type="t2v" kwarg that crashed every causal fast test (the FAR pipeline selects mode via context_sequence, not a task_type arg).
C. Transformer architecture cleanups (review-driven, no tensor changes):
* Replace forward(*args, **kwargs) dispatcher with an explicit signature listing every supported kwarg (hidden_states, timestep, r_timestep, encoder_hidden_states, encoder_hidden_states_image, chunk_partition, clean_hidden_states, clean_timestep, kv_cache, kv_cache_flag, is_causal, attention_kwargs, return_dict). Helps IDE / type-checker / torch.compile tracing.
* Drop SimpleNamespace returns. Add AnyFlowFARTransformerOutput (BaseOutput dataclass with sample + kv_cache fields) for the two causal paths that need to also propagate kv_cache (_forward_inference and the newly return_dict-aware _forward_cache). _forward_train and _forward_bidirection now consistently return Transformer2DModelOutput. Pipeline call sites already use return_dict=False with positional unpacking, so the fix is transparent there.
Out of scope (deferred until canonical-org HF metadata sync):
* Splitting AnyFlowTransformer3DModel into a bidi class plus an AnyFlowFARTransformer3DModel subclass: touches register_to_config keys and would require updating model_index.json on every released checkpoint.
* Promoting chunk_partition from register_to_config to a forward-time argument (same reason).
* Renaming training_rollout to _denoise: would break callers in the FAR-Dev on-policy trainer that produced the released checkpoints.
Local fast tests: 21/21 still pass (12 scheduler + 9 transformer). ruff check, ruff format, and check_copies.py are all clean.

…nk_partition to FAR fast-test fixture
Two root causes for the 19 remaining PipelineTesterMixin failures, identified by the H200 reviewer:
1. callback_on_step_end was accepted by __call__ but never invoked. Both pipelines pass it through to training_rollout (and FAR additionally through inference()), and inference_range now fires it after scheduler.step in the standard inference branch:
       if callback_on_step_end is not None:
           callback_kwargs = {k: locals()[k] for k in callback_on_step_end_tensor_inputs}
           callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
           latents = callback_outputs.pop("latents", latents)
           prompt_embeds = ...
           negative_prompt_embeds = ...
   `nonlocal prompt_embeds, negative_prompt_embeds` lets the callback rewrite the closure-captured embeddings, matching upstream WanPipeline semantics. The 3-segment grad_timestep training rollout does not invoke the callback; it is intentionally training-only.
2. tests/pipelines/anyflow/test_anyflow_far.py::get_dummy_components built the dummy transformer without a `chunk_partition`, leaving it None on the model config and crashing the pipeline at `sum(self.transformer.config.chunk_partition)`. Set `chunk_partition=[1, 1, 1]` in the fixture (3 chunks of 1 latent frame each, matching the test's num_frames=9 -> 3 latent frames).
Local fast tests: 21/21 still pass. ruff check, ruff format, and check_copies.py are all clean.

…ig + rename helpers
Major architectural refactor that aligns the integration with diffusers conventions ahead of the canonical-org Hub upload. State-dict keys, module hierarchy, and tensor flow are unchanged so the H200 bit-exact validation remains valid; only the on-disk transformer/config.json fields move.
Changes:
1. **Sibling transformer classes** replace the flag-driven single class:
* AnyFlowTransformer3DModel: bidirectional only. Drops compressed_patch_size / full_chunk_limit / init_far_model / init_flowmap_model / chunk_partition kwargs (always-on for AnyFlow distilled checkpoints).
* AnyFlowFARTransformer3DModel: adds far_patch_embedding + the 3 FAR forward paths (train / cache-prefill / autoregressive inference).
* AnyFlowTimeTextImageEmbedding (the legacy single-time embedder used only by the old setup_flowmap_model bootstrap) is removed; both classes now build AnyFlowDualTimestepTextImageEmbedding directly in __init__.
* setup_flowmap_model / setup_far_model methods are removed; weight warm-start for far_patch_embedding (trilinear interpolation from patch_embedding) moves into AnyFlowFARTransformer3DModel.__init__.
2. **chunk_partition** is no longer a model config field. The FAR pipeline owns the schedule:
* AnyFlowFARPipeline.default_chunk_partition = [1, 3, 3, 3, 3, 3, 3, 2] matches the released 81-frame NVIDIA checkpoints.
* AnyFlowFARPipeline.__call__ / _denoise_rollout accept a chunk_partition argument that overrides the default for non-default num_frames.
3. **training_rollout -> _denoise_rollout** rename across both pipelines and all English / Chinese docs that referenced it. Signals the method is internal to the pipeline driver, not a public training API.
4. **Conversion script + tests + docs + registries**:
* scripts/convert_anyflow_to_diffusers.py: VARIANTS dict picks the right transformer class per variant; init_far_model / init_flowmap_model / chunk_partition kwargs are removed from the from_pretrained call.
* Transformer test file split into AnyFlowTransformer3DModelTest and AnyFlowFARTransformer3DModelTest classes.
* Pipeline test fixtures use the right class and pass chunk_partition via get_dummy_inputs (3-frame schedule [1, 1, 1] for the 9-frame test).
* New docs page docs/source/en/api/models/anyflow_far_transformer3d.md; anyflow_transformer3d.md rewritten for the bidi-only class.
* AnyFlowFARTransformer3DModel registered in src/diffusers/__init__.py, src/diffusers/models/__init__.py, models/transformers/__init__.py and the dummy_pt_objects.py stubs.
* docs/source/en/_toctree.yml: new entry for the FAR transformer page.
5. **Cleanups**:
* Pipeline __call__ no longer passes is_causal=False to the bidi forward (the bidi class doesn't accept it).
* Pipeline class docstrings drop stale references to init_*_model flags.
Local tests: 22/22 pass (12 scheduler + 10 transformer covering both classes). ruff check / format / check_copies clean.
Hub artifacts (model_index.json, transformer/config.json, scheduler config) need to be regenerated for the released checkpoints; the HF update guide will be delivered separately.
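The relationship between num_frames and chunk_partition described in this commit can be sketched as follows. This is a sketch under stated assumptions, not the pipeline's code: it assumes a 4x temporal VAE compression with the first frame kept separate (consistent with the "num_frames=9 -> 3 latent frames" note in an earlier commit), and the helper names `num_latent_frames` and `resolve_chunk_partition` are hypothetical.

```python
from typing import List, Optional

# Assumption: the VAE compresses time by 4x and keeps the first frame on its own.
TEMPORAL_COMPRESSION = 4
# Default schedule quoted in the commit message for the 81-frame checkpoints.
DEFAULT_CHUNK_PARTITION = [1, 3, 3, 3, 3, 3, 3, 2]


def num_latent_frames(num_frames: int) -> int:
    """Map a pixel-space frame count of the form 4n+1 to its latent frame count."""
    if num_frames < 1 or (num_frames - 1) % TEMPORAL_COMPRESSION != 0:
        raise ValueError(f"num_frames must be of the form 4n+1, got {num_frames}")
    return (num_frames - 1) // TEMPORAL_COMPRESSION + 1


def resolve_chunk_partition(num_frames: int, chunk_partition: Optional[List[int]] = None) -> List[int]:
    """Pick the default or caller-supplied schedule and check it covers every latent frame."""
    partition = list(DEFAULT_CHUNK_PARTITION if chunk_partition is None else chunk_partition)
    expected = num_latent_frames(num_frames)
    if sum(partition) != expected:
        raise ValueError(
            f"chunk_partition {partition} covers {sum(partition)} latent frames, "
            f"but num_frames={num_frames} needs {expected}"
        )
    return partition


default_schedule = resolve_chunk_partition(81)           # default covers 21 latent frames
test_schedule = resolve_chunk_partition(9, [1, 1, 1])    # fixture schedule for the 9-frame test
```

A check like this is why a non-default num_frames needs an explicit chunk_partition: the default only sums to the latent length of an 81-frame video.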

…models.md
Hard violations (per official diffusers guidelines):
* drop einops dependency: replace 25+ rearrange() calls with native permute/reshape/unflatten in transformer + both pipelines
* device-gate torch.float64: apply_rotary_emb and AnyFlowRotaryPosEmbed now fall back to float32 / complex64 on MPS / NPU; freqs are lazily rebuilt per-device via _build_freqs (matches transformer_wan / transformer_flux pattern)
* migrate attention to dispatch_attention_fn: replace direct F.scaled_dot_product_attention calls with dispatch_attention_fn (works with sage / flash / native backends); introduce AnyFlowAttention(AttentionModuleMixin) with _default_processor_cls / _available_processors; rename processors to AnyFlowAttnProcessor / AnyFlowCrossAttnProcessor and declare _attention_backend / _parallel_config class attrs
* drop dead config fields: qk_norm and added_kv_proj_dim are pruned from both transformer __init__ signatures and AnyFlowTransformerBlock; AnyFlowAttention is hardcoded to rms-norm-across-heads (the only scheme the released checkpoints use) and has no add_k_proj path (T2V only)
* add _repeated_blocks = ["AnyFlowTransformerBlock"] to both transformer classes for compile_repeated_blocks() support (matches Wan)
* annotate prepare_latents with `# Copied from diffusers.pipelines.wan.pipeline_wan.WanPipeline.prepare_latents`; the pipeline-side rearrange to (B, T, C, H, W) layout is moved to the call site
State-dict keys are preserved (legacy Attention had identical to_q / to_k / to_v / to_out / norm_q / norm_k naming), so existing AnyFlow checkpoints load bit-exactly into the new AnyFlowAttention class.
The HF Hub config-update guide is updated correspondingly: transformer/config.json now drops qk_norm and added_kv_proj_dim alongside the previous init_far_model / init_flowmap_model / chunk_partition removals.
22 fast CPU tests still pass; ruff format / ruff check / check_copies all clean.

…/head-dim fallbacks + KV-cache dtype + num_timesteps
Phase 3 migrated bidi + cross-attention to dispatch_attention_fn but the FAR causal path still calls flex_attention directly, which has hard requirements (CPU compile, head_dim >= 16) that fail on PipelineTesterMixin's tiny dummy components. Real ckpts (head_dim=128, CUDA) never hit these branches; bit-exact numerical equivalence with FAR-Dev preserved on all 4 released ckpts (forward 0.00e+00, backward kernel-nondet only, ratio 1.000).
Code fixes:
1. AnyFlowRotaryPosEmbed._forward_compressed_frame / _forward_full_frame now short-circuit to an empty tensor when num_frames / height / width is 0. PipelineTesterMixin's dummy VAE has scale_factor_spatial=8, so a 16x16 raw spatial input becomes a 2x2 latent which then floors to 0 against compressed_patch_size=(1, 4, 4); the original `freqs[:0].view(0, k, 1, -1)` reshape was ambiguous in that regime.
2. flex_attention dispatch: split the module-load `torch.compile(flex_attention, dynamic=True)` into `_flex_attention_eager` (always available) plus `_flex_attention_compiled`, with a tiny wrapper that picks compiled for CUDA tensors and eager for CPU. Avoids torch._inductor C++ codegen failures that broke fast tests after `pipe.to("cpu")`. CUDA performance unchanged (L10 benchmark: 0.0% delta on bidi 1.3B fwd, 0.0% delta on FAR causal 1.3B fwd).
3. AnyFlowAttnProcessor (FAR causal branch): when head_dim < 16 (flex_attention's hard minimum) zero-pad q/k/v's last dim to 16 and pass `scale=1/sqrt(original_head_dim)` to flex_attention. Padded value rows contribute 0, so trimming the output back is mathematically equivalent. Released ckpts use head_dim=128 so the branch is never taken in production.
4. pipeline_anyflow_far.encode_kv_cache: replace the hardcoded `latents.to(torch.bfloat16)` with `self.transformer.dtype`. The hardcoded bf16 crashed conv3d on dummy fp32 components ("Input type (BFloat16) and bias type (float) should be the same"); real bf16 ckpts are unaffected.
5. pipeline_anyflow_far._denoise_rollout sets `self._num_timesteps = (len(chunk_partition) - num_context_chunks) * num_inference_steps` before the chunk loop, so PipelineTesterMixin.test_callback_cfg's `pipe.num_timesteps`-based assertion matches the actual number of callback fires (chunks * NFE) instead of the previous hardcoded num_inference_steps.
Tests:
* test_callback_inputs cannot pass without changing FAR's chunk-wise output semantics: it zeroes latents on the final step and asserts the *entire* output buffer is zero, but only the active chunk's slice is overwritten in a chunk-wise rollout. Marked `@unittest.skip` with a detailed rationale; callback functionality itself is still covered by test_callback_cfg.
* Full pytest run on tests/pipelines/anyflow/ + tests/models/transformers/test_models_transformer_anyflow.py + tests/schedulers/test_scheduler_flow_map_euler_discrete.py: 81 passed, 0 failed, 11 skipped.
Quality gates:
* `ruff check` and `ruff format --check` clean across all AnyFlow files.
* `python utils/check_copies.py` clean.
* `python utils/check_dummies.py` clean.
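The head_dim padding argument in fix 3 can be checked with a small pure-Python sketch. It uses a plain single-head softmax attention as a stand-in for flex_attention (an assumption made for illustration; no mask is applied), and the helper names are hypothetical. Zero-padding q and k adds only zero terms to each dot product, and zero-padded value columns produce zero output columns that are trimmed away, provided the scale is computed from the original head_dim.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention(q: Matrix, k: Matrix, v: Matrix, scale: float) -> Matrix:
    """Single-head attention: softmax(q @ k^T * scale) @ v, written with lists."""
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) / total for j in range(len(v[0]))])
    return out


def pad_last_dim(x: Matrix, width: int) -> Matrix:
    """Zero-pad every row up to `width` entries."""
    return [row + [0.0] * (width - len(row)) for row in x]


def padded_attention(q: Matrix, k: Matrix, v: Matrix, min_head_dim: int = 16) -> Matrix:
    """Pad q/k/v to min_head_dim, attend, then trim back to the original head_dim."""
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)  # scale uses the ORIGINAL head_dim, not the padded one
    q_p, k_p, v_p = (pad_last_dim(x, min_head_dim) for x in (q, k, v))
    return [row[:head_dim] for row in attention(q_p, k_p, v_p, scale)]


# Tiny example with head_dim=4, two queries and three keys/values.
q = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5]]
k = [[0.5, -0.5, 0.25, 0.0], [0.0, 1.0, 0.0, 1.0], [0.3, 0.3, 0.3, 0.3]]
v = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 2.0], [2.0, 2.0, 2.0, 2.0]]

reference = attention(q, k, v, scale=1.0 / math.sqrt(4))
padded = padded_attention(q, k, v)
```

If the scale were instead derived from the padded width (1/sqrt(16)), the softmax temperature would change and the two results would no longer agree.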

User-facing alignment with the official HF Hub model card and the day-of-announcement materials at https://huggingface.co/collections/nvidia/anyflow.
* Fill in the arXiv identifier 2605.13724 (5 paper links + 2 BibTeX entries).
* Rename TV2V → V2V across docs + pipeline_anyflow{,_far}.py so the diffusers copy uses the same Video-to-Video terminology as the official model card.
* Add the [nvidia/anyflow](https://huggingface.co/collections/nvidia/anyflow) HF collection link to the three tutorial intros.
* Drop the temporary "guyuchao/* staging" tip from the EN tutorial / API page / ZH tutorial, since the nvidia/AnyFlow-*-Diffusers repos are now live.
* Wire up NVlabs/AnyFlow (training code) and nvlabs.github.io/AnyFlow (project page) in place of the prior <github-org> / <project-page-url> placeholders.
* Cite the authors (Yuchao Gu, Guian Fang et al.) and NUS ShowLab × NVIDIA affiliation in the main tutorial, API pipeline page, and both transformer model pages; BibTeX uses the standard `and others` to elide the full list until the next pass.
Working tree, CI gates, and tests after the change:
ruff format --check ✓
ruff check ✓
python utils/check_copies.py ✓
python utils/check_dummies.py ✓
pytest tests/models + tests/schedulers (22 fast) ✓
No production code logic changes; only docstring wording inside pipeline files (TV2V → V2V).

Replace the placeholder ``@article{gu2026anyflow, author = {Gu, Yuchao and Fang, Guian and others}, ...}`` block in both the English and Chinese tutorials with the canonical ``@misc{gu2026anyflowanystepvideodiffusion, ...}`` form from arxiv.org/abs/2605.13724, which lists all seven authors: Yuchao Gu, Guian Fang, Yuxin Jiang, Weijia Mao, Song Han, Han Cai, Mike Zheng Shou. Docs-only.

Scheduler
- FlowMapEulerDiscreteScheduler.step now returns a FlowMapEulerDiscreteSchedulerOutput dataclass (or tuple with return_dict=False) and uses the conventional positional order (model_output, timestep, sample, r_timestep).
- Drop training-only helpers: adaptive_weighting, set_train_weight, get_train_weight, linear_timesteps_weights, and the weight_type config field.
- Add scale_model_input no-op for API parity; raise ValueError on missing r_timestep.
Transformer
- Remove gate_track debug write inside AnyFlowDualTimestepTextImageEmbedding.forward_timestep.
- Compile flex_attention lazily on first CUDA call instead of at import time.
- Replace assert with ValueError in build_block_mask.
- Resolve <arxiv-id> placeholders to 2605.13724.
Pipelines (AnyFlowPipeline + AnyFlowFARPipeline)
- Add EXAMPLE_DOC_STRING + @replace_example_docstring and full __call__ docstrings covering every argument.
- Move use_mean_velocity from __init__ to __call__ so save/load round-trips.
- Drop _denoise_rollout's grad_timestep branch (DMD on-policy training rollout), the inner inference_range closure, and the redundant negative-prompt concat.
- Replace asserts with ValueError; wire show_progress to tqdm; rename inference -> _inference; remove dead current_timestep property.
- Update scheduler.step call sites to the new signature.
- Trim class docstrings to inference-only language.
Pipeline output
- Add Apache 2.0 license header; switch to relative import.
Auto pipeline / conversion script
- Register AnyFlowFARPipeline in AUTO_IMAGE2VIDEO_PIPELINES_MAPPING and AUTO_VIDEO2VIDEO_PIPELINES_MAPPING.
- Document the weights_only=False requirement in the conversion script.
Tests
- Scheduler tests use the new step signature and verify the Output dataclass contract.
- Drop the four obsolete training-weight tests; drop weight_type kwarg from pipeline test fixtures; remove internal milestone names from TODO comments.
Docs
- Resolve <arxiv-id> in the scheduler docs page.
- Trim DMD / on-policy distillation language in EN/ZH tutorials and the pipelines page; the paper abstract quote is preserved verbatim.

dg845 · 2026-05-19T06:53:37Z

+    def __call__(
+        self,
+        prompt: Union[str, List[str]] = None,
+        context_sequence: Optional[torch.Tensor] = None,


Rather than having a context_sequence argument, I think we should use a more standard image argument (if only I2V is supported) or video argument (if both I2V and V2V are supported). See for example WanImageToVideoPipeline:

diffusers/src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Line 511 in 907c0c2

image: PipelineImageInput,

If we want to support VAE latents as well, we can add an additional image_latents or video_latents argument.

Renamed context_sequence → video (pixel-space, [0,1], (B, C, T, H, W)) + optional video_latents (pre-encoded, in the model layout). Went with video rather than image because the bidi pipeline accepts arbitrary-length conditioning prefixes — both I2V (single-frame) and V2V (multi-frame) work — so naming it image would mislead V2V users. Mutually-exclusive validation raises ValueError if both are passed; the example docstring is updated.
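A minimal sketch of the mutually exclusive check described in this reply, written without torch so it runs anywhere. The function name `check_conditioning_inputs` and its return values are hypothetical; only the rule itself (passing both `video` and `video_latents` raises ValueError) comes from the reply.

```python
from typing import Any, Optional


def check_conditioning_inputs(video: Optional[Any] = None, video_latents: Optional[Any] = None) -> str:
    """Return which conditioning source is active: 'none', 'video' or 'video_latents'."""
    if video is not None and video_latents is not None:
        raise ValueError("Pass either `video` or `video_latents`, not both.")
    if video is not None:
        return "video"          # pixel-space frames that still need VAE encoding
    if video_latents is not None:
        return "video_latents"  # already encoded, used as-is
    return "none"               # no conditioning: plain text-to-video


# Placeholder objects stand in for tensors in this sketch.
mode_t2v = check_conditioning_inputs()
mode_pixels = check_conditioning_inputs(video=object())
mode_latents = check_conditioning_inputs(video_latents=object())
```

Raising early in input validation keeps the failure close to the call site instead of surfacing later as a shape or dtype error inside the VAE.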

dg845 · 2026-05-19T06:56:03Z

+        return latents
+
+    @torch.no_grad()
+    def encode_latents(self, videos, sample=True):


Analogous comment to #13745 (comment); I think it would be better to have one method that combines both vae_encode and encode_latents.

Same change as the bidi pipeline — single encode_video method, normalize via self.video_processor.

dg845 · 2026-05-19T07:02:26Z

+
+        return latents
+
+    def _denoise_rollout(


Similar to #13745 (comment), I think it would fit the diffusers code style better to inline both _denoise_rollout and _inference into __call__ as a nested loop. Existing autoregressive pipelines like WanAnimatePipeline and LLaDA2Pipeline do this; for example, here is what WanAnimatePipeline does:

diffusers/src/diffusers/pipelines/wan/pipeline_wan_animate.py

Line 1035 in 907c0c2

for _ in range(num_segments):

Inlined _denoise_rollout and _inference into AnyFlowFARPipeline.__call__ as a nested loop (outer over chunks, inner over denoising steps), mirroring WanAnimatePipeline.__call__:1035. The one helper I kept private is encode_kv_cache: it's a single transformer call run with a different kv_cache_flag mode (cache-write) — inlining it would interleave two distinct forward semantics in the loop body and lose readability. Happy to inline it too if you'd rather see one fat __call__.
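The nested-loop shape described in this reply (outer over chunks, inner over denoising steps) can be sketched as below. Only the loop skeleton is shown; the transformer call, scheduler step, and KV-cache write are left as comments, and the function name `rollout_schedule` and its callback signature are hypothetical.

```python
from typing import Callable, List, Optional


def rollout_schedule(
    chunk_partition: List[int],
    num_inference_steps: int,
    num_context_chunks: int = 0,
    on_step_end: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Walk the chunk/step grid and return how many denoising steps ran."""
    steps_run = 0
    # Outer loop: chunks that still need to be generated (context chunks are skipped).
    for chunk_idx in range(num_context_chunks, len(chunk_partition)):
        # Inner loop: denoising steps for the current chunk.
        for step_idx in range(num_inference_steps):
            # A real pipeline would run the transformer and scheduler.step here.
            steps_run += 1
            if on_step_end is not None:
                on_step_end(chunk_idx, step_idx)
        # A real FAR pipeline would write the finished chunk into the KV cache here.
    return steps_run


fired = []
total_t2v = rollout_schedule([1, 3, 3, 3, 3, 3, 3, 2], num_inference_steps=4)
total_i2v = rollout_schedule(
    [1, 3, 3, 3, 3, 3, 3, 2],
    num_inference_steps=4,
    num_context_chunks=1,
    on_step_end=lambda chunk, step: fired.append((chunk, step)),
)
```

The step count equals (len(chunk_partition) - num_context_chunks) * num_inference_steps, which is the value an earlier commit assigns to `_num_timesteps` so that callback-count assertions line up.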

dg845 · 2026-05-19T07:05:57Z

+    def __call__(
+        self,
+        prompt: Union[str, List[str]] = None,
+        context_sequence: Optional[Dict[str, torch.Tensor]] = None,


Similar to #13745 (comment), I think we should use a video argument here (since both I2V and V2V are supported) rather than a context_sequence dict argument. See for example WanVideoToVideoPipeline:

diffusers/src/diffusers/pipelines/wan/pipeline_wan_video2video.py

Line 483 in 907c0c2

video: list[Image.Image] = None,

If we want to support VAE latents, we can add a video_latents argument.

Same as the bidi pipeline — replaced the context_sequence dict ({"raw"/"latent"} keys) with two kwargs: video (pre-VAE, (B, C, T, H, W) in [0, 1]) and video_latents (pre-encoded). The dict was redundant with the kwarg name. Mutually-exclusive validation as above.

dg845 · 2026-05-19T07:11:41Z

+        device: Union[str, torch.device] = None,
+    ) -> None:
+        """Build the inference timestep schedule on ``device`` and store it on ``self.timesteps``."""
+        timesteps = torch.linspace(1.0, 0.0, num_inference_steps + 1, dtype=torch.float64, device=device)


I think timesteps here should have exactly num_inference_steps steps rather than num_inference_steps + 1 steps so that its behavior is more in line with other schedulers like FlowMatchEulerDiscreteScheduler.

For example, we could have a final_timestep attribute which defaults to 0.0, or we could use a sigmas array under the hood which has num_inference_steps + 1 elements like FlowMatchEulerDiscreteScheduler:

diffusers/src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Line 380 in 907c0c2

sigmas = torch.cat([sigmas, torch.zeros(1, device=sigmas.device)])

Done — set_timesteps(N) now produces N timesteps backed by an internal sigmas[N+1] linspace, matching FlowMatchEulerDiscreteScheduler.set_timesteps. The final sigma (== 0) is the implicit r-endpoint of the last step; pipeline rollouts iterate for i, t in enumerate(timesteps) without [:-1]. Sigmas are built in float64 on CPU then moved to the target device, with a float32 downcast for MPS / NPU (float64 isn't supported there).
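The "N timesteps backed by N+1 sigmas" layout from this reply can be sketched in plain Python. This is an unshifted linear schedule only, written with lists rather than tensors; the function name `build_schedule` is hypothetical and any timestep shift or device handling in the real scheduler is omitted.

```python
from typing import List, Tuple


def build_schedule(num_inference_steps: int) -> Tuple[List[float], List[float]]:
    """Return (timesteps, sigmas) with len(timesteps) == N and len(sigmas) == N + 1."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be >= 1")
    n = num_inference_steps
    sigmas = [1.0 - i / n for i in range(n + 1)]  # linspace from 1.0 down to 0.0
    timesteps = sigmas[:-1]                        # the final sigma (0.0) is only ever a target
    return timesteps, sigmas


timesteps, sigmas = build_schedule(4)

# Each step i goes from t = timesteps[i] to r = sigmas[i + 1]; no [:-1] slicing is needed.
step_pairs = [(t, sigmas[i + 1]) for i, t in enumerate(timesteps)]
```

Keeping the trailing zero in `sigmas` rather than in `timesteps` means a loop over `timesteps` runs exactly num_inference_steps model calls.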

dg845 · 2026-05-19T07:17:46Z

+        if r_timestep is None:
+            raise ValueError(
+                "`FlowMapEulerDiscreteScheduler.step` requires an explicit `r_timestep`; this scheduler does "
+                "not infer the target timestep from internal state."
+            )


Are there use cases where r_timestep is not the next timestep in the timestep schedule? I see that both the bidirectional and FAR causal pipeline set r = timesteps[i + 1].

If we usually want r_timestep to be the next timestep, I think we should default to setting r_timestep to it here in step when it is None rather than raising an error. This would also make the step API more consistent with other schedulers like FlowMatchEulerDiscreteScheduler.

Done — step(r_timestep=None) now resolves the target timestep from self.sigmas[i + 1] by matching timestep against the schedule (fp-tolerant argmin). Explicit r_timestep is still honored, so any-step sampling is preserved. The raise stays only for the case where the caller passes a timestep value that isn't on the schedule and provides no r_timestep — no sensible default exists there.
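A rough sketch of the resolution logic described above, assuming a list of sigmas and a fixed tolerance. `resolve_r_sigma` and `tol` are hypothetical names for illustration and are not taken from the PR's code.

```python
def resolve_r_sigma(sigmas, sigma_t, r_sigma=None, tol=1e-6):
    """Hypothetical helper: pick the target sigma for a step that starts at sigma_t."""
    if r_sigma is not None:
        # An explicit target always wins, which keeps any-step sampling possible.
        return r_sigma
    # Search only the N start points; the trailing 0.0 is never a step's start.
    starts = sigmas[:-1]
    i = min(range(len(starts)), key=lambda k: abs(starts[k] - sigma_t))
    if abs(starts[i] - sigma_t) > tol:
        # Off-schedule start with no explicit target: there is no sensible default.
        raise ValueError("timestep is not on the schedule; pass r_timestep explicitly")
    return sigmas[i + 1]


schedule = [1.0, 0.75, 0.5, 0.25, 0.0]
print(resolve_r_sigma(schedule, 0.5))               # next entry on the schedule
print(resolve_r_sigma(schedule, 1.0, r_sigma=0.0))  # explicit one-step jump
```

The nearest-match lookup tolerates small floating-point drift in `sigma_t`, while a value that is clearly off the schedule still raises.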

dg845 · 2026-05-19T07:20:02Z

+    }
+
+
+class AnyFlowTransformer3DModelTest(unittest.TestCase):


Can we combine the model tests here with the standard transformer test suite generated by utils/generate_model_tests.py?

python utils/generate_model_tests.py src/diffusers/models/transformers/transformer_anyflow.py

Done — regenerated via python utils/generate_model_tests.py src/diffusers/models/transformers/transformer_anyflow.py (and the same for the FAR file). Tests now use BaseModelTesterConfig + ModelTesterMixin / MemoryTesterMixin / TrainingTesterMixin / AttentionTesterMixin / TorchCompileTesterMixin instead of the hand-rolled cases.

dg845 · 2026-05-19T07:21:26Z

+        self.assertFalse(m.gradient_checkpointing)
+
+
+class AnyFlowFARTransformer3DModelTest(unittest.TestCase):


If we refactor the causal transformer AnyFlowFARTransformer3DModel into its own modeling file as in #13745 (comment), I think we should put the causal transformer tests into its own test file as well.

Done — FAR causal model tests moved to tests/models/transformers/test_models_transformer_anyflow_far.py. The bidi file is bidi-only; the FAR file additionally carries an AnyFlowCausalAttnProcessor smoke test that exercises the backend gate.

dg845

Thanks for the PR! I left an initial design review :).

Enderfga · 2026-05-19T07:33:58Z

Thanks for the thorough review @dg845 — got it, working through the full list now (transformer split, pipeline cleanup, scheduler, tests). I'm going to batch everything into a single follow-up rather than incremental commits, with the bit-exact replay against NVlabs/AnyFlow re-verified before push. Will ping once it's ready for re-review.

@dg845

Per @dg845's review on huggingface#13745: extract FAR causal modules into a dedicated sibling file so each transformer variant reads in isolation. Shared submodules are duplicated via `# Copied from` so `make fix-copies` keeps both in sync.

- `transformer_anyflow.py`: bidi-only. `AnyFlowAttnProcessor` no longer carries the flex/KV-cache branch (was: dispatch in one branch, bare flex_attention in the other); `AnyFlowRotaryPosEmbed` drops the compressed-frame helpers and the `is_causal` arg; `AnyFlowDualTimestepTextImageEmbedding` drops its causal branch. `AnyFlowTransformerBlock` keeps a single class with a new `is_causal: bool = False` ctor flag that selects the self-attn processor; the forward path is identical in both modes, only the processor differs.
- `transformer_anyflow_far.py`: new. Contains `AnyFlowFARTransformerOutput`, `AnyFlowCausalAttnProcessor` (routed through `dispatch_attention_fn(backend="flex")` with a clear ValueError when a non-flex backend is configured; the BlockMask is consumed only by the flex backend in `_native_flex_attention`), `AnyFlowDualTimestepTextImageEmbeddingCausal`, `AnyFlowCausalRotaryPosEmbed`, `AnyFlowFARTransformer3DModel`, and `# Copied from` clones of the shared `AnyFlowAttention`/`AnyFlowCrossAttnProcessor`/`AnyFlowImageEmbedding`/`AnyFlowTransformerBlock`/`AnyFlowAttnProcessor` modules.

Verified bit-exact against the pre-refactor branch on H200 (float32):

- bidi: L2 = 0.000e+00, max|Δ| = 0.000e+00
- FAR : L2 = 4.772e-06, max|Δ| = 3.576e-07

The FAR delta is fp32 accumulation noise from the dispatch path permuting (B,L,H,D) ↔ (B,H,L,D) around the same `flex_attention` kernel.

Addresses review comments at transformer_anyflow.py:215, :261, :450, :622, :671, :958.

@dg845

…lout, kwarg rename

Per @dg845's review on huggingface#13745, applied to both bidi `AnyFlowPipeline` and causal `AnyFlowFARPipeline`:

- Use `self.video_processor.preprocess_video(...)` instead of the manual `* 2 - 1` normalize.
- Merge `vae_encode` + `encode_latents` + `_normalize_latents` into a single `encode_video` method, mirroring `WanImageToVideoPipeline.encode_image`'s flat structure.
- Inline `_denoise_rollout` into `AnyFlowPipeline.__call__`. For the FAR pipeline, inline both `_denoise_rollout` and `_inference` as a nested loop (outer over chunks, inner over denoising steps), mirroring `WanAnimatePipeline.__call__`. `encode_kv_cache` is intentionally kept as a method: it is one transformer call with a different `kv_cache_flag` mode (cache-write), and inlining it would interleave two distinct forward semantics in the same loop body and lose readability.
- Rename `context_sequence` → `video` (pixel-space) + `video_latents` (pre-encoded), matching `WanVideoToVideoPipeline`. For the FAR pipeline, the old `{"raw"/"latent"}` dict form is replaced by the two kwargs. Mutually-exclusive validation raises `ValueError`.

Addresses review comments at pipeline_anyflow.py:358, :372, :393, :473 and pipeline_anyflow_far.py:395, :489, :675.
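The mutually-exclusive `video` / `video_latents` validation mentioned in this commit could look roughly like the sketch below. `select_conditioning` and its return labels are hypothetical, and treating "neither argument given" as plain text-to-video is an assumption based on the PR description.

```python
def select_conditioning(video=None, video_latents=None):
    """Hypothetical check: at most one of the two conditioning inputs may be set."""
    if video is not None and video_latents is not None:
        raise ValueError("Pass only one of `video` or `video_latents`, not both.")
    if video is not None:
        return "encode"   # pixel-space input still has to go through the VAE
    if video_latents is not None:
        return "latents"  # pre-encoded input is used as-is
    return "t2v"          # assumption: no conditioning means plain text-to-video


print(select_conditioning(video=[[0.0]]))
print(select_conditioning())
```

Checking with `is not None` rather than truthiness matters once the inputs are tensors, since the truth value of a multi-element tensor is ambiguous.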

@dg845

Per @dg845's review on huggingface#13745:

- `set_timesteps(N)` now produces `N` timesteps backed by an internal `sigmas[N+1]` linspace, matching `FlowMatchEulerDiscreteScheduler.set_timesteps`. The final sigma (== 0) is the implicit r-endpoint of the last step; the pipeline rollouts iterate `for i, t in enumerate(timesteps)` without the old `[:-1]` slicing.
- `step(r_timestep=None)` now defaults to the next timestep on the schedule (resolved via fp-tolerant `argmin` over `sigmas[:-1]`), instead of raising. Any-step sampling is preserved when `r_timestep` is explicit. The raise stays only for the case where the caller passes a `timestep` value that isn't on the schedule and provides no `r_timestep`; there's no sensible default in that case.
- Build sigmas in float64 on CPU then move to the target device, with a float32 downcast for MPS / NPU (float64 isn't supported on those backends).

Pipeline rollout loops updated to compute `r = sigmas[i + 1] * num_train_timesteps` for the model's `r_timestep` input and pass `r_timestep=None` to `scheduler.step` (which resolves it from the schedule internally).

Addresses review comments at scheduling_flow_map_euler_discrete.py:107 and :148.

@dg845

…AR files

Per @dg845's review on huggingface#13745: replaced the hand-rolled transformer tests with the standard mixin-based suite produced by `utils/generate_model_tests.py`, and split the FAR causal model tests into their own file to mirror the transformer file split.

- `tests/models/transformers/test_models_transformer_anyflow.py`: regenerated bidi suite. Pulls in `ModelTesterMixin`, `MemoryTesterMixin`, `TrainingTesterMixin`, `AttentionTesterMixin`, `TorchCompileTesterMixin` via `BaseModelTesterConfig`, with `get_init_dict()` / `get_dummy_inputs()` filled in for the small bidi config used in CI.
- `tests/models/transformers/test_models_transformer_anyflow_far.py`: new. Same mixin set, except that TorchCompile is intentionally skipped: FAR's `_build_causal_mask` uses `flex_attention.create_block_mask(_compile=False)`, which conflicts with the standard compile tester's assumptions; the bidi file covers compile, and FAR is bit-exact-validated end-to-end on H200 via the pipeline replay. Also carries an `AnyFlowCausalAttnProcessor` smoke test that exercises the backend gate (non-flex backends must raise) and asserts the `AnyFlowFARTransformerOutput` dataclass exposes the expected fields.

Addresses review comments at test_models_transformer_anyflow.py:71 and :128.

Enderfga · 2026-05-19T08:49:21Z

@dg845 the full review pass is now addressed across 4 commits on this branch (3fa25d1, e9d50b2, 7ea034c, cf574ad). Per-thread replies are inline; quick rollup:

Transformer split — transformer_anyflow_far.py is new and self-contained. AnyFlowCausalAttnProcessor now routes through dispatch_attention_fn(backend="flex") (same kernel as before; non-flex backends raise a clear ValueError). AnyFlowDualTimestepTextImageEmbedding and AnyFlowRotaryPosEmbed are split into normal / *Causal variants. AnyFlowTransformerBlock keeps one class with an is_causal: bool = False flag per your L671 suggestion. Shared modules (AnyFlowAttention, AnyFlowCrossAttnProcessor, AnyFlowImageEmbedding, AnyFlowTransformerBlock, plus AnyFlowAttnProcessor / apply_rotary_emb) are cloned into the FAR file via # Copied from — make fix-copies is a no-op.

Pipeline cleanup — both pipelines: merged vae_encode + encode_latents + _normalize_latents into one encode_video; self.video_processor.preprocess_video(...) replaces the manual *2-1; _denoise_rollout is inlined into __call__ (and for the FAR pipeline, _inference is inlined alongside as a nested chunk/timestep loop, mirroring WanAnimatePipeline.__call__:1035). context_sequence is renamed to video + optional video_latents. The one helper I kept private is encode_kv_cache — it's a single transformer call run in a different kv_cache_flag mode (cache-write) and inlining it would interleave two distinct forward semantics; happy to inline that too if you'd rather.

Scheduler — set_timesteps(N) now returns N timesteps backed by an internal sigmas[N+1], matching FlowMatchEulerDiscreteScheduler. step(r_timestep=None) resolves r from the schedule by default (any-step explicit override preserved). Drive-by fix: float32 fallback for MPS / NPU.

Tests — regenerated via utils/generate_model_tests.py (both files); FAR tests split into their own file. The bidi mixin set picks up ModelTesterMixin / MemoryTesterMixin / TrainingTesterMixin / AttentionTesterMixin / TorchCompileTesterMixin; the FAR file skips TorchCompileTesterMixin (FAR's flex_attention.create_block_mask(_compile=False) doesn't play with the standard compile tester) and adds a smoke test for the AnyFlowCausalAttnProcessor backend gate.

Bit-exact — re-verified on H200 against the pre-refactor branch, float32:

bidi: L2 = 0.000e+00, max|Δ| = 0.000e+00
FAR : L2 = 4.772e-06, max|Δ| = 3.576e-07

The FAR delta is fp32 accumulation noise from dispatch_attention_fn's (B,L,H,D)↔(B,H,L,D) permute around the same flex_attention kernel — well below 1e-5, no logic change.
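For reference, the two reported numbers are the L2 norm and the largest absolute element of the difference between two outputs. The sketch below computes both on plain Python lists (`diff_metrics` is a hypothetical helper), and then shows the general effect behind "accumulation noise": floating-point addition is not associative, so summing the same values in a different order can change the last bits. The demonstration uses Python's float64 rather than fp32 and is not a reproduction of the `flex_attention` path.

```python
import math


def diff_metrics(a, b):
    """Hypothetical helper: return (L2 norm, max |delta|) of the elementwise difference."""
    delta = [x - y for x, y in zip(a, b)]
    l2 = math.sqrt(sum(d * d for d in delta))
    max_abs = max(abs(d) for d in delta)
    return l2, max_abs


# Identical outputs give exactly zero on both metrics, as in the bidi row.
print(diff_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))

# Same three numbers, two summation orders, slightly different results.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left, abs(left_to_right - right_to_left))
```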

make fix-copies clean; make quality clean. Ready for re-review whenever you have a slot. 🙏

@dg845

…s section

The diffusers AnyFlow pipelines renamed the conditioning kwarg from ``context_sequence={"raw"/"latent"}`` to ``video`` / ``video_latents`` in huggingface/diffusers#13745 (review feedback from @dg845: match ``WanVideoToVideoPipeline``'s API surface). Update the README to reflect the new kwarg and add a short I2V example showing how to pass the single-frame conditioning tensor. Only docs change; the in-repo ``WanAnyFlowPipeline`` / ``FARWanAnyFlowPipeline`` keep their original ``context_sequence`` kwarg.

Following the pipeline kwarg refactor in e9d50b2, sweep the user-facing docs to reflect the new API:

- `docs/source/en/api/pipelines/anyflow.md`: T2V / I2V / V2V code examples now use `video=` instead of `context_sequence={"raw": ...}`. The "Generation with AnyFlow (FAR Causal)" intro describes the new mutually-exclusive `video` / `video_latents` selector.
- `docs/source/en/using-diffusers/anyflow.md`: the scenario selector table, the "Image-to-video and video-to-video" walkthrough, and the closing note about pre-encoded latents are all updated. `vae_encode` references are replaced with `encode_video`.

dg845 · 2026-05-20T04:59:36Z

Hi, when I try out the script above:

import torch
from diffusers import AnyFlowPipeline
from diffusers.utils import export_to_video

pipe = AnyFlowPipeline.from_pretrained(
    "nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A red panda eating bamboo in a forest, cinematic lighting"
video = pipe(
    prompt,
    num_inference_steps=4,
    num_frames=33,
    generator=torch.Generator("cuda").manual_seed(42),
).frames[0]
export_to_video(video, "anyflow_t2v.mp4", fps=16)

I get an error when loading the checkpoint:

ValueError: scheduler/far.schedulers.scheduling_flowmap_euler_discrete.py as defined in `model_index.json` does not exist in nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers and is not a module in 'diffusers/pipelines'.

I think this is because the nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers checkpoint references custom folders in model_index.json that don't exist in the diffusers PR (probably to support the official code?). Would it be possible to create diffusers checkpoints for the models?

Enderfga · 2026-05-20T05:16:09Z

Thanks for the repro @dg845 — you're right, this is a metadata mismatch on the Hub side, not a code-path bug. Let me unpack the three pieces:

1. Why it fails

The published nvidia/AnyFlow-*-Diffusers checkpoints' model_index.json (and the per-component config.json files) still reference the class paths from the original NVlabs/AnyFlow repo — e.g.

"scheduler": ["far.schedulers.scheduling_flowmap_euler_discrete", "FlowMapDiscreteScheduler"]

Those module paths only resolve when the NVlabs repo is on sys.path; from a stock diffusers install they fail exactly the way you saw. The weight tensors themselves load fine — state-dict keys are bit-exact between the NVlabs classes and the diffusers AnyFlow classes (re-verified on H200 after the refactor: forward L2 = 0 vs the original).

2. Fix is already prepared upstream

NVlabs/AnyFlow#2 (currently in draft) adds from_pretrained overrides + class aliases on the NVlabs side so the in-repo code (WanAnyFlowPipeline, FAR_Wan_Transformer3DModel, FlowMapDiscreteScheduler) keeps loading the checkpoints after the Hub metadata flips. That way users who only have the NVlabs code aren't broken when the published configs switch.

3. Hub metadata update lands when this PR merges

Once this diffusers PR is in, I'll push a metadata-only update to each of the four nvidia/AnyFlow-*-Diffusers repos — model_index.json + scheduler/config.json + transformer/config.json rewritten to reference AnyFlowPipeline / AnyFlowFARPipeline / AnyFlowTransformer3DModel / AnyFlowFARTransformer3DModel / FlowMapEulerDiscreteScheduler. No weight re-upload, no breaking changes to existing downloads (Hub revisions are immutable; the new metadata is just a new commit).

After both land, your repro script will work from a stock diffusers install with no extra dependencies.
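The failure described in part 1 comes down to a module path that cannot be imported from a stock install. A minimal illustration with `importlib` follows; `resolve_class` is a hypothetical helper, and this is not how diffusers resolves pipeline components internally.

```python
import importlib


def resolve_class(module_path, class_name):
    """Hypothetical resolver: return the class, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError:
        # The path only resolves when the package that defines it is on sys.path.
        return None
    return getattr(module, class_name, None)


# Resolves only if the NVlabs/AnyFlow repo (which provides the `far` package) is importable.
missing = resolve_class("far.schedulers.scheduling_flowmap_euler_discrete", "FlowMapDiscreteScheduler")
# A standard-library module always resolves.
present = resolve_class("json", "JSONDecoder")
print(missing, present)
```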

Enderfga · 2026-05-20T05:43:37Z

If you'd rather try it now without waiting for the Hub metadata update, here's a small workaround script that rewrites the three config files (model_index.json, scheduler/scheduler_config.json, transformer/config.json) in a locally-downloaded checkpoint so they reference the diffusers AnyFlow class names instead of the far.* paths:

📜 Gist: https://gist.github.com/Enderfga/80fe3e7debc4eeda4c15e873ed5f53aa

Usage:

# 1. Snapshot-download the checkpoint
huggingface-cli download nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers \
  --local-dir ./AnyFlow-Wan2.1-T2V-14B-Diffusers

# 2. Patch the configs (writes .bak next to each edited file)
python fix_anyflow_diffusers_config.py ./AnyFlow-Wan2.1-T2V-14B-Diffusers

# 3. Load from the local directory
python -c "
import torch
from diffusers import AnyFlowPipeline
pipe = AnyFlowPipeline.from_pretrained('./AnyFlow-Wan2.1-T2V-14B-Diffusers', torch_dtype=torch.bfloat16).to('cuda')
"

The script auto-detects bidi vs FAR from the transformer config's init_far_model flag and picks the right diffusers class (AnyFlowPipeline + AnyFlowTransformer3DModel vs AnyFlowFARPipeline + AnyFlowFARTransformer3DModel). Smoke-tested against the bidi 1.3B checkpoint on H200; weights load cleanly with no missing/unexpected keys.
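For readers who want the idea without opening the gist, here is a rough sketch of the same kind of rewrite. It is not the gist's code: `pick_classes` and `patch_model_index` are hypothetical names, the demo `model_index.json` is made up, and the `["diffusers", ClassName]` entry layout is an assumption about how diffusers-native components are listed.

```python
import json
import tempfile
from pathlib import Path


def pick_classes(transformer_config):
    """Choose diffusers class names from the transformer config's `init_far_model` flag."""
    if transformer_config.get("init_far_model", False):
        return "AnyFlowFARPipeline", "AnyFlowFARTransformer3DModel"
    return "AnyFlowPipeline", "AnyFlowTransformer3DModel"


def patch_model_index(index_path, transformer_config):
    """Rewrite class references in model_index.json, keeping a .bak copy of the original."""
    index_path = Path(index_path)
    original = index_path.read_text()
    index_path.with_name(index_path.name + ".bak").write_text(original)

    index = json.loads(original)
    pipeline_cls, transformer_cls = pick_classes(transformer_config)
    index["_class_name"] = pipeline_cls
    # Assumption: diffusers-native components are listed as ["diffusers", ClassName].
    index["transformer"] = ["diffusers", transformer_cls]
    index["scheduler"] = ["diffusers", "FlowMapEulerDiscreteScheduler"]
    index_path.write_text(json.dumps(index, indent=2))
    return index


# Demo on a throwaway directory with a made-up, minimal model_index.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_index.json"
    path.write_text(json.dumps({
        "_class_name": "WanAnyFlowPipeline",
        "scheduler": ["far.schedulers.scheduling_flowmap_euler_discrete", "FlowMapDiscreteScheduler"],
    }))
    patched = patch_model_index(path, {"init_far_model": False})
    backup_exists = path.with_name("model_index.json.bak").exists()

print(patched["_class_name"], patched["scheduler"], backup_exists)
```

The sketch covers only `model_index.json`; the per-component config files carry their own class references and would need the same treatment.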

This is just a workaround — the published Hub metadata will be updated to match once this PR merges, so the script becomes unnecessary at that point.

Enderfga added 15 commits May 6, 2026 14:41

[Pipelines] AnyFlow: scaffold pipelines/anyflow + register all top-le…

507fd9b

…vel imports This is the lazy-loader scaffolding only. Body files (pipeline_anyflow.py, pipeline_anyflow_causal.py, transformer_anyflow.py, scheduling_flow_map_euler_discrete.py) come in subsequent commits.

github-actions Bot added size/L PR with diff > 200 LOC documentation Improvements or additions to documentation models tests utils pipelines schedulers and removed size/L PR with diff > 200 LOC labels May 14, 2026

Merge branch 'main' into add-anyflow-pipeline

8da3679

github-actions Bot added the size/L PR with diff > 200 LOC label May 14, 2026

Enderfga mentioned this pull request May 14, 2026

Load nvidia/AnyFlow-* checkpoints from the diffusers AnyFlow metadata layout NVlabs/AnyFlow#2

Draft

Enderfga and others added 2 commits May 14, 2026 20:57

Merge branch 'main' into add-anyflow-pipeline

76e91f8

dg845 requested review from dg845 and yiyixuxu May 16, 2026 00:16

dg845 reviewed May 19, 2026

Comment thread src/diffusers/pipelines/anyflow/pipeline_anyflow.py Outdated

dg845 reviewed May 19, 2026

Comment thread src/diffusers/pipelines/anyflow/pipeline_anyflow.py Outdated

dg845 reviewed May 19, 2026

Enderfga mentioned this pull request May 19, 2026

[feat] Add AnyFlow any-step video distillation (pretrain + on-policy) hao-ai-lab/FastVideo#1371

Open

4 tasks

Enderfga added 4 commits May 19, 2026 16:46

Enderfga force-pushed the add-anyflow-pipeline branch from f82bce2 to 46a1fab Compare May 19, 2026 09:40

github-actions Bot added quantization examples labels May 19, 2026

Enderfga and others added 2 commits May 19, 2026 17:42

Merge branch 'main' into add-anyflow-pipeline

6d8f93a

Enderfga force-pushed the add-anyflow-pipeline branch from 46a1fab to 6d8f93a Compare May 19, 2026 09:44

github-actions Bot removed quantization examples labels May 19, 2026
