Fix `EMAModel.restore()` foreach path crashing with device mismatch when model is on GPU by Dev-X25874 · Pull Request #13782 · huggingface/diffusers

Dev-X25874 · 2026-05-21T09:25:54Z

What does this PR do?

Fixes a runtime crash in EMAModel.restore() when foreach=True and the model lives on a non-CPU device (e.g. CUDA).

store() always saves parameters to CPU (param.detach().cpu().clone()). The foreach path in restore() then passed those raw CPU tensors directly to torch._foreach_copy_(), which requires all tensors to be on the same device:

# before (broken on GPU)
torch._foreach_copy_(
    [param.data for param in parameters],
    [c_param.data for c_param in self.temp_stored_params],  # always CPU
)

This raises `RuntimeError: Expected all tensors to be on same device` for any user who runs the standard EMA validation pattern (store → copy_to → restore) with foreach=True on a GPU machine.

The fix mirrors the pattern already used correctly in copy_to()'s foreach path (line 780), which moves each shadow param to the target device before the copy:

# after (matches copy_to() pattern)
torch._foreach_copy_(
    [param.data for param in parameters],
    [c_param.to(param.device).data for c_param, param in zip(self.temp_stored_params, parameters)],
)

Also adds test_store_restore to both EMAModelTests and EMAModelTestsForeach — the store/restore round-trip was completely untested prior to this PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sayakpaul

sayakpaul · 2026-05-22T09:12:57Z

How can I minimally reproduce the bug?

Dev-X25874 · 2026-05-22T09:29:17Z

Hi @sayakpaul, here's a minimal repro (requires a CUDA GPU):

import torch
import torch.nn as nn
from diffusers.training_utils import EMAModel

model = nn.Linear(4, 4).cuda()
ema = EMAModel(model.parameters(), foreach=True)

# Simulate a training step so shadow params differ from model params
with torch.no_grad():
    for p in model.parameters():
        p.add_(torch.randn_like(p))
ema.step(model.parameters())

# Standard EMA validation pattern
ema.store(model.parameters())     # saves to CPU
ema.copy_to(model.parameters())  # works fine
ema.restore(model.parameters())  # RuntimeError: Expected all tensors to be on same device

The crash happens because store() always clones params to CPU, but the foreach path in restore() feeds those raw CPU tensors into torch._foreach_copy_() alongside the GPU model params. The non-foreach path is unaffected since copy_() handles cross-device copies natively.
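The per-tensor path mentioned in the last sentence can be sketched as follows. `restore_per_tensor` is a hypothetical helper written for illustration, not the library's code; it relies only on `Tensor.copy_`, which accepts a source on a different device. On a machine without CUDA the example runs entirely on CPU.

```python
import torch


def restore_per_tensor(params, stored):
    # copy_() transfers across devices itself, so no explicit .to() is needed.
    for p, s in zip(params, stored):
        p.data.copy_(s.data)


device = "cuda" if torch.cuda.is_available() else "cpu"
params = [torch.zeros(3, device=device), torch.zeros(2, 2, device=device)]
stored = [torch.arange(3, dtype=torch.float32), torch.ones(2, 2)]  # always on CPU

restore_per_tensor(params, stored)
```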

sayakpaul · 2026-05-22T09:31:38Z

+            assert torch.allclose(restored.data, original.to(restored.device), atol=1e-6), (
+                "restore() foreach path did not correctly recover the stored parameters"
+            )
+    def test_store_restore(self):


This seems duplicated.

Dev-X25874 (reply): My bad, the foreach test_store_restore was mistakenly placed inside EMAModelTests instead of EMAModelTestsForeach. Fixed now.

sayakpaul · 2026-05-22T09:55:09Z

How come this works?

import torch

gpu = [torch.zeros(3, device="cuda")]
cpu = [torch.arange(3, dtype=torch.float32)]
torch._foreach_copy_(gpu, cpu)
print(gpu[0])

Prints:

tensor([0., 1., 2.], device='cuda:0')

Dev-X25874 · 2026-05-22T09:58:52Z

Sorry for the noise, converting this to draft while I investigate further.

Dev-X25874 · 2026-05-22T10:41:36Z

Verified on PyTorch 2.10.0+cu128 — no crash. You're right, torch._foreach_copy_() handles cross-device fine on modern PyTorch. Closing this PR. Sorry for the noise.
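Because the outcome depends on the installed PyTorch build, a small probe based on the snippet above can confirm the behaviour before a fix is proposed. `probe_foreach_cross_device_copy` is a hypothetical helper; it reports "skipped" on machines without CUDA and makes no claim about which PyTorch versions, if any, fail.

```python
import torch


def probe_foreach_cross_device_copy() -> str:
    """Report whether a CPU -> CUDA torch._foreach_copy_ works on this install."""
    if not hasattr(torch, "_foreach_copy_"):
        return "unavailable"
    if not torch.cuda.is_available():
        return "skipped"

    dst = [torch.zeros(3, device="cuda")]
    src = [torch.arange(3, dtype=torch.float32)]  # stays on CPU
    try:
        torch._foreach_copy_(dst, src)
    except RuntimeError as exc:
        return f"failed: {exc}"
    return "ok" if torch.equal(dst[0].cpu(), src[0]) else "mismatch"


result = probe_foreach_cross_device_copy()
print(torch.__version__, result)
```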

Dev-X25874 added 2 commits May 21, 2026 14:20

training_utils: fix EMAModel.restore() foreach path crashing with dev…

1cfb2e6

…ice mismatch on GPU

tests/ema: add store/restore round-trip tests for both foreach and no…

0f3528c

…n-foreach EMAModel

github-actions (bot) added the tests and size/S (PR with diff < 50 LOC) labels May 21, 2026

sayakpaul reviewed May 22, 2026

View reviewed changes

training_utils: fix EMAModel.restore() foreach path device mismatch o…

8781439

…n GPU

github-actions (bot) added the size/M (PR with diff < 200 LOC) label and removed the size/S (PR with diff < 50 LOC) label May 22, 2026

Dev-X25874 added 3 commits May 22, 2026 15:18

tests/ema: fix duplicate test_store_restore — place one in each class

9830f34

training_utils: remove accidental duplicate in restore() foreach fix

c72ccc5

tests/ema: remove duplicate test_store_restore from EMAModelTests

f308b2e

github-actions (bot) added the size/S (PR with diff < 50 LOC) label and removed the size/M (PR with diff < 200 LOC) label May 22, 2026

Dev-X25874 marked this pull request as draft May 22, 2026 09:58

Dev-X25874 closed this May 22, 2026

Dev-X25874 deleted the fix/ema-restore-foreach-device-mismatch branch May 23, 2026 09:04

Dev-X25874 wants to merge 6 commits into huggingface:main from Dev-X25874:fix/ema-restore-foreach-device-mismatch
