Reduce FLUX int8 test peak memory with sequential offload by jiqing-feng · Pull Request #13776 · huggingface/diffusers

jiqing-feng · 2026-05-21T02:01:49Z

Summary

Update the slow FLUX bitsandbytes int8 tests to use sequential CPU offload instead of model CPU offload.

enable_model_cpu_offload() can move an entire sub-model onto the GPU at once. For black-forest-labs/FLUX.1-dev, this can run out of memory (OOM) on cards with 24 GB or less, even when the T5 encoder and transformer are loaded from the pre-quantized int8 test checkpoint. Sequential CPU offload keeps peak memory lower by materializing one layer at a time, which lets the int8 FLUX tests run in more constrained environments.
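The difference in peak memory can be illustrated with a toy calculation. This is only a sketch: the sub-model names and per-layer sizes below are hypothetical, not measured FLUX numbers, and activations and runtime overhead are ignored.

```python
def peak_model_offload(submodels):
    """Peak weight memory when one whole sub-model is resident at a time."""
    return max(sum(layers) for layers in submodels.values())


def peak_sequential_offload(submodels):
    """Peak weight memory when only one layer is resident at a time."""
    return max(max(layers) for layers in submodels.values())


# Hypothetical per-layer weight sizes in GB (illustrative assumption only).
submodels = {
    "text_encoder": [0.25] * 8,
    "transformer": [0.5] * 24,
    "vae": [0.125] * 2,
}

print(peak_model_offload(submodels))       # 12.0
print(peak_sequential_offload(submodels))  # 0.5
```

Under these assumed sizes, model-level offload needs room for the largest sub-model as a whole, while sequential offload only needs room for the largest single layer, at the cost of many more host-to-device transfers.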

The LoRA-loading assertion tolerance is also relaxed from 1e-3 to 2e-3 to account for small backend-specific numerical differences observed in the slow int8 path.
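To make the tolerance change concrete, here is a minimal pure-Python sketch of a cosine-distance check. The helper names `cosine_distance` and `within_tolerance` are hypothetical and are not the helpers used in the diffusers test suite; the vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def within_tolerance(expected, actual, tol=2e-3):
    """True when the cosine distance is strictly below the tolerance."""
    return cosine_distance(expected, actual) < tol


# A made-up pair whose cosine distance is roughly 1.5e-3.
expected = [1.0, 0.0]
actual = [0.9985, math.sqrt(1 - 0.9985 ** 2)]

print(within_tolerance(expected, actual, tol=1e-3))  # False
print(within_tolerance(expected, actual, tol=2e-3))  # True
```

A pair like this one would fail the old 1e-3 threshold and pass the relaxed 2e-3 threshold, which is the kind of small drift the relaxed tolerance is meant to absorb.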

Changes

Switch SlowBnb8bitFluxTests setup from enable_model_cpu_offload() to enable_sequential_cpu_offload().
Document why sequential offload is needed for the FLUX int8 slow tests.
Relax the test_lora_loading cosine-distance tolerance to 2e-3.
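The setup switch in the first item amounts to calling a different offload method on the pipeline. A minimal sketch follows; `apply_offload` and `FakePipeline` are hypothetical names used only so the example runs without a GPU or model download. Only the two `enable_*` method names come from this PR.

```python
def apply_offload(pipeline, sequential=True):
    """Enable CPU offload on a diffusers-style pipeline and return it."""
    if sequential:
        # Lower peak memory: weights are moved to the device layer by layer.
        pipeline.enable_sequential_cpu_offload()
    else:
        # Higher peak memory: a whole sub-model is moved to the device at once.
        pipeline.enable_model_cpu_offload()
    return pipeline


class FakePipeline:
    """Hypothetical stand-in that only records which offload mode was requested."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("model")

    def enable_sequential_cpu_offload(self):
        self.calls.append("sequential")


pipe = apply_offload(FakePipeline())
print(pipe.calls)  # ['sequential']
```

With a real pipeline object, the same call would be made once during test setup, after the pipeline has been constructed.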

Validation

Run the affected slow tests:

RUN_SLOW=1 python -m pytest \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_quality \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_lora_loading \
  -x -s

Observed result:

2 passed

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng · 2026-05-21T02:08:30Z

Requires the change in huggingface/accelerate#4044 to be merged.

jiqing-feng added 3 commits May 20, 2026 14:12

fix oom

6d36ba9

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

revert

53e0a7c

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

adjust tol

862eb67

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

github-actions bot added the "tests" and "size/S" (PR with diff < 50 LOC) labels on May 21, 2026

jiqing-feng changed the title from "Fix OOM on int8 tests" to "Reduce FLUX int8 test peak memory with sequential offload" on May 21, 2026

Merge branch 'main' into test_xpu

1ee339d

jiqing-feng wants to merge 4 commits into huggingface:main from jiqing-feng:test_xpu
