perf: add multiplexing performance tests for AsyncMultiRangeDownloader #16501

zhixiangli wants to merge 1 commit into googleapis:main
Conversation
Code Review
This pull request updates the GCS read microbenchmarks and configuration to test multiplexing by executing multiple `download_ranges` calls concurrently. The review identifies a critical TypeError where an unsupported lock argument was passed to `download_ranges`, and recommends removing the `asyncio.Lock` entirely as it would prevent true multiplexing. Additionally, the feedback suggests a more robust round-robin chunking strategy to ensure the desired number of concurrent tasks, and requests the reorganization of standard library imports to follow PEP 8 guidelines.
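The round-robin chunking the review suggests could look like the following minimal sketch. The helper name and signature are hypothetical illustrations, not code from this PR:

```python
def round_robin_chunks(ranges, num_tasks):
    """Distribute ranges across tasks in round-robin order.

    A naive contiguous split can leave some tasks empty when
    len(ranges) < num_tasks; round-robin assignment keeps as many
    tasks busy as there are ranges.
    """
    chunks = [[] for _ in range(num_tasks)]
    for i, r in enumerate(ranges):
        chunks[i % num_tasks].append(r)
    # drop empty chunks so no idle coroutine is ever spawned
    return [c for c in chunks if c]
```

With 5 ranges and 2 tasks this yields two interleaved chunks; with 2 ranges and 4 tasks it yields only 2 single-range chunks instead of 2 busy tasks plus 2 empty ones.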
packages/google-cloud-storage/tests/perf/microbenchmarks/time_based/reads/test_reads.py
```python
start_time = time.monotonic()
warmup_end_time = start_time + params.warmup_duration
test_end_time = warmup_end_time + params.duration
shared_lock = asyncio.Lock()
```
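In context, these variables typically drive a warmup-then-measure loop. A minimal sketch of such a driver is below; everything except the four lines above (the function name, `do_download`, and the sampling logic) is a hypothetical illustration, not the benchmark's actual code:

```python
import asyncio
import time


async def run_timed(params, do_download):
    # Timing setup as in the diff above.
    start_time = time.monotonic()
    warmup_end_time = start_time + params.warmup_duration
    test_end_time = warmup_end_time + params.duration
    shared_lock = asyncio.Lock()

    samples = []
    while time.monotonic() < test_end_time:
        t0 = time.monotonic()
        await do_download(shared_lock)
        # Discard iterations that finished during the warmup window.
        if time.monotonic() >= warmup_end_time:
            samples.append(time.monotonic() - t0)
    return samples
```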
There was a problem hiding this comment.
The new changes in #16528 do not require it. However, with the previous implementation, I feel safer passing the lock in, especially since multiple threads might create the lock (see https://github.com/googleapis/google-cloud-python/blob/main/packages/google-cloud-storage/google/cloud/storage/asyncio/async_multi_range_downloader.py#L400-L401).
We can clean this up as part of deprecating the lock in `download_ranges`.
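Creating the lock once and passing it in avoids each caller lazily creating its own. The sketch below illustrates the pattern with a stand-in coroutine; `download_ranges` here is a hypothetical simplification, not the real `AsyncMultiRangeDownloader` method:

```python
import asyncio


async def download_ranges(ranges, lock):
    # Stand-in: with a shared asyncio.Lock, only one coroutine
    # drives the send/receive cycle at a time.
    async with lock:
        await asyncio.sleep(0)  # simulate the download cycle
        return len(ranges)


async def main():
    shared_lock = asyncio.Lock()  # created once, shared by all callers
    return await asyncio.gather(
        download_ranges([(0, 100)], shared_lock),
        download_ranges([(100, 200), (200, 300)], shared_lock),
    )
```

Because both calls receive the same lock object, their stream access is serialized deterministically rather than depending on which caller created a lock first.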
Overview
This PR introduces new microbenchmarks to measure and expose the performance bottleneck caused by lock contention in the `AsyncMultiRangeDownloader`. It provides a concrete way to compare the previous serialized implementation against the new multiplexed architecture.

Before vs. After: The Performance Gap
Before (Serialized via Lock)
In the previous implementation, `download_ranges` used a shared lock to prevent concurrent access to the bidi-gRPC stream. This meant that even with multiple coroutines, only one could "own" the stream at a time. The entire download cycle (Send -> Receive All) had to complete before another task could start.

Execution Flow:
```mermaid
sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant S as gRPC Stream
    C1->>C1: Acquire Lock
    C1->>S: Send Requests
    S-->>C1: Receive Data (Streaming...)
    S-->>C1: End of Range
    C1->>C1: Release Lock
    Note over C2: Waiting for Lock...
    C2->>C2: Acquire Lock
    C2->>S: Send Requests
    S-->>C2: Receive Data (Streaming...)
    S-->>C2: End of Range
    C2->>C2: Release Lock
```

After (Multiplexed Concurrent)
With the introduction of the `_StreamMultiplexer`, multiple coroutines can now share the same stream concurrently. Requests are interleaved, and a background receiver loop routes incoming data to the correct task using `read_id`.

Execution Flow:
```mermaid
sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant M as Multiplexer
    participant S as gRPC Stream
    C1->>M: Send Requests
    M->>S: Forward Req 1
    C2->>M: Send Requests
    M->>S: Forward Req 2
    Note over C1,C2: Tasks wait on their own queues
    S-->>M: Data for C1
    M-->>C1: Route to Q1
    S-->>M: Data for C2
    M-->>C2: Route to Q2
    S-->>M: Data for C1
    M-->>C1: Route to Q1
```

How the Benchmark Works
This PR adds a `read_rand_multi_coro` workload that:
- Shares a single `AsyncMultiRangeDownloader` instance across all tasks.
- Passes a `shared_lock` to `download_ranges`.

Key Changes
- `test_reads.py`: Refactored to support launching concurrent coroutines within a single worker process.
- `config.yaml`: Added `read_rand_multi_coro` with 1, 16 coroutines to stress the downloader.
- `config.py`: Updated the naming convention to include the coroutine count (e.g., `16c`) in reports for easier differentiation.
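A workload like `read_rand_multi_coro` could be driven roughly as sketched below. The driver function and the downloader interface shown are assumptions for illustration, not the PR's actual code:

```python
import asyncio


async def run_multi_coro(downloader, all_ranges, num_coroutines):
    # One lock, created once up front and shared by every coroutine.
    shared_lock = asyncio.Lock()
    # Round-robin split so each coroutine receives some ranges.
    chunks = [all_ranges[i::num_coroutines] for i in range(num_coroutines)]
    tasks = [
        downloader.download_ranges(chunk, shared_lock)
        for chunk in chunks
        if chunk
    ]
    # A single downloader instance is shared across all tasks.
    return await asyncio.gather(*tasks)
```

Launching all tasks against one downloader instance is what exercises the multiplexing path; with the old lock-based implementation the same code would degrade to serialized execution.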