We're using CodSpeed with the GitHub Actions integration, running benchmarks in simulation mode, which, if I understand correctly, is meant to reduce variance. However, we currently have to re-run workflows a couple of times to get stable results, because they always seem to be off when they're not run in the same environment as the baseline benchmarks. CodSpeed also mentions this in its report with the message:
Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.
This feels like a serious limitation on the usefulness of the tool, given my understanding that the CPU simulation is specifically meant to address this type of variance.
You can see such a run here: msgspec/msgspec#1052 (comment).
This is our current setup: https://github.com/msgspec/msgspec/blob/978c6671814524da8b748630a8ea142c56956c94/.github/workflows/codspeed.yml
Am I missing something, or are we doing something wrong with our setup?