Unstable results in GitHub Actions due to "Different runtime environments detected" #409

We're using CodSpeed with the GitHub Actions integration, running benchmarks in `simulation` mode, which, if I understand correctly, is meant to reduce variance. However, we currently have to re-run workflows a couple of times to get stable results: the results always seem to be off when the benchmarks are not run in the same environment as the baseline benchmarks. CodSpeed also flags this in its report with the message:

> Some benchmarks with significant performance changes were compared across different runtime environments,
> which may affect the accuracy of the results.

This feels like a severe limit on the tool's usefulness, given my understanding that the CPU simulation is specifically meant to address this type of variance.

You can see such a run here: https://github.com/msgspec/msgspec/pull/1052#issuecomment-4721981264.

This is our current setup: https://github.com/msgspec/msgspec/blob/978c6671814524da8b748630a8ea142c56956c94/.github/workflows/codspeed.yml

<hr>

Am I missing something, or are we doing something wrong with our setup? 
