Broader specialization in the Specializing Adaptive Interpreter for better JIT performance

Until now, our choice of specializations in the SAI has been driven solely by the performance of the interpreter (see https://github.com/python/cpython/blob/main/InternalDocs/interpreter.md#performance-analysis).

However, we now expect further performance improvements to come from the JIT, not the interpreter.
This means that specialization's other role, gathering type and branching information for the JIT, is at least as important as raw interpreter performance.

We should therefore seek to broaden specialization to gather more information, as long as it does not make interpreter performance worse, or at least not significantly so.

Using some old stats, the top 10 instructions by fraction of unspecialized bytecode executed were:
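
| Instruction | Share of unspecialized bytecode executed |
| --- | ---: |
| BINARY_OP | 31.3% |
| FOR_ITER | 19.4% |
| LOAD_ATTR | 10.9% |
| STORE_SUBSCR | 9.2% |
| BINARY_SLICE | 7.3% |
| COMPARE_OP | 7.0% |
| TO_BOOL | 5.8% |
| CALL | 2.5% |
| CONTAINS_OP | 2.4% |
| SEND | 1.7% |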

We should fully specialize most, if not all, of these.

In general, each of these instructions has a matching `__dunder__` method that determines the behavior of the operation. Recording the type of the operand(s) tells us which `__dunder__` method will be called.
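As a minimal sketch of the idea, the Python below models how recording an operand's type identifies the `__dunder__` that will run. The family-to-dunder mapping is a simplified assumption (for example, `TO_BOOL` falling back to `__len__` and `COMPARE_OP` depending on the comparison are not modeled), and `lookup_dunder` is a hypothetical helper that walks the type's MRO, much as special-method lookup does.

```python
# Simplified, hypothetical mapping from an instruction family to the dunder it
# dispatches on. Real dispatch has more cases (e.g. TO_BOOL falls back to
# __len__, COMPARE_OP depends on which comparison is made), not modeled here.
DUNDER_FOR = {
    "FOR_ITER": "__next__",
    "STORE_SUBSCR": "__setitem__",
    "TO_BOOL": "__bool__",
    "CALL": "__call__",
    "CONTAINS_OP": "__contains__",
}


def lookup_dunder(family, operand):
    """Return the operand's type and the dunder that type provides (or None)."""
    tp = type(operand)  # the type an adaptive instruction would record
    name = DUNDER_FOR[family]
    # Special methods are found on the type's MRO, never on the instance.
    for klass in tp.__mro__:
        if name in klass.__dict__:
            return tp, klass.__dict__[name]
    return tp, None


class Bag:
    def __contains__(self, item):
        return True  # claims to contain everything


list_type, list_impl = lookup_dunder("CONTAINS_OP", [1, 2])  # C-level slot wrapper
bag_type, bag_impl = lookup_dunder("CONTAINS_OP", Bag())     # Python function
```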

We cannot specialize for all possible types, but we can ensure the JIT gets good inputs and type information by adding the following two specializations to every instruction family:

* `__dunder__` implemented in Python. These specializations should jump directly into the Python method. `LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN` already does this for `LOAD_ATTR`, and other families should follow this template.
* `__dunder__` implemented in C. In practice, this is just the generic instruction with a bit more information recorded.
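To make the split between the two specializations concrete, the hypothetical `classify_dunder` helper below decides which one a given type would select, assuming that "implemented in Python" means the dunder found on the type is a plain Python function. This is an illustration in Python, not how the interpreter performs the check.

```python
import types


def classify_dunder(tp, name):
    """Say which of the two proposed specializations a type would select.

    "python": the dunder is a plain Python function, so a specialization
              could jump directly into it.
    "c":      the dunder is implemented in C (e.g. a slot wrapper), so the
              generic path is used, with the type recorded for the JIT.
    None:     the type does not define the dunder at all.
    """
    for klass in tp.__mro__:  # special methods are looked up on the type's MRO
        if name in klass.__dict__:
            impl = klass.__dict__[name]
            return "python" if isinstance(impl, types.FunctionType) else "c"
    return None


class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __lt__(self, other):
        return self.degrees < other.degrees
```

A `COMPARE_OP` on two `Celsius` objects would pick the Python-method specialization, while the same comparison on two `int`s would take the C fallback.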

Three instructions need special casing:
* BINARY_OP. Because the behavior depends on two types, we will need a table-driven approach: https://github.com/python/cpython/issues/100239
* BINARY_SLICE. This is supposed to avoid creating temporary slice objects for expressions like `a[b:c]`, but it has yet to be implemented properly. There is no corresponding `__dunder__` method, so we would need to expose dedicated slicing methods for it to call.
* SEND. There is no `__send__` method. For iterators, `__next__` is called if the value is `None`; otherwise `.send()` is called. Rather than try to replicate the specializations of `FOR_ITER`, we should consider combining `SEND` and `FOR_ITER`, much as we did for `CALL` and `CALL_METHOD`.
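The generic behavior of `SEND` described above can be modeled roughly in Python. This sketch assumes a non-generator receiver (CPython has a separate fast path for generators and coroutines), and `send_model` and its `("yield", ...)`/`("return", ...)` result tuples are hypothetical. The `None` case is exactly the `__next__` call that `FOR_ITER` already makes, which is why combining the two instructions looks attractive.

```python
def send_model(receiver, value):
    """Rough Python model of SEND's generic (non-generator) path.

    Returns ("yield", v) if the receiver produced a value, or ("return", v)
    if it finished, where v is the StopIteration value.
    """
    try:
        if value is None and hasattr(type(receiver), "__next__"):
            result = next(receiver)        # i.e. type(receiver).__next__
        else:
            result = receiver.send(value)  # an ordinary method call
    except StopIteration as exc:
        return ("return", exc.value)
    return ("yield", result)


def doubler():
    received = yield "ready"
    return received * 2
```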

### First step

Add the two specializations, one for a `__dunder__` implemented in Python and the fallback for a `__dunder__` implemented in C, to each of:
* FOR_ITER
* LOAD_ATTR
* STORE_SUBSCR
* COMPARE_OP
* TO_BOOL
* CALL
* CONTAINS_OP

This makes a total of 12 new instructions, since `LOAD_ATTR` already has the specialization for a Python `__getattribute__` and `CALL` already has the generic fallback.
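The count of 12 can be checked by enumerating the pairs: 7 families times 2 specializations gives 14, minus the 2 that already exist. The `python_dunder`/`c_dunder` labels below are hypothetical, not proposed opcode names.

```python
# Instruction families from the first step.
FAMILIES = ["FOR_ITER", "LOAD_ATTR", "STORE_SUBSCR", "COMPARE_OP",
            "TO_BOOL", "CALL", "CONTAINS_OP"]

# The two proposed specializations per family (hypothetical labels).
KINDS = ["python_dunder", "c_dunder"]

# Already present: LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN and CALL's generic fallback.
EXISTING = {("LOAD_ATTR", "python_dunder"), ("CALL", "c_dunder")}

new_instructions = [
    (family, kind)
    for family in FAMILIES
    for kind in KINDS
    if (family, kind) not in EXISTING
]

print(len(FAMILIES) * len(KINDS), "-", len(EXISTING), "=", len(new_instructions))
# prints: 14 - 2 = 12
```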

### Second step

Implement https://github.com/python/cpython/issues/100239
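As a rough illustration of what "table-driven" means here, the sketch below dispatches on the exact types of both operands plus the operator. The table contents, `binary_op`, and the `"specialized"`/`"generic"` tags are all hypothetical; this does not reproduce the design in the linked issue.

```python
import operator

# Hypothetical specialization table keyed on (left type, right type, operator).
# In the interpreter each entry would be a guard plus a specialized C routine;
# here each entry is just the Python function to call.
BINARY_OP_TABLE = {
    (int, int, "+"): operator.add,
    (float, float, "*"): operator.mul,
    (str, int, "*"): operator.mul,
    (list, list, "+"): operator.concat,
}

# Generic fallback: the full dunder protocol (__add__, __radd__, ...).
GENERIC = {"+": operator.add, "*": operator.mul}


def binary_op(lhs, rhs, op):
    """Dispatch on the exact types of both operands, falling back if absent."""
    action = BINARY_OP_TABLE.get((type(lhs), type(rhs), op))
    if action is not None:
        return action(lhs, rhs), "specialized"
    return GENERIC[op](lhs, rhs), "generic"
```

Keying on exact types means a subclass such as `bool` misses the `(int, int, "+")` entry and takes the generic path, mirroring the exact-type guards that specializations typically use.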

### Third step

Handle `BINARY_SLICE` and `SEND`
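For `BINARY_SLICE`, the missing dunder is visible from Python: slicing reaches `__getitem__` with a freshly built `slice` object, which is the temporary `BINARY_SLICE` is meant to avoid. `RecordingSeq` is a hypothetical class used only to observe that key.

```python
class RecordingSeq:
    """Records the key passed to __getitem__ so we can see what a[b:c] builds."""

    def __init__(self, data):
        self.data = list(data)
        self.last_key = None

    def __getitem__(self, key):
        self.last_key = key  # for a[b:c] this is a new slice(b, c, None)
        return self.data[key]


seq = RecordingSeq(range(10))
b, c = 2, 5
part = seq[b:c]
```

Any dedicated slicing method that avoids building the `slice` would be new API that types opt into.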



### Linked PRs
* gh-148113
* gh-148128
* gh-148271
* gh-148745
* gh-148963

Broader specialization in the Specializing Adaptive Interpreter for better JIT performance #143732

First step

Second step

Third step

Linked PRs

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Broader specialization in the Specializing Adaptive Interpreter for better JIT performance #143732

Description

First step

Second step

Third step

Linked PRs

Metadata

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Issue actions