Skip to content

gh-149180: avoid double checking tp_as_number, tp_as_sequence and tp_as_mapping#149476

Open
NekoAsakura wants to merge 1 commit into
python:mainfrom
NekoAsakura:gh-149180/avoid-double-checking
Open

gh-149180: avoid double checking tp_as_number, tp_as_sequence and tp_as_mapping#149476
NekoAsakura wants to merge 1 commit into
python:mainfrom
NekoAsakura:gh-149180/avoid-double-checking

Conversation

@NekoAsakura
Copy link
Copy Markdown
Contributor

@NekoAsakura NekoAsakura commented May 7, 2026

Benchmark results
main branch %
map_check_int 8.51 ns 7.95 ns -6.5%
map_check_obj 8.50 ns 8.00 ns -6.0%
seq_check_obj 8.32 ns 7.92 ns -4.8%
seq_check_int 8.32 ns 7.94 ns -4.5%
seq_check_tuple 8.21 ns 7.94 ns -3.3%
seq_check_list 8.22 ns 8.01 ns -2.5%
map_check_dict 8.12 ns 7.93 ns -2.4%
map_check_tuple 8.09 ns 7.93 ns -1.9%
getitem_dict 12.92 ns 12.68 ns -1.9%
mul_str_int 14.74 ns 14.53 ns -1.4%
mul_int_str 14.70 ns 14.51 ns -1.3%
map_check_list 8.06 ns 7.96 ns -1.2%
mul_int_list 16.11 ns 15.98 ns -0.8%
mul_list_int 15.87 ns 15.76 ns -0.7%
getitem_list 3.55 ns 3.53 ns -0.4%
getitem_tuple 3.45 ns 3.44 ns -0.3%
getitem_str 3.51 ns 3.50 ns -0.1%
bool_none 6.78 ns 6.77 ns -0.1%
bool_dict 7.49 ns 7.50 ns +0.2%
bool_list 7.51 ns 7.54 ns +0.4%
seq_check_dict 8.32 ns 8.39 ns +0.9%
bool_obj 7.03 ns 7.19 ns +2.3%
bool_int 7.66 ns 7.96 ns +3.8%
Geomean −1.0%
Benchmark scripts

Build flags: ./configure --enable-experimental-jit --enable-optimizations --with-lto

import pyperf

r = pyperf.Runner()
I = "from _testlimitedcapi import "

for n, s in [("list", "v=[1]"), ("dict", "v={1:2}"), ("int", "v=5"),
             ("none", "v=None"), ("obj", "v=object()")]:
    r.timeit(f"bool_{n}", "bool(v)", s)

for n, s in [("list", "v=[1]"), ("tuple", "v=(1,)"), ("int", "v=5"),
             ("dict", "v={1:2}"), ("obj", "v=object()")]:
    r.timeit(f"seq_check_{n}", "f(v)", I + "sequence_check as f;" + s)
    r.timeit(f"map_check_{n}", "f(v)", I + "mapping_check as f;" + s)

for n, s in [("list", "v=[1,2,3]"), ("tuple", "v=(1,2,3)"),
             ("dict", "v={0:'a',1:'b'}"), ("str", "v='abc'")]:
    r.timeit(f"getitem_{n}", "v[1]", s)

for n, s in [("int_list", "a=2;b=[1]"), ("list_int", "a=[1];b=2"),
             ("int_str", "a=2;b='ab'"), ("str_int", "a='ab';b=2")]:
    r.timeit(f"mul_{n}", "a*b", s)

P.S.
Different build flags might affect the benchmarks, but I haven't had the time to test them yet.
I did see a regression on PyMapping_Check matching @picnixz's concern(#149317 (comment)) w/ --enable-experimental-jit (no PGO/LTO) build only, but I suspect my testing at the time wasn't sufficiently rigorous. Will take a closer look later.

@read-the-docs-community
Copy link
Copy Markdown

Documentation build overview

📚 cpython-previews | 🛠️ Build #32576493 | 📁 Comparing 8cefc49 against main (b2582a6)

  🔍 Preview build  

40 files changed · ± 39 modified · - 1 deleted

± Modified

- Deleted

@picnixz
Copy link
Copy Markdown
Member

picnixz commented May 7, 2026

Can you remove the JIT as well and only do PGO/LTO please?

@NekoAsakura
Copy link
Copy Markdown
Contributor Author

Can you remove the JIT as well and only do PGO/LTO please?

main branch %
getitem_dict 18.46 ns 17.96 ns -2.7%
mul_list_int 33.71 ns 33.05 ns -2.0%
mul_int_str 31.88 ns 31.37 ns -1.6%
seq_check_tuple 18.09 ns 17.84 ns -1.4%
mul_str_int 31.96 ns 31.53 ns -1.4%
bool_int 20.24 ns 19.97 ns -1.3%
seq_check_int 18.00 ns 17.83 ns -0.9%
mul_int_list 34.08 ns 33.76 ns -0.9%
getitem_list 11.54 ns 11.44 ns -0.9%
seq_check_list 18.08 ns 17.97 ns -0.6%
seq_check_obj 17.92 ns 17.81 ns -0.6%
seq_check_dict 17.98 ns 17.95 ns -0.2%
bool_list 20.12 ns 20.11 ns -0.0%
bool_obj 19.34 ns 19.35 ns +0.0%
bool_dict 20.03 ns 20.06 ns +0.1%
map_check_list 17.98 ns 18.03 ns +0.3%
map_check_int 17.98 ns 18.07 ns +0.5%
bool_none 18.77 ns 18.94 ns +0.9%
map_check_tuple 17.86 ns 18.03 ns +1.0%
map_check_obj 17.81 ns 18.05 ns +1.3%
getitem_tuple 11.16 ns 11.31 ns +1.3%
getitem_str 11.13 ns 11.44 ns +2.7%
map_check_dict 17.82 ns 18.44 ns +3.5%
Geomean -0.1%

Copy link
Copy Markdown
Contributor

@MojoVampire MojoVampire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a docs change for https://docs.python.org/3/c-api/typeobj.html, which currently claims the structs aren't set on PyBaseObject_Type or PyType_Type, and are not defaulted (it might make sense to add a new code for the default column to indicate that yes, it's defaulted, to a fully NULL set of values).

Also, just to be clear (comment is there as well), type_ready's docs say "Still, not all type objects go through PyType_Ready." If that's true, then we can't actually guarantee non-null tp_as_(number|sequence|mapping) slots, right? Unless the CheckConsistency checks are guaranteed to be hit at some point? Those CHECK checks go to ASSERT macros/functions that, despite the name, are never removed on NDEBUG builds, so they might protect us?

Comment thread Objects/typeobject.c
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a place where replacing the != NULL check for base overrides with _PyType_HasOwnNumberMethods(base) would be a good idea, just so all subtypes with operator overloads aren't performing 35 COPYNUM ops (involving ~70 branches) per non-overloaded MRO base when we could know up front whether or not the parent type has any number methods that might need to be inherited. No, the cost isn't paid often, but given that literally every newly defined type includes object in its MRO, that means the definition of every class with a single operator overload from the set of 35 (e.g. nb_bool) now goes through ~70 extra branches for object (and another ~70 for numbers.Number if they directly inherit rather than register as a virtual subclass). Could conceivably harm startup time given how many types include some overload.

Comment thread Objects/typeobject.c
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for tp_as_number (just less critical, since there are fewer slots here).

Comment thread Objects/typeobject.c
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for tp_as_number (just less critical, since there are fewer slots here).

Comment thread Objects/typeobject.c
@@ -9303,6 +9230,7 @@ type_ready_inherit(PyTypeObject *type)
if (base != NULL) {
type_ready_inherit_as_structs(type, base);
}
type_ready_install_method_tables(type);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a comment in type_ready that says "Still, not all type objects go through PyType_Ready." Is that comment still accurate? Because if so, then:

  1. Any internal types that skip PyType_Ready will need this changed manually, and
  2. If there are external types that skip PyType_Ready (I'd hope not, the current docs say it should be done for all types, but I'm not sure how far back that requirement goes), they're going to violate the "tp_as_XXX is never NULL" constraint and segfault when tested for any of these features, and make me worried about the fate of this change entirely (though I suppose we could just say "Fix your extensions" to low-side packages that aren't calling PyType_Ready)

Comment thread Python/specialize.c
@@ -1503,8 +1503,9 @@ store_subscr_fail_kind(PyObject *container, PyObject *sub)
{
PyTypeObject *container_type = Py_TYPE(container);
PyMappingMethods *as_mapping = container_type->tp_as_mapping;
if (as_mapping && (as_mapping->mp_ass_subscript
== PyDict_Type.tp_as_mapping->mp_ass_subscript)) {
if (as_mapping->mp_ass_subscript
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably combine these lines at this point (yes, it's technically >79 characters, but only just, and the source code regularly goes over that limit by a few characters; ~5% of all lines are 80+ characters), and it feels more readable testing on one line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants