gh-130273: Fix traceback color output with unicode characters#142529

Merged
ambv merged 11 commits into python:main from grayjk:issue-130273
Apr 7, 2026
Conversation

@grayjk
Contributor

@grayjk grayjk commented Dec 10, 2025

Account for the display width of Unicode characters so that colors and underlining in traceback output are correct.

Closes #130273
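
To illustrate the underlying problem, here is a minimal sketch (not the code from this PR) of why character offsets and terminal columns diverge. The helper name `display_width` is hypothetical; it only uses `unicodedata.east_asian_width` and `unicodedata.combining`, and it ignores full grapheme-cluster rules.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        # Wide ("W") and fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


line = 'print("日本語" + 1)'
# len() counts code points; a caret line built from it drifts to the left
# of the real error location once wide characters appear before it.
print(len(line), display_width(line))
```

Any caret or color span positioned by `len()` offsets will be shifted by the difference between the two numbers.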

@python-cla-bot

python-cla-bot bot commented Dec 10, 2025

All commit authors signed the Contributor License Agreement.

@vstinner
Member

@serhiy-storchaka: Here is a PR about text width and Unicode characters :-)

@grayjk
Contributor Author

grayjk commented Jan 28, 2026

updated to use @serhiy-storchaka's recently added unicodedata.iter_graphemes

@grayjk
Contributor Author

grayjk commented Feb 18, 2026

@pablogsal @hauntsaninja as recent reviewers of traceback.py, would you mind taking a look?

@StanFromIreland

This comment was marked as resolved.

@grayjk

This comment was marked as resolved.

@StanFromIreland
Member

There are conflicts again I'm afraid, and mypy isn't happy either.

@pablogsal

This comment was marked as outdated.

@pablogsal

This comment was marked as outdated.

@Wulian233
Contributor

@pablogsal You may have made a mistake somewhere :)

@ambv
Contributor

ambv commented Apr 7, 2026

Thanks for your PR.

I reverted the move of string-handling utils to traceback; they don't belong there.
Now those utils are copied into traceback temporarily so that we can backport the fixes to older bugfix branches (3.13 and 3.14). In main I will do a follow-up change where those utils are moved to string/__init__.py, where they belong.

This is because introducing mypy type-checking to the standard library relies on introducing fully type-annotated libraries. I intend to keep it that way, and therefore sprinkling type: ignore is not a solution.

It is now out of scope to format all of traceback.py for 3.15 while string.py is just three small classes and a function (two more functions if we count string.templatelib). When I make this change, I will also undo the unicodedata conditional import shenanigans and use lazy imports. That's also why this will be a main-only change.

There's also a related behavioral change in _display_width that I feel is part of this fix. It introduces handling of pre-existing terminal escape sequences in the exception strings (as well as any stranded ^Z characters but those are admittedly unlikely to be part of tracebacks).
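
As a rough illustration of that behavioral change, the sketch below measures visible width while skipping ANSI escape sequences and treating a stray ^Z specially. It is not the `_display_width` implementation from the PR: the name `visible_width`, the regular expression, and the assumption that ^Z (`\x1a`) is rendered as the two-column caret notation `^Z` are all illustrative.

```python
import re
import unicodedata

# Hypothetical pattern for CSI sequences such as "\x1b[31m" or "\x1b[0m":
# ESC "[", parameter bytes, intermediate bytes, then one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_width(text: str) -> int:
    """Approximate terminal columns, ignoring color escape sequences."""
    text = ANSI_CSI.sub("", text)  # escape sequences occupy no columns
    width = 0
    for ch in text:
        if ch == "\x1a":
            width += 2  # assumption: shown as the two characters "^Z"
        elif unicodedata.combining(ch):
            continue  # drawn over the previous character
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


colored = "\x1b[31mValueError\x1b[0m: bad 値"
print(len(colored), visible_width(colored))
```

Without the stripping step, an exception message that already contains color codes would be measured as wider than it appears on screen.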

Copilot AI left a comment

Pull request overview

This PR fixes misaligned color highlighting/underlining in traceback output when the source line contains wide Unicode characters or grapheme clusters, by switching caret/column calculations to use display width rather than raw character offsets.

Changes:

  • Introduces grapheme-aware zipping between source characters and caret markers to keep colored segments aligned with terminal display width.
  • Reworks _display_width() to use _wlen()/_str_width() logic (including CTRL-Z and ANSI escape sequence handling) and adds targeted tests for wide/combining/ASCII edge cases.
  • Adds a NEWS fragment documenting the fix.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| Misc/NEWS.d/next/Library/2025-12-10-15-15-09.gh-issue-130273.iCfiY5.rst | Announces the traceback color alignment fix for Unicode characters. |
| Lib/traceback.py | Implements grapheme/display-width-aware caret-to-text alignment and updates width calculation helpers. |
| Lib/test/test_traceback.py | Adds regression tests for colorized tracebacks with wide/combining Unicode and ASCII display-width edge cases. |
| Lib/_pyrepl/utils.py | Minor whitespace-only change. |

@ambv ambv merged commit dfeb160 into python:main Apr 7, 2026
64 checks passed
@ambv added the "awaiting review", "needs backport to 3.13" (bugs and security fixes), and "needs backport to 3.14" (bugs and security fixes) labels Apr 7, 2026
@miss-islington-app

Thanks @grayjk for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app

Thanks @grayjk for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @grayjk and @ambv, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker dfeb160bc35f0ba16800d07b85cb11598d1cd307 3.14

@miss-islington-app

Sorry, @grayjk and @ambv, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker dfeb160bc35f0ba16800d07b85cb11598d1cd307 3.13

@ambv
Contributor

ambv commented Apr 7, 2026

There are some trivial conflicts that can be solved here, but the real problem is that unicodedata.iter_graphemes is new for 3.15 and a pure Python reimplementation is close to 200 lines.

WDYT, @serhiy-storchaka, is it worth creating a pure Python version of this segmentation purely for the backports?

@ambv
Contributor

ambv commented Apr 7, 2026

To help you decide, I created #148218 so you can see what the implementation would look like.

Labels

awaiting review, needs backport to 3.13 (bugs and security fixes), needs backport to 3.14 (bugs and security fixes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Traceback colors are shifted when the line contains wide unicode characters

7 participants