Skip to content

Commit ea7f6a7

Browse files
committed
Merge TASK-065: RFC 5952 IPv6 canonicalization in peer_address::to_string
2 parents e530ae3 + ce191c9 commit ea7f6a7

6 files changed

Lines changed: 391 additions & 30 deletions

File tree

specs/tasks/M7-v2-cleanup/TASK-065.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,10 @@
88
`peer_address.cpp:49-50` skips RFC 5952 zero-compression (collapsing consecutive zero groups to `::`) so the textual IPv6 form we expose is non-canonical. Spec comment notes "TASK-046 can refine when telemetry firms up" — telemetry has firmed up; finish the canonicalization now.
99

1010
**Action Items:**
11-
- [ ] Implement RFC 5952 §4 canonical form: lowercase, suppress leading zeros within groups, collapse the longest run of `2+` consecutive zero groups to `::` (ties broken by first occurrence), do not collapse a single zero group.
12-
- [ ] Reuse libc's `inet_ntop` where it already produces RFC 5952 output (glibc ≥ 2.28, musl, macOS recent) and only post-process when the platform falls short; or always post-process for determinism across platforms (preferred — document choice in commit message).
13-
- [ ] Add a unit test pinning the RFC 5952 §4.2.2 examples (`2001:db8::1`, `::1`, `::`, embedded IPv4 mappings).
14-
- [ ] Remove the "TASK-046 can refine when telemetry firms up" comment.
11+
- [x] Implement RFC 5952 §4 canonical form: lowercase, suppress leading zeros within groups, collapse the longest run of `2+` consecutive zero groups to `::` (ties broken by first occurrence), do not collapse a single zero group.
12+
- [x] Reuse libc's `inet_ntop` where it already produces RFC 5952 output (glibc ≥ 2.28, musl, macOS recent) and only post-process when the platform falls short; or always post-process for determinism across platforms (preferred — document choice in commit message).
13+
- [x] Add a unit test pinning the RFC 5952 §4.2.2 examples (`2001:db8::1`, `::1`, `::`, embedded IPv4 mappings).
14+
- [x] Remove the "TASK-046 can refine when telemetry firms up" comment.
1515

1616
**Dependencies:**
1717
- Blocked by: None
@@ -27,4 +27,4 @@
2727
**Related Requirements:** PRD §2 observability NFR
2828
**Related Decisions:** None new (RFC 5952)
2929

30-
**Status:** Backlog
30+
**Status:** Done

specs/tasks/M7-v2-cleanup/_index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ TASK-093).
2929
| TASK-062 | RFC-7616-compliant Digest auth response factory | HIGH | L | Done |
3030
| TASK-063 | Honor or remove `http_response::pipe` `size_hint` parameter | HIGH | S | Done |
3131
| TASK-064 | Structured cookie type | HIGH | M | Backlog |
32-
| TASK-065 | RFC 5952 IPv6 zero-compression in `peer_address` | HIGH | S | Backlog |
32+
| TASK-065 | RFC 5952 IPv6 zero-compression in `peer_address` | HIGH | S | Done |
3333
| TASK-066 | Runtime setter for hook alias slots | MED | M | Backlog |
3434
| TASK-067 | Remove v1 `registered_resources*` maps and `namespace compat` shim | MED | L | Backlog |
3535
| TASK-068 | `connection_state` hardening — CWE-226 / CWE-14 | MED | S | Backlog |
Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,109 @@
1+
# Unworked Review Issues
2+
3+
**Run:** 2026-06-05 16:57:44
4+
**Task:** TASK-065
5+
**Total:** 25 (0 critical, 1 major, 24 minor)
6+
7+
## Major
8+
9+
1. [ ] **code-quality-reviewer** | `src/peer_address.cpp:106` | code-elegance
10+
append_group_hex is defined but never called. The iter-2 rewrite inlined its logic directly into emit_canonical (the local scratch[5] + manual copy loop at lines 125-128), so append_group_hex is now unreachable dead code. Any compiler with -Wunused-function will warn; more importantly it is pure noise that contradicts the stated goal of the rewrite.
11+
*Recommendation:* Remove the append_group_hex function entirely (lines 106-112 in the current file). The inlined scratch-buffer approach in emit_canonical is already the correct replacement.
12+
13+
## Minor
14+
15+
2. [ ] **architecture-alignment-checker** | `test/unit/peer_address_to_string_test.cpp:27` | pattern-violation
16+
The test includes `./httpserver.hpp` and `./littletest.hpp` with relative `./` prefix, whereas other unit tests in this codebase use the same convention. This is consistent locally but slightly unconventional (most C++ style guides omit the `./`). No functional impact.
17+
*Recommendation:* Optional: change to `"httpserver.hpp"` and `"littletest.hpp"` to match the broader project include-path convention. Low priority.
18+
19+
3. [ ] **code-quality-reviewer** | `src/peer_address.cpp:107` | code-readability
20+
`append_group_hex` takes `std::string& out` by non-const reference as a side-effect output parameter. Per the clean-code Functions Rules (prefer fewer arguments, avoid side effects), a function that returns a value is more composable and easier to reason about than one that mutates an argument.
21+
*Recommendation:* Consider making `append_group_hex` return a `std::string` (or a `char[5]` view) and let `emit_canonical` concatenate the result. Given that `out.reserve(40)` is already done, the extra string object would normally elide under NRVO. Alternatively, rename it `append_hex_group_to` so the side-effecting intent is explicit in the name.
22+
23+
4. [ ] **code-quality-reviewer** | `src/peer_address.cpp:109` | code-readability
24+
The block comment above emit_canonical (lines 109-114) says 'We write into a fixed stack buffer (max 39 chars for a fully-expanded all-non-zero address) and construct the returned std::string from (buf, len). This lets short compressed addresses like "::1" (3 chars) or "2001:db8::1" (11 chars) stay within SBO on all common implementations without any reserve() call that would defeat it.' This accurately describes the iter-2 implementation. However, the comment on ipv6_canonical (line 138-139) still says 'The output is bounded by INET6_ADDRSTRLEN (45 + NUL = 46). emit_canonical() uses a stack buffer to avoid defeating SBO for the typical compressed short addresses' which is correct but slightly redundant with the emit_canonical comment — minor duplication, not harmful.
25+
*Recommendation:* No change strictly required; the comments are accurate. Optionally trim the emit_canonical forward-reference on the ipv6_canonical comment to a single sentence to avoid redundancy.
26+
27+
5. [ ] **code-quality-reviewer** | `src/peer_address.cpp:125` | code-readability
28+
char scratch[5] is declared inside the for loop body (inside the else branch). While valid C++, declaring a fixed-size buffer inside an inner loop that will iterate up to 8 times can be surprising to readers expecting declarations at function scope. The variable is used only within that iteration, so correctness is not at issue, but grouping it with other local variable declarations (pos) at function entry would improve readability.
29+
*Recommendation:* Move the scratch[5] declaration to the top of emit_canonical alongside pos, before the loop, to keep all working-buffer declarations together.
30+
31+
6. [ ] **code-quality-reviewer** | `src/peer_address.cpp:22` | code-readability
32+
The file-header comment (lines 21-23) references TASK-051 as the motivation for carving this TU out. This is useful historical context but mixes task-tracking identifiers with functional documentation, which ages poorly as tasks become distant history.
33+
*Recommendation:* Optionally reframe the comment to state the functional invariant ('This TU is intentionally free of backend-platform headers; see hook_context.hpp') rather than the task number, keeping the latter in the git log.
34+
35+
7. [ ] **code-quality-reviewer** | `src/peer_address.cpp:88` | code-readability
36+
`cur_start` is declared and initialized to -1 at the top of `find_longest_zero_run`, then overwritten unconditionally when `cur_len == 0`. The outer-scope initialization to -1 is vestigial because the variable is always assigned before it is read (the `if (cur_len == 0) cur_start = i` guard fires on the first zero group). Placing the declaration closer to first use would eliminate the stale initialization.
37+
*Recommendation:* Declare `cur_start` inside the `if (cur_len == 0)` branch or as `int cur_start = i;` at that point to make the data-flow self-documenting. This removes a misleading -1 sentinel that is never actually observed.
38+
39+
8. [ ] **code-quality-reviewer** | `test/unit/peer_address_to_string_test.cpp:60` | test-coverage
40+
Tests cover all RFC 5952 normative examples and edge cases well. One gap: there is no test for a fully-expanded all-non-zero address (e.g. 'fe80:1:2:3:4:5:6:7') which exercises the no-collapse path of emit_canonical end-to-end with a non-trivial output. The existing rfc5952_single_zero_no_collapse test has a zero group and still exercises find_longest_zero_run, but a zero-run-free address isolates the straight-through path.
41+
*Recommendation:* Add one test case with all eight groups non-zero (e.g. fe80:1:2:3:4:5:6:7) asserting the fully-colon-separated output to pin the no-collapse branch.
42+
43+
9. [ ] **code-quality-reviewer** | `test/unit/peer_address_to_string_test.cpp:98` | test-coverage
44+
The `rfc5952_first_occurrence_tie_break` test groups are `1:0:0:1:0:0:1:1`, giving runs at positions [1..2] and [4..5] both of length 2. This validates first-occurrence tie-break for the case where the two runs are separated by a non-zero group. There is no test for two adjacent zero-runs of equal length separated only by a non-zero singleton (e.g. `0:0:1:0:0` — both runs length 2 with a gap of 1). This is a minor coverage gap; the existing test is sufficient for the RFC citation but would not catch an off-by-one in the tie-break if the winning run abuts the separator group.
45+
*Recommendation:* Consider adding one extra case such as `2001:0:0:1:0:0:2:1` (first run at [1..2] wins, second at [4..5]) to widen the tie-break coverage without bloating the suite.
46+
47+
10. [ ] **code-simplifier** | `src/peer_address.cpp:117` | naming
48+
`emit_canonical` advances `i` in two different places inside the same loop: via `i += collapse.len` (inside the `if` branch with `continue`) and via `++i` (in the normal path). The asymmetric advancement makes the loop harder to parse at a glance.
49+
*Recommendation:* Consider restructuring the loop to use a single increment point, or extract the collapse branch into a helper. A comment at the top of the loop body explicitly stating 'i is advanced inside each branch' would also help readers.
50+
51+
11. [ ] **code-simplifier** | `src/peer_address.cpp:128` | code-structure
52+
The manual character-copy loop `for (char* p = scratch; *p; ++p) buf[pos++] = *p;` is the only place in the file that does not delegate character-output to snprintf/assignment. It is correct and fast, but it reads more like C idiom than the surrounding C++ style and the intent (copy digits into buf at pos) is not immediately obvious.
53+
*Recommendation:* Replace with `std::size_t written = std::snprintf(&buf[pos], sizeof(buf) - pos, "%x", static_cast<unsigned>(g[i])); pos += written;` — consistent with the snprintf-cursor pattern used everywhere else in the file, eliminates the scratch array and the loop, and makes the advance of pos explicit. The buf[40] size guarantee (max 39 chars) keeps this safe.
54+
55+
12. [ ] **code-simplifier** | `src/peer_address.cpp:43` | code-structure
56+
The `zero_run` struct carries two 'no-collapse' signals: `start = -1` and `len = 0`. Only `len` is actually tested in the caller (`best.len < 2` at line 102) and in `emit_canonical` (which relies on `i == collapse.start` never matching when `start == -1`). Having both fields convey the same sentinel state adds unnecessary cognitive load.
57+
*Recommendation:* Remove the `start = -1` default and replace it with `start = 0`. The sentinel contract should be expressed solely through `len`: a comment like `// len == 0 means no eligible run` is sufficient. This makes it unambiguous that `start` is only meaningful when `len >= 2`.
58+
59+
13. [ ] **code-simplifier** | `src/peer_address.cpp:87` | naming
60+
In find_longest_zero_run, the local variables `cur_start` and `cur_len` shadow the meaning of the `best` struct fields `start` and `len`. A reader must hold both names in mind at once to follow the update logic.
61+
*Recommendation:* Rename to `run_start` and `run_len` to disambiguate from the `best.start` / `best.len` fields they feed into. The distinction becomes clearer: run_* tracks the active run, best.* tracks the committed winner.
62+
63+
14. [ ] **code-simplifier** | `src/peer_address.cpp:88` | code-structure
64+
`cur_start` is initialised to `-1` but this value is never observed: it is always overwritten by the `if (cur_len == 0) cur_start = i;` guard before any read occurs. The misleading initialiser implies the sentinel is checked somewhere, when it is not.
65+
*Recommendation:* Declare `cur_start` without an explicit sentinel to make it clear the value is always set before use, or add a brief comment explaining why the initialiser is irrelevant: `int cur_start = 0; // always overwritten before use`
66+
67+
15. [ ] **performance-reviewer** | `src/peer_address.cpp:109` | missing-caching
68+
append_group_hex() calls snprintf once per non-collapsed IPv6 group. In the worst case (fully-expanded address, no collapse) this is 8 snprintf calls per to_string() invocation. snprintf carries format-string parsing overhead that does not amortize across calls. On the access-log hot path (one call per HTTP request) this overhead is noticeable relative to a table-driven approach.
69+
*Recommendation:* Replace the snprintf in append_group_hex with a 16-entry nibble lookup table: static constexpr char kHex[] = "0123456789abcdef"; then emit the 1-4 significant hex nibbles of the uint16 with at most 4 array lookups and direct char appends. This eliminates format-string parsing entirely and reduces the hot-path cost from O(snprintf_overhead * groups) to O(nibbles).
70+
71+
16. [ ] **performance-reviewer** | `src/peer_address.cpp:125` | memory-allocation
72+
Each group in emit_canonical() snprintf's into a local scratch[5] and then copies char-by-char into buf. This adds up to 8 snprintf calls and 8 copy loops per call. A single snprintf(buf+pos, sizeof(buf)-pos, "%x", group) followed by pos += strlen(buf+pos) would write directly into the destination buffer with no intermediate allocation.
73+
*Recommendation:* Replace the scratch[5] + char-by-char copy with: int n = std::snprintf(buf + pos, sizeof(buf) - pos, "%x", static_cast<unsigned>(g[i])); pos += static_cast<std::size_t>(n); This eliminates the scratch buffer and copy loop entirely while keeping the stack-only approach.
74+
75+
17. [ ] **performance-reviewer** | `src/peer_address.cpp:80` | memory-allocation
76+
format_ipv4_mapped() returns std::string{buf} where buf is a C-string (NUL-terminated). The std::string constructor from const char* must call strlen to determine the length, then copy. The length is already known exactly as the return value of snprintf at line 75.
77+
*Recommendation:* Capture the snprintf return value: int n = std::snprintf(buf, sizeof(buf), ...); and construct return std::string(buf, static_cast<std::size_t>(n)); This avoids the strlen scan and is trivially safe because snprintf is bounded by sizeof(buf). Apply the same pattern to the IPv4 snprintf at line 165.
78+
79+
18. [ ] **performance-reviewer** | `src/peer_address.cpp:80` | memory-allocation
80+
format_ipv4_mapped() constructs std::string{buf} from a NUL-terminated pointer, which requires an internal strlen scan. The output length is statically bounded (max 22 chars, always within SBO), but the strlen can be avoided by using the (buf, n) constructor after capturing the snprintf return value.
81+
*Recommendation:* Capture the snprintf return: int n = std::snprintf(buf, sizeof(buf), "::ffff:%u.%u.%u.%u", ...); return std::string(buf, static_cast<std::size_t>(n)); This avoids the strlen and directly constructs the string from the known length.
82+
83+
19. [ ] **security-reviewer** | `src/peer_address.cpp:116` | insecure-design
84+
buf[40] is the only non-annotated 'magic number' in the function. All other buffers carry an inline comment explaining the worst-case arithmetic (buf[16], buf[24], buf[5]). Adding a matching comment on buf[40] would let reviewers verify correctness at a glance without re-deriving the bound from the RFC (8 groups × 4 hex chars + 7 colons = 39 + NUL slot). This is a documentation gap, not an exploitable defect.
85+
*Recommendation:* Add a comment on the buf[40] declaration: // 8*4 hex + 7 colons = 39 + 1 slack; NUL not needed (constructed with explicit length)
86+
87+
20. [ ] **security-reviewer** | `src/peer_address.cpp:119` | error-handling
88+
out.reserve(40) is documented as 'bounded by INET6_ADDRSTRLEN' but INET6_ADDRSTRLEN is 46 (includes NUL). The maximum uncompressed IPv6 string is 39 chars ('ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff') so reserve(40) is technically correct but the comment is misleading. The IPv4-mapped path goes through a separate char buf and does not use this std::string reserve, so there is no overflow — but the comment could cause a future maintainer to mistakenly reduce the reserve.
89+
*Recommendation:* Update the comment to 'bounded by 39 chars (8 groups x 4 hex + 7 colons)' to match the actual maximum, or use INET6_ADDRSTRLEN-1 (45) as a conservative value.
90+
91+
21. [ ] **security-reviewer** | `src/peer_address.cpp:74` | error-handling
92+
buf[24] for IPv4-mapped is sized to exactly fit '::ffff:255.255.255.255\0' (23+1=24 bytes). While correct and safe with snprintf, there is zero slack — any future format change (e.g. adding a scope-ID or port suffix) would require careful resizing. The comment correctly documents the size but a +1 sentinel byte is a common defensive practice.
93+
*Recommendation:* Consider buf[25] or buf[INET6_ADDRSTRLEN] (46) to give explicit slack. Not a safety bug with the current format string, but reduces future regression risk.
94+
95+
22. [ ] **spec-alignment-checker** | `src/peer_address.cpp:null` | specification-gap
96+
The task description referred to 'TASK-046 can refine when telemetry firms up' as a comment at lines 49-50 of the pre-existing file on the feature/v2.0 base branch. The comment is absent from the new implementation (correctly removed), but the file is wholly new on this branch rather than an edit to a pre-existing one — git shows it as a new file mode 100644 with no prior state on master. The removal is therefore implicit in the rewrite rather than an explicit deletion, but the intent is fulfilled.
97+
*Recommendation:* No action required; the comment no longer exists in the codebase. This is purely an observational note about how the removal manifested in git.
98+
99+
23. [ ] **test-quality-reviewer** | `test/unit/peer_address_to_string_test.cpp:106` | missing-test
100+
Only one IPv4-mapped value is tested (::ffff:192.0.2.1). The is_ipv4_mapped() predicate only checks groups[0..5]; groups[6] and groups[7] are unrestricted. A boundary test for ::ffff:0.0.0.0 (the lowest IPv4-mapped address) and ::ffff:255.255.255.255 (the highest) would pin that format_ipv4_mapped() formats bytes[12..15] correctly at both ends of the dotted-quad range.
101+
*Recommendation:* Add one boundary test for ::ffff:0.0.0.0 (all-zero dotted quad) using bytes {0,0,0,0,0,0,0,0,0,0,0xff,0xff,0,0,0,0}. The upper boundary (255.255.255.255) is optional but useful. This pins the snprintf format string for all four byte fields.
102+
103+
24. [ ] **test-quality-reviewer** | `test/unit/peer_address_to_string_test.cpp:54` | missing-test
104+
No test covers an IPv6 address with all eight groups non-zero (e.g. 2001:db8:1:2:3:4:5:6). This path exercises emit_canonical with collapse.start==-1 (zero_run{} sentinel) and zero_run.len==0, verifying that no '::' is emitted when no qualifying run exists. The documentation example (rfc5952_documentation_example) always exercises the collapse branch; the no-collapse branch is left implicitly trusted.
105+
*Recommendation:* Add a test for a fully non-zero address such as 2001:db8:1:2:3:4:5:6 asserting the output is '2001:db8:1:2:3:4:5:6'. This directly pins the emit_canonical no-collapse code path.
106+
107+
25. [ ] **test-quality-reviewer** | `test/unit/peer_address_to_string_test.cpp:75` | naming-convention
108+
rfc5952_all_zero tests the all-zero IPv6 address (::), but the family is peer_address::family::ipv6, not peer_address::family::unspec. The test name and comment ('unspecified all-zero address') conflate two different concepts: the IPv6 unspecified address (0::0 == '::') tested here, and the peer_address::family::unspec family tested in unspec_returns_empty. The naming ambiguity could mislead a future maintainer who assumes 'all-zero' means family::unspec.
109+
*Recommendation:* Rename to rfc5952_ipv6_unspecified_address or rfc5952_full_collapse and update the comment to clarify this is the IPv6 in6addr_any / '::' address (distinct from peer_address::family::unspec).

0 commit comments

Comments
 (0)