Problem
Three High-severity findings block near-complete cleanup of claude-code-chat-browser:
-
Search crashes on bad limit — api/search.py calls int(request.args.get("limit", 50)) without validation. Values such as abc, 1.5, or "" raise ValueError and surface as HTTP 500 instead of a client error. The UI sends a valid integer today, but manual curl calls and future clients can trigger the crash.
-
No structural types for data shapes — Sessions, messages, projects, search hits, stats, and error bodies are built and returned as ad-hoc plain dicts across utils/ and api/. That makes API contracts implicit, complicates review, and leaves no foundation for static checking.
-
No static type checking in CI — The repo has no mypy configuration or CI gate. Regressions in return shapes and handler annotations are caught only at runtime (if at all).
These items are the type-safety foundation for later work (integration tests, structured error codes, API reference docs). They must land before broader test and documentation coverage.
Goal
Land three deliverables in order:
- Validate the search
limit parameter and return 400 with a clear JSON error for invalid input.
- Introduce a
models/ package with TypedDict (wire/JSON shapes) and @dataclass only where mutable domain objects are warranted; annotate public boundaries with no runtime behavior change.
- Add
mypy with strict = true, fix all reported errors against the new types, and run mypy in CI so the suite stays green.
Scope
A — Search endpoint crash fix (1 pt)
Touch points: api/search.py, new tests/test_search.py
- Add input parsing for
limit (inline helper or utils/query_params.py if reuse is preferred later).
- Rules:
"50" → 50 (integer string OK)
"1.5" → 400 (int() rejects float strings)
"abc" → 400
- missing or empty → default 50
- optional: reject
limit < 1 with 400
- Return JSON consistent with existing handlers, e.g.
{"error": "Invalid limit: must be a positive integer"} with status 400 (plain error string is fine; structured code field is follow-up work).
- Add unit tests using the same Flask
test_client fixture pattern as tests/test_error_propagation.py:
| Test |
Request |
Assert |
test_limit_integer_string |
?q=hello&limit=10 |
200, JSON list |
test_limit_float_string |
?q=hello&limit=1.5 |
400, body mentions limit |
test_limit_non_numeric |
?q=hello&limit=abc |
400 |
test_limit_default |
?q=hello (no limit) |
200 |
test_empty_query |
?q= |
200, [] (preserve existing behavior) |
Root cause reference: api/search.py line ~20 — unvalidated int() on query param.
B — TypedDict / dataclass audit (5 pt)
No runtime behavior change — structural types, annotations at boundaries, replace ad-hoc dict literals where types are defined. Do not change JSON field names or response shapes.
Primary shapes to type (public boundaries first):
| Shape |
Source |
Suggested type |
| Session (parsed) |
utils/jsonl_parser.py — parse_session() |
SessionDict (TypedDict) |
| SessionMetadata |
nested in session metadata |
TypedDict |
| Message |
messages[] in parser |
MessageDict |
| Project |
utils/session_path.py — list_projects() |
ProjectDict |
| Session list row |
list_sessions() |
SessionListItemDict |
| Search hit |
api/search.py |
SearchHitDict |
| Session stats |
utils/session_stats.py — compute_stats() |
SessionStatsDict |
| Error body |
{"error": "..."} across APIs |
ErrorResponse (TypedDict) |
| Export state |
export_state_store.py, export_api.py |
TypedDict for on-disk JSON |
Defer deep typing inside _parse_tool_result dispatch arms unless mypy forces it.
Create package layout:
claude-code-chat-browser/
models/
__init__.py # re-export public types
session.py # Session, Message, Metadata
project.py # ProjectDict, SessionListItemDict
search.py # SearchHitDict
stats.py # SessionStatsDict
errors.py # ErrorResponse
Convention:
TypedDict — JSON-serializable wire shapes (API responses, jsonify payloads).
@dataclass — only for mutable domain objects with methods; prefer TypedDict + cast() where dicts are built incrementally.
Annotate file-by-file (run pytest after each):
| File |
Change |
utils/jsonl_parser.py |
def parse_session(filepath: str) -> SessionDict: |
utils/session_path.py |
def list_projects(...) -> list[ProjectDict]: |
utils/session_stats.py |
def compute_stats(session: SessionDict) -> SessionStatsDict: |
api/search.py |
results: list[SearchHitDict] |
api/projects.py |
enriched project dicts and session list rows |
api/sessions.py |
return type annotations on handlers |
api/export_api.py |
export state dict at read/write boundaries |
C — Mypy strict + CI (3 pt)
Depends on Block B (types give mypy something to check).
- Add
mypy==1.15.0 (or team-standard 1.x pin) to requirements-dev.txt.
- Create
pyproject.toml with [tool.mypy]:
[tool.mypy]
python_version = "3.12"
strict = true
packages = ["api", "utils", "models", "scripts"]
exclude = ["tests/"]
[[tool.mypy.overrides]]
module = "tests.*"
ignore_errors = true
- Resolve all
mypy errors before opening the PR (mypy from repository root).
- Add a CI job in
.github/workflows/ci.yml (dedicated mypy job or step after pytest):
mypy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: pip
cache-dependency-path: requirements-dev.txt
- run: pip install -r requirements-dev.txt
- run: mypy
Common fixes: Flask handler return types (tuple[Response, int] | Response), TypedDict partial builds (total=False or cast() at return), set[str] in metadata, types-Flask stubs where needed. Prefer fixing scripts/export.py over broad overrides; document any narrow module override in the PR.
Out of Scope
- Structured
ErrorCode enum and machine-readable code on every error response
- Full HTTP route / CLI test matrix
- Frontend or Playwright integration tests
- API reference or contribution/architecture documentation
Acceptance Criteria
Search limit validation
Typed models
Mypy CI
Overall checkpoint
Suggested Validation
pip install -r requirements-dev.txt
pytest tests/test_search.py -v
pytest -q
mypy
Manual smoke (optional): start the app, then verify search with valid limit returns 200, invalid limit returns 400 JSON, and /api/projects responds.
Prod install smoke (matches CI):
pip install -r requirements.txt
python -c "from app import create_app; c=create_app().test_client(); assert c.get('/').status_code==200"
Slip Order (if overloaded)
Non-negotiable: search limit fix and core Session / SearchHit types.
Can slip if needed: full mypy package list — start with api/ + models/ first, widen before integration test work.
Do not slip: search fix; structural types for parser and search API boundaries.
Notes
- All three items are High severity (1 + 5 + 3 pt).
- Follow-on work assumes: predictable search 400 for bad
limit, typed shapes for integration test assertions, and mypy CI guarding models imports.
- Rebase on
master before starting if dependent XSS / frontend split work is not merged yet.
Problem
Three High-severity findings block near-complete cleanup of
claude-code-chat-browser:Search crashes on bad
limit—api/search.pycallsint(request.args.get("limit", 50))without validation. Values such asabc,1.5, or""raiseValueErrorand surface as HTTP 500 instead of a client error. The UI sends a valid integer today, but manualcurlcalls and future clients can trigger the crash.No structural types for data shapes — Sessions, messages, projects, search hits, stats, and error bodies are built and returned as ad-hoc plain
dicts acrossutils/andapi/. That makes API contracts implicit, complicates review, and leaves no foundation for static checking.No static type checking in CI — The repo has no
mypyconfiguration or CI gate. Regressions in return shapes and handler annotations are caught only at runtime (if at all).These items are the type-safety foundation for later work (integration tests, structured error codes, API reference docs). They must land before broader test and documentation coverage.
Goal
Land three deliverables in order:
limitparameter and return 400 with a clear JSON error for invalid input.models/package withTypedDict(wire/JSON shapes) and@dataclassonly where mutable domain objects are warranted; annotate public boundaries with no runtime behavior change.mypywithstrict = true, fix all reported errors against the new types, and runmypyin CI so the suite stays green.Scope
A — Search endpoint crash fix (1 pt)
Touch points:
api/search.py, newtests/test_search.pylimit(inline helper orutils/query_params.pyif reuse is preferred later)."50"→50(integer string OK)"1.5"→ 400 (int()rejects float strings)"abc"→ 400limit < 1with 400{"error": "Invalid limit: must be a positive integer"}with status 400 (plainerrorstring is fine; structuredcodefield is follow-up work).test_clientfixture pattern astests/test_error_propagation.py:test_limit_integer_string?q=hello&limit=10test_limit_float_string?q=hello&limit=1.5test_limit_non_numeric?q=hello&limit=abctest_limit_default?q=hello(no limit)test_empty_query?q=[](preserve existing behavior)Root cause reference:
api/search.pyline ~20 — unvalidatedint()on query param.B — TypedDict / dataclass audit (5 pt)
No runtime behavior change — structural types, annotations at boundaries, replace ad-hoc dict literals where types are defined. Do not change JSON field names or response shapes.
Primary shapes to type (public boundaries first):
utils/jsonl_parser.py—parse_session()SessionDict(TypedDict)metadataTypedDictmessages[]in parserMessageDictutils/session_path.py—list_projects()ProjectDictlist_sessions()SessionListItemDictapi/search.pySearchHitDictutils/session_stats.py—compute_stats()SessionStatsDict{"error": "..."}across APIsErrorResponse(TypedDict)export_state_store.py,export_api.pyTypedDictfor on-disk JSONDefer deep typing inside
_parse_tool_resultdispatch arms unlessmypyforces it.Create package layout:
Convention:
TypedDict— JSON-serializable wire shapes (API responses,jsonifypayloads).@dataclass— only for mutable domain objects with methods; preferTypedDict+cast()where dicts are built incrementally.Annotate file-by-file (run
pytestafter each):utils/jsonl_parser.pydef parse_session(filepath: str) -> SessionDict:utils/session_path.pydef list_projects(...) -> list[ProjectDict]:utils/session_stats.pydef compute_stats(session: SessionDict) -> SessionStatsDict:api/search.pyresults: list[SearchHitDict]api/projects.pyapi/sessions.pyapi/export_api.pyC — Mypy strict + CI (3 pt)
Depends on Block B (types give
mypysomething to check).mypy==1.15.0(or team-standard 1.x pin) torequirements-dev.txt.pyproject.tomlwith[tool.mypy]:mypyerrors before opening the PR (mypyfrom repository root)..github/workflows/ci.yml(dedicatedmypyjob or step afterpytest):Common fixes: Flask handler return types (
tuple[Response, int] | Response),TypedDictpartial builds (total=Falseorcast()at return),set[str]in metadata,types-Flaskstubs where needed. Prefer fixingscripts/export.pyover broad overrides; document any narrow module override in the PR.Out of Scope
ErrorCodeenum and machine-readablecodeon every error responseAcceptance Criteria
Search limit validation
limitvalues (abc,1.5, etc.) return 400 with a clear JSONerrormessage, not 500.limitbehave as before (default 50).tests/test_search.pycovers integer string, float string, non-numeric string, default, and empty query.Typed models
models/package exists with__init__.pyre-exports.parse_session,list_projects,compute_stats, and search results are typed at public boundaries.pytestsuite remains green (especiallytests/test_jsonl_parser.py).Mypy CI
mypy --strictpasses onapi,utils,models, andscripts.mypyjob that passes on PR.requirements-dev.txtandpyproject.tomlcommitted.Overall checkpoint
mypygreen in CI.pytest -qgreen in CI.Suggested Validation
Manual smoke (optional): start the app, then verify search with valid
limitreturns 200, invalidlimitreturns 400 JSON, and/api/projectsresponds.Prod install smoke (matches CI):
pip install -r requirements.txt python -c "from app import create_app; c=create_app().test_client(); assert c.get('/').status_code==200"Slip Order (if overloaded)
Non-negotiable: search limit fix and core
Session/SearchHittypes.Can slip if needed: full
mypypackage list — start withapi/+models/first, widen before integration test work.Do not slip: search fix; structural types for parser and search API boundaries.
Notes
limit, typed shapes for integration test assertions, andmypyCI guardingmodelsimports.masterbefore starting if dependent XSS / frontend split work is not merged yet.