refactor(search): decompose 520-line monolithic search handler into focused per-source functions (closes #99) by bradjin8 · Pull Request #102 · cppalliance/cppa-cursor-browser

bradjin8 · 2026-06-10T17:54:35Z

Summary

Decomposes the 450-line search() route handler in api/search.py into
three independently testable, single-responsibility functions extracted into
services/search.py. Eliminates the manual conn.close() patterns flagged
in the baseline eval and removes duplicated workspace-mapping logic that was
re-implemented inline.

Changes

`services/search.py` — new module

Three public functions, one per data source:

Function	Data source
`search_global_storage()`	`cursorDiskKV` in globalStorage (primary)
`search_legacy_workspaces()`	per-workspace `ItemTable` chat key (legacy fallback)
`search_cli_sessions()`	JSONL blob files in `~/.cursor/chats/`
`rank_results()`	merges and sorts all results by timestamp

Private helpers extracted from the inline logic:

_extract_snippet() — context window around a query match (pure)
_find_match() — checks title then bubble texts, returns (bool, snippet) (pure)
_build_search_bubble_map() — bubbleId:* row loader with schema validation
_build_ws_id_to_name() — workspace display-name map using canonical get_workspace_display_name

`api/search.py` — 522 → 50 lines

Route handler now delegates entirely to the service layer:

results.extend(search_global_storage(...))
results.extend(search_legacy_workspaces(...))
if search_type == "all":
    results.extend(search_cli_sessions(...))
return jsonify(parse_warnings.attach_to({"results": rank_results(results)}))

SQLite resource management

Both conn = None; try: … finally: conn.close() patterns replaced:

Global DB: uses existing open_global_db() context manager from services/workspace_db.py
Per-workspace legacy DBs: with closing(sqlite3.connect(…)) as conn:

Duplication eliminated

Workspace-entry collection, composer-ID-to-workspace mapping, and folder
display-name extraction now call collect_workspace_entries(),
build_composer_id_to_workspace_id(), and get_workspace_display_name()
from their canonical locations instead of re-implementing the logic inline.

`tests/test_search_helpers.py` — 27 new unit tests

Each extracted function has its own test class:

TestExtractSnippet (5 cases) — pure snippet logic
TestFindMatch (6 cases) — match detection, title-before-bubble priority
TestRankResults (4 cases) — int and ISO 8601 timestamps, empty list, missing key
TestSearchGlobalStorage (5 cases) — hit, miss, required keys, missing DB, display name
TestSearchLegacyWorkspaces (5 cases) — hit, miss, type guard, required keys, no legacy data
TestSearchCliSessions (2 cases) — empty dir, nonexistent path

`tests/test_models_wired_at_read_sites.py`

Updated two spy tests to target services.search.Bubble and the
services.search logger, where those symbols now live after the split.

Acceptance criteria

Search handler decomposed into 3+ focused functions (one per data source)
Each extracted function independently testable with clear input/output
SQLite connections use context managers — no manual close() calls
Search functionality unchanged (same API contract, same results)
At least one unit test per extracted function
443 tests passing, 0 failing

References

Closes Decompose 520-line monolithic search handler #99
Eval finding: Test 4, cluster "cognitive-load" (baseline eval)
Related: services/workspace_db.py (reused collect_workspace_entries, build_composer_id_to_workspace_id, open_global_db)

Summary by CodeRabbit

Refactor
- Search backend reorganized into dedicated readers for global storage, legacy workspaces, and CLI sessions with centralized result merging and ranking.
Bug Fixes / Improvements
- More consistent exclusion/matching and snippet extraction, improved timestamp normalization and result ordering, and richer parse/source warnings surfaced in responses.
Tests
- Expanded unit and integration tests for multi-source search, matching/snippets, and ranking; CI updated to run the new tests.

coderabbitai · 2026-06-10T17:54:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Extracts search logic from api/search.py into services/search.py (three readers: global storage, legacy workspaces, CLI sessions), adds helpers and rank_results, simplifies the route to orchestrate these services, extends parse-warnings handling, and adds/updates tests and CI selection.

Changes

Search Handler Decomposition

Layer / File(s)	Summary
Service layer helpers and search implementations `services/search.py`	Implements `search_global_storage`, `search_legacy_workspaces`, `search_cli_sessions`, helper functions for exclusion/searchable text, `_extract_snippet`/`_find_match`, workspace display mapping, bubble map building, and `rank_results` for timestamp-descending ordering.
API route handler refactoring `api/search.py`	Route handler now initializes `ParseWarningCollector`, computes `query_lower`, delegates to the three service readers (CLI reader called only when `search_type == "all"`), aggregates results, and returns `jsonify(parse_warnings.attach_to({"results": rank_results(results)}))`. Imports were reduced accordingly.
ParseWarningCollector: source failure tracking `models/parse_warnings.py`	Adds `source_failures` list, `record_source_failure(exc, source)` method, includes `source_failure` items in `to_api_list()`, and updates dataclass field initialization.
Test coverage for search layer `tests/test_search_helpers.py`, `tests/test_models_wired_at_read_sites.py`	Adds unit tests for `_extract_snippet`, `_find_match`, and `rank_results`, integration-style tests that seed SQLite global/workspace/CLI DBs and validate each search reader's output shape/behavior, and updates an existing test to patch/log from `services.search` instead of `api/search`.
CI test selection `.github/workflows/tests.yml`	Adds `tests/test_search_helpers.py` to the pytest job so the new tests run in CI.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant search_route as search() route
  participant svc_global as services.search.search_global_storage
  participant svc_legacy as services.search.search_legacy_workspaces
  participant svc_cli as services.search.search_cli_sessions
  participant rank as services.search.rank_results

  Client->>search_route: GET /api/search?query=Q&type=T
  search_route->>svc_global: (workspace_path, query, query_lower, rules, parse_warnings)
  svc_global-->>search_route: results
  search_route->>svc_legacy: (workspace_path, query, query_lower, search_type, rules)
  svc_legacy-->>search_route: results
  alt type == "all"
    search_route->>svc_cli: (cli_chats_path, query, query_lower, rules)
    svc_cli-->>search_route: results
  end
  search_route->>rank: rank_results(merged_results)
  rank-->>search_route: sorted results
  search_route->>Client: JSON response with results + parse_warnings

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#78: Also surfaces parse-failure metadata via ParseWarningCollector/attach_to(...) around search results.

Suggested reviewers

clean6378-max-it
wpak-ai
timon0305

Poem

🐰 I hopped through code, split one big task in two,

Global, legacy, CLI — each finds what is true.
Snippets and ranks, warnings kept in view,
Tests planted seeds, CI now runs them through.
A tidy burrow, and a carrot for you.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR addresses core objectives from `#99` but has unresolved issues: timestamp normalization bug causes incorrect cross-source ordering, missing CLI path guard risks 500 errors, and new tests not executed by CI.	Fix timestamp normalization to use epoch-ms consistently, add get_cli_chats_path() None check, include tests/test_search_helpers.py in .github/workflows/tests.yml, and add mixed-type timestamp test.
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.04% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main refactoring: decomposing a monolithic 520-line search handler into focused per-source functions, which aligns with the primary change in the changeset.
Out of Scope Changes check	✅ Passed	All changes directly support the decomposition objective: new services/search.py module with extracted functions, refactored api/search.py route handler, updated tests, and CI workflow modifications are in-scope.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/decompose-monolithic-search-handler

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/test_search_helpers.py (1)
453-472: ⚡ Quick win

Consider adding a test with seeded CLI session data.

The CLI session tests only cover empty/nonexistent directory cases. Unlike TestSearchGlobalStorage and TestSearchLegacyWorkspaces, there are no tests that seed actual CLI session data (JSONL files + blob stores) and verify matching logic. Adding at least one test that creates a minimal session with a matching query would improve coverage and confidence in the CLI search path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_helpers.py` around lines 453 - 472, Add a new test in the
TestSearchCliSessions class that seeds a minimal CLI session (create a session
JSONL file and corresponding blob store entries) under the
tmp_workspace_root["cli_root"], then call search_cli_sessions(...) with a query
that should match the seeded session and assert the returned results include the
expected session ID/metadata; reference the existing TestSearchCliSessions test
methods and the search_cli_sessions function to locate where to add the test and
ensure the seeded files mimic the real CLI session JSONL/blob structure used by
the code.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_helpers.py`:
- Around line 453-472: Add a new test in the TestSearchCliSessions class that
seeds a minimal CLI session (create a session JSONL file and corresponding blob
store entries) under the tmp_workspace_root["cli_root"], then call
search_cli_sessions(...) with a query that should match the seeded session and
assert the returned results include the expected session ID/metadata; reference
the existing TestSearchCliSessions test methods and the search_cli_sessions
function to locate where to add the test and ensure the seeded files mimic the
real CLI session JSONL/blob structure used by the code.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b02a724-2201-4902-8c90-8c5c93d9d723

📥 Commits

Reviewing files that changed from the base of the PR and between 937dc91 and 6e29cf8.

📒 Files selected for processing (4)

api/search.py
services/search.py
tests/test_models_wired_at_read_sites.py
tests/test_search_helpers.py

clean6378-max-it · 2026-06-10T20:37:00Z

Should fix

.github/workflows/tests.yml:117, tests/test_search_helpers.py — add tests/test_search_helpers.py to the CI pytest invocation (or convert tests to unittest.TestCase so unittest discover picks them up) (new search unit tests are never executed in the CI matrix, undermining the PR's verification goal)

tests/test_search_helpers.py:453 — add a positive-match test for search_cli_sessions using mocked list_cli_projects / traverse_blobs / messages_to_bubbles (same pattern as tests/test_cli_tabs.py) (issue #102 acceptance criteria require at least one unit test per extracted function verifying core search logic; only empty-path cases exist today)

Nice to have

services/search.py:122 — remove unused workspace_path parameter from _build_ws_id_to_name (dead parameter adds noise; workspace_entries alone is sufficient)

tests/test_search_helpers.py:25 — drop unused import _build_exclusion_searchable (imported but never referenced)

clean6378-max-it · 2026-06-10T20:39:26Z

@CodeRabbit. Except my finding, Can you also run deep review yourself?

coderabbitai · 2026-06-10T20:41:53Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main new files in this PR
echo "=== services/search.py ==="
cat services/search.py

echo ""
echo "=== api/search.py ==="
cat api/search.py

echo ""
echo "=== tests/test_search_helpers.py ==="
cat tests/test_search_helpers.py

echo ""
echo "=== .github/workflows/tests.yml ==="
cat .github/workflows/tests.yml

echo ""
echo "=== tests/test_models_wired_at_read_sites.py (relevant section) ==="
grep -n "services.search\|api.search\|Bubble\|search" tests/test_models_wired_at_read_sites.py | head -60

Length of output: 53857

@clean6378-max-it — sure, here is my own deep-dive review on top of your findings.

Deep Review – PR `#102` (`services/search.py` + friends)

🔴 Bugs

1. `rank_results` — epoch-milliseconds vs epoch-seconds mismatch causes cross-source ordering to always be wrong

search_global_storage and search_cli_sessions both produce integer epoch-ms timestamps (~~1.715 × 10¹²). search_legacy_workspaces produces an ISO string ("2024-01-01T00:00:00Z"). The _ts helper converts ISO strings via .timestamp() to epoch-seconds (~~1.715 × 10⁹), but leaves integers untouched. Because 1.715e12 ≫ 1.715e9, every single composer/CLI result will always sort above every single legacy-chat result, regardless of actual recency.

# services/search.py – rank_results helper
def _ts(r: dict) -> float:
    t = r.get("timestamp", 0)
    if isinstance(t, str):
        try:
            return datetime.fromisoformat(t.replace("Z", "+00:00")).timestamp()
            #                                                         ^^^^^^^^^^
            #  epoch-seconds ≈ 1.7e9 — but composer ints are epoch-ms ≈ 1.7e12
        except Exception:
            return 0.0
    return float(t) if t else 0.0   # epoch-ms int passed through unchanged

Fix — normalise everything to epoch-seconds (or epoch-ms) before comparing:

def _ts(r: dict) -> float:
    t = r.get("timestamp", 0)
    if isinstance(t, str):
        try:
            return datetime.fromisoformat(t.replace("Z", "+00:00")).timestamp() * 1000
        except Exception:
            return 0.0
    return float(t) if t else 0.0   # already epoch-ms

The rank_results test only uses homogeneous-type inputs so this discrepancy is invisible in the test suite — a cross-type case should be added.

2. `api/search.py` — `get_cli_chats_path()` is not guarded for `None`

# api/search.py
if search_type == "all":
    results.extend(
        search_cli_sessions(get_cli_chats_path(), query, query_lower, rules)
        #                   ^^^^^^^^^^^^^^^^^^^^ may return None on platforms
        #                   without a CLI install; list_cli_projects(None) raises
    )

list_cli_projects will receive None and likely raise TypeError or FileNotFoundError, which is caught by the outer except Exception block and returns a 500. Adding a guard:

cli_path = get_cli_chats_path()
if search_type == "all" and cli_path:
    results.extend(search_cli_sessions(cli_path, query, query_lower, rules))

🟠 Design / Correctness Concerns

3. `search_legacy_workspaces` — `datetime.now().isoformat()` as a timestamp fallback is misleading

"timestamp": tab.get("lastSendTime") or datetime.now().isoformat(),

If lastSendTime is absent the conversation is stamped now, making it appear as the most-recently-active chat on every call. A None or 0 fallback (pinning it to the bottom of results) is more honest. Also datetime.now() returns a naïve datetime; datetime.now(tz=timezone.utc).isoformat() is safer for cross-platform consistency.

4. `search_global_storage` — outer exception silently converts any DB-level failure into an empty result set

except Exception:
    _logger.exception("Error searching global storage")
    # returns []  — caller receives no signal; parse_warnings is untouched

A corrupt or missing global DB returns [] with no API-level indication (no parse_warnings entry, no response field). Consider recording a warning so the client can distinguish "no results" from "search failed":

except Exception:
    _logger.exception("Error searching global storage")
    parse_warnings.record_source_failure("global_storage")   # or equivalent

5. `_build_ws_id_to_name` — `workspace_path` parameter is dead (also flagged by you)

Confirmed: the parameter is declared but the body only reads entry["workspaceJsonPath"], which already contains the absolute path. The parameter can be removed and all call sites updated.

🟡 Code Quality / Minor

6. `tests/test_search_helpers.py` line 25 — unused import (also flagged by you)

from services.search import (
    _build_exclusion_searchable,   # ← never referenced in any test
    ...
)

7. `rank_results` test suite — no cross-type (int + ISO string) case

All TestRankResults tests use homogeneous timestamp types. The epoch-ms/epoch-s bug above would pass every existing test. A mixed-type test should be added:

def test_mixed_epoch_ms_and_iso_string_sort_by_recency(self):
    # composer result from 2024-05 (epoch-ms) vs chat result from 2025-01 (ISO)
    results = [
        {"timestamp": 1_715_000_000_000, "type": "composer"},  # 2024-05
        {"timestamp": "2025-01-01T00:00:00Z",  "type": "chat"},   # 2025-01
    ]
    ranked = rank_results(results)
    assert ranked[0]["type"] == "chat"   # 2025 is more recent

8. `.github/workflows/tests.yml` — `test_search_helpers.py` excluded from pytest (also flagged by you)

The pytest step only lists two files:

run: python -m pytest tests/test_api_endpoints.py tests/test_pdf_export.py -v --tb=short

tests/test_search_helpers.py uses pure-pytest classes (no unittest.TestCase), so python -m unittest discover won't collect it either. The 27 new unit tests are never executed in CI. Add the file to the pytest invocation:

run: python -m pytest tests/test_api_endpoints.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short

9. `services/search.py` — no `all` to clearly declare the public surface

With four public functions and four private helpers all at module level, an explicit __all__ would clarify the contract and prevent accidental re-export:

__all__ = [
    "search_global_storage",
    "search_legacy_workspaces",
    "search_cli_sessions",
    "rank_results",
]

Summary Table

#	Severity	Location	Issue
1	🔴 Bug	`rank_results`	epoch-ms vs epoch-s cross-source ranking is always wrong
2	🔴 Bug	`api/search.py`	`get_cli_chats_path()` → `None` → crash on platforms without CLI
3	🟠 Design	`search_legacy_workspaces`	`datetime.now()` fallback stamps missing timestamps as "just now"
4	🟠 Design	`search_global_storage`	outer exception returns `[]` with no parse_warnings signal
5	🟡 Minor	`_build_ws_id_to_name`	unused `workspace_path` param (also your finding)
6	🟡 Minor	`test_search_helpers.py:25`	unused import `_build_exclusion_searchable` (also your finding)
7	🟡 Minor	`TestRankResults`	no mixed epoch-ms + ISO string test case (hides bug `#1`)
8	🟡 Minor	`tests.yml:117`	`test_search_helpers.py` not in pytest invocation (also your finding)
9	🟡 Minor	`services/search.py`	no `__all__` to mark public surface

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/search.py (1)
337-339: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Top-level DB failures are silently swallowed.

When the outer try block fails (e.g., workspace iteration or DB connection), an empty list is returned with no client-visible indication. parse_warnings is available but not used here, so callers cannot distinguish "no matches" from "search failed."

Consider recording a search-level warning so the API can surface it:
     except Exception:
         _logger.exception("Error searching global storage")
+        parse_warnings.record_source_failure("global_storage")
This requires adding a record_source_failure method to ParseWarningCollector (or a similar mechanism).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 337 - 339, The outer exception handler in
services/search.py currently swallows DB/workspace failures and only logs via
_logger.exception; add a ParseWarningCollector.record_source_failure(exception,
source="global_storage") method (or similarly named) to ParseWarningCollector
and call it inside that except Exception block with the caught exception and a
source identifier (e.g., "global_storage") so callers can see a search-level
warning; keep the existing _logger.exception call, populate the warning with the
exception message/trace, and ensure the function returns the same result shape
but includes the updated ParseWarningCollector so the API can surface the
failure.

🧹 Nitpick comments (2)

services/search.py (2)

571-572: 💤 Low value

Use ASCII x instead of Unicode × in comment.

Static analysis flags the multiplication sign as ambiguous. Use the ASCII letter x for clarity.

-                # .timestamp() → epoch-seconds; ×1000 → epoch-ms to match ints
+                # .timestamp() → epoch-seconds; x1000 → epoch-ms to match ints

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 571 - 572, Update the inline comment next to
the datetime conversion to replace the Unicode multiplication sign (×) with the
ASCII letter "x" so static analysis isn't confused; locate the comment
accompanying the return using datetime.fromisoformat(...).timestamp() * 1000 and
change "×1000 → epoch-ms" to "x1000 → epoch-ms" (only the comment text needs
modification).

Source: Linters/SAST tools

28-33: 💤 Low value

Sort __all__ to satisfy linter.

Static analysis flags that __all__ is not sorted. Sorting improves readability and avoids churn when adding entries.

Suggested fix

 __all__ = [
+    "rank_results",
+    "search_cli_sessions",
     "search_global_storage",
     "search_legacy_workspaces",
-    "search_cli_sessions",
-    "rank_results",
 ]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 28 - 33, The __all__ export list is
unsorted; reorder the entries in the __all__ list so they are alphabetically
sorted by name (e.g. "rank_results", "search_cli_sessions",
"search_global_storage", "search_legacy_workspaces") to satisfy the linter and
keep exports stable; update the __all__ variable in the module (the current list
containing search_global_storage, search_legacy_workspaces, search_cli_sessions,
rank_results) to the sorted ordering.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@services/search.py`:
- Around line 337-339: The outer exception handler in services/search.py
currently swallows DB/workspace failures and only logs via _logger.exception;
add a ParseWarningCollector.record_source_failure(exception,
source="global_storage") method (or similarly named) to ParseWarningCollector
and call it inside that except Exception block with the caught exception and a
source identifier (e.g., "global_storage") so callers can see a search-level
warning; keep the existing _logger.exception call, populate the warning with the
exception message/trace, and ensure the function returns the same result shape
but includes the updated ParseWarningCollector so the API can surface the
failure.

---

Nitpick comments:
In `@services/search.py`:
- Around line 571-572: Update the inline comment next to the datetime conversion
to replace the Unicode multiplication sign (×) with the ASCII letter "x" so
static analysis isn't confused; locate the comment accompanying the return using
datetime.fromisoformat(...).timestamp() * 1000 and change "×1000 → epoch-ms" to
"x1000 → epoch-ms" (only the comment text needs modification).
- Around line 28-33: The __all__ export list is unsorted; reorder the entries in
the __all__ list so they are alphabetically sorted by name (e.g. "rank_results",
"search_cli_sessions", "search_global_storage", "search_legacy_workspaces") to
satisfy the linter and keep exports stable; update the __all__ variable in the
module (the current list containing search_global_storage,
search_legacy_workspaces, search_cli_sessions, rank_results) to the sorted
ordering.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3a09388-2f3b-4e1d-bdf4-bc1b508e9008

📥 Commits

Reviewing files that changed from the base of the PR and between d08ca43 and dd5a01b.

📒 Files selected for processing (3)

.github/workflows/tests.yml
services/search.py
tests/test_search_helpers.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_search_helpers.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/parse_warnings.py`:
- Line 35: record_source_failure currently stores str(exc) in
self.source_failures and later returns it to clients via to_api_list, which can
leak internals; change record_source_failure (and the similar block around the
to_api_list usage at lines ~79-84) to log the full exception details to the
server log (using the existing logger) but append a sanitized object to
self.source_failures containing the source identifier plus a safe client_message
or error_code (e.g., "source_processing_error") instead of str(exc); update
to_api_list to return that sanitized client_message/error_code and not the raw
exception, while preserving the full exc text only in server logs for debugging.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e5fb3ef-0276-4e48-afb3-be61832f535d

📥 Commits

Reviewing files that changed from the base of the PR and between dd5a01b and 5ce7702.

📒 Files selected for processing (2)

models/parse_warnings.py
services/search.py

🚧 Files skipped from review as they are similar to previous changes (1)

services/search.py

clean6378-max-it · 2026-06-10T23:45:02Z

I think @CodeRabbit. my all findings are addressed. also y our feedback are addressed.

coderabbitai · 2026-06-10T23:46:18Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/tests.yml (pytest line) ==="
grep -n "pytest\|test_search" .github/workflows/tests.yml

echo ""
echo "=== services/search.py: __all__, _build_ws_id_to_name, datetime.now, rank_results/_ts, outer except ==="
grep -n "__all__\|_build_ws_id_to_name\|datetime\.now\|def _ts\|def rank_results\|record_source_failure\|except Exception" services/search.py | head -40

echo ""
echo "=== services/search.py: _ts function body ==="
grep -n -A 10 "def _ts" services/search.py

echo ""
echo "=== api/search.py: cli_chats_path guard ==="
grep -n "cli_chats_path\|get_cli_chats_path\|search_cli_sessions" api/search.py

echo ""
echo "=== tests/test_search_helpers.py: imports and cli positive test ==="
head -40 tests/test_search_helpers.py

echo ""
echo "=== tests/test_search_helpers.py: TestSearchCliSessions and TestRankResults ==="
grep -n "def test_\|class Test\|mixed\|epoch_ms\|iso\|ISO\|_build_exclusion" tests/test_search_helpers.py

Length of output: 6460

🏁 Script executed:

#!/bin/bash
echo "=== api/search.py full content (cli guard context) ==="
grep -n -B3 -A3 "search_cli_sessions\|get_cli_chats_path" api/search.py

echo ""
echo "=== _build_ws_id_to_name signature ==="
grep -n -A 5 "def _build_ws_id_to_name" services/search.py

echo ""
echo "=== search_legacy_workspaces timestamp fallback ==="
grep -n "lastSendTime\|datetime\.now\|timestamp" services/search.py | grep -A2 -B2 "lastSendTime"

echo ""
echo "=== test_mixed_epoch_ms test body ==="
grep -n -A 20 "def test_mixed_epoch_ms" tests/test_search_helpers.py

Length of output: 3209

@clean6378-max-it — nearly everything is addressed, great work! One item from Bug #2 remains open:

api/search.py Line 46 — get_cli_chats_path() is still not guarded for None

The current code:

if search_type == "all":
    results.extend(
        search_cli_sessions(get_cli_chats_path(), query, query_lower, rules)
    )

On platforms without a Cursor CLI install, get_cli_chats_path() returns None, which gets passed into search_cli_sessions → list_cli_projects(None) → crash → 500 response. The fix is one line:

cli_path = get_cli_chats_path()
if search_type == "all" and cli_path:
    results.extend(search_cli_sessions(cli_path, query, query_lower, rules))

Everything else is confirmed fixed:

#	Item	Status
1	CI pytest includes `test_search_helpers.py`	✅
2	`get_cli_chats_path()` None guard	❌ still open
3	`_build_ws_id_to_name` dead `workspace_path` param removed	✅
4	`rank_results` epoch-ms normalisation (`× 1000`)	✅
5	`search_global_storage` outer exception calls `record_source_failure`	✅
6	Unused `_build_exclusion_searchable` import removed	✅
7	Mixed epoch-ms + ISO string `TestRankResults` test	✅
8	Positive-match `TestSearchCliSessions` test	✅
9	`__all__` in `services/search.py`	✅
10	`search_legacy_workspaces` fallback timestamp changed to `0`	✅

Just the one-line guard left to add.

clean6378-max-it · 2026-06-10T23:52:14Z

line 46 in api/search.py won’t cause a 500 just because CLI chats aren’t installed or the folder doesn’t exist.
@CodeRabbit

coderabbitai · 2026-06-10T23:53:06Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== api/search.py full ==="
cat api/search.py

echo ""
echo "=== search_cli_sessions and list_cli_projects usage ==="
grep -n -A 20 "def search_cli_sessions" services/search.py

echo ""
echo "=== list_cli_projects implementation ==="
grep -rn -A 20 "def list_cli_projects"

echo ""
echo "=== outer try/except in search route ==="
grep -n "try\|except\|@.*route\|def search" api/search.py

Length of output: 4611

🏁 Script executed:

#!/bin/bash
echo "=== search_cli_sessions full body ==="
grep -n -A 80 "def search_cli_sessions" services/search.py

echo ""
echo "=== iter_sessions implementation ==="
grep -n -A 30 "def iter_sessions" utils/cli_chat_reader.py

Length of output: 5085

@clean6378-max-it — you're right, I stand corrected.

iter_sessions (line 370 in utils/cli_chat_reader.py) opens with:

if not os.path.isdir(chats_path):
    return

os.path.isdir(None) returns False in Python without raising, so the generator exits immediately and list_cli_projects(None) returns []. search_cli_sessions then returns an empty list cleanly — no exception, no 500.

My concern about Bug #2 was invalid. All items are confirmed addressed. 🐇

fix: initial implmentation

6e29cf8

bradjin8 self-assigned this Jun 10, 2026

fix: typecheck error

4488963

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

fix: add test for search helpers

d08ca43

bradjin8 requested a review from clean6378-max-it June 10, 2026 18:23

fix; review findings

dd5a01b

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

fix: nitpack and outside diff comments

5ce7702

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread models/parse_warnings.py Outdated

fix: Do not expose raw exception text in API warnings.

dbb094d

clean6378-max-it approved these changes Jun 10, 2026

View reviewed changes

clean6378-max-it requested a review from wpak-ai June 10, 2026 23:54

wpak-ai approved these changes Jun 11, 2026

View reviewed changes

wpak-ai merged commit fe3c728 into master Jun 11, 2026
16 checks passed

wpak-ai deleted the feat/decompose-monolithic-search-handler branch June 11, 2026 15:53

coderabbitai Bot mentioned this pull request Jun 11, 2026

Add dedicated Flask test coverage for search, workspaces, and export APIs (#101) #104

Open

2 tasks

Conversation

bradjin8 commented Jun 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

services/search.py — new module

api/search.py — 522 → 50 lines

SQLite resource management

Duplication eliminated

tests/test_search_helpers.py — 27 new unit tests

tests/test_models_wired_at_read_sites.py

Acceptance criteria

References

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

clean6378-max-it commented Jun 10, 2026

Should fix

Nice to have

Uh oh!

clean6378-max-it commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026

Deep Review – PR #102 (services/search.py + friends)

🔴 Bugs

1. rank_results — epoch-milliseconds vs epoch-seconds mismatch causes cross-source ordering to always be wrong

2. api/search.py — get_cli_chats_path() is not guarded for None

🟠 Design / Correctness Concerns

3. search_legacy_workspaces — datetime.now().isoformat() as a timestamp fallback is misleading

4. search_global_storage — outer exception silently converts any DB-level failure into an empty result set

5. _build_ws_id_to_name — workspace_path parameter is dead (also flagged by you)

🟡 Code Quality / Minor

6. tests/test_search_helpers.py line 25 — unused import (also flagged by you)

7. rank_results test suite — no cross-type (int + ISO string) case

8. .github/workflows/tests.yml — test_search_helpers.py excluded from pytest (also flagged by you)

9. services/search.py — no __all__ to clearly declare the public surface

Summary Table

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

clean6378-max-it commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026

Uh oh!

clean6378-max-it commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bradjin8 commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

`services/search.py` — new module

`api/search.py` — 522 → 50 lines

`tests/test_search_helpers.py` — 27 new unit tests

`tests/test_models_wired_at_read_sites.py`

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Deep Review – PR `#102` (`services/search.py` + friends)

1. `rank_results` — epoch-milliseconds vs epoch-seconds mismatch causes cross-source ordering to always be wrong

2. `api/search.py` — `get_cli_chats_path()` is not guarded for `None`

3. `search_legacy_workspaces` — `datetime.now().isoformat()` as a timestamp fallback is misleading

4. `search_global_storage` — outer exception silently converts any DB-level failure into an empty result set

5. `_build_ws_id_to_name` — `workspace_path` parameter is dead (also flagged by you)

6. `tests/test_search_helpers.py` line 25 — unused import (also flagged by you)

7. `rank_results` test suite — no cross-type (int + ISO string) case

8. `.github/workflows/tests.yml` — `test_search_helpers.py` excluded from pytest (also flagged by you)

9. `services/search.py` — no `all` to clearly declare the public surface