refactor(search): decompose 520-line monolithic search handler into focused per-source functions (closes #99)#102
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughExtracts search logic from ChangesSearch Handler Decomposition
Sequence Diagram(s) sequenceDiagram
participant Client
participant search_route as search() route
participant svc_global as services.search.search_global_storage
participant svc_legacy as services.search.search_legacy_workspaces
participant svc_cli as services.search.search_cli_sessions
participant rank as services.search.rank_results
Client->>search_route: GET /api/search?query=Q&type=T
search_route->>svc_global: (workspace_path, query, query_lower, rules, parse_warnings)
svc_global-->>search_route: results
search_route->>svc_legacy: (workspace_path, query, query_lower, search_type, rules)
svc_legacy-->>search_route: results
alt type == "all"
search_route->>svc_cli: (cli_chats_path, query, query_lower, rules)
svc_cli-->>search_route: results
end
search_route->>rank: rank_results(merged_results)
rank-->>search_route: sorted results
search_route->>Client: JSON response with results + parse_warnings
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_search_helpers.py (1)
453-472: ⚡ Quick winConsider adding a test with seeded CLI session data.
The CLI session tests only cover empty/nonexistent directory cases. Unlike
TestSearchGlobalStorageandTestSearchLegacyWorkspaces, there are no tests that seed actual CLI session data (JSONL files + blob stores) and verify matching logic. Adding at least one test that creates a minimal session with a matching query would improve coverage and confidence in the CLI search path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_search_helpers.py` around lines 453 - 472, Add a new test in the TestSearchCliSessions class that seeds a minimal CLI session (create a session JSONL file and corresponding blob store entries) under the tmp_workspace_root["cli_root"], then call search_cli_sessions(...) with a query that should match the seeded session and assert the returned results include the expected session ID/metadata; reference the existing TestSearchCliSessions test methods and the search_cli_sessions function to locate where to add the test and ensure the seeded files mimic the real CLI session JSONL/blob structure used by the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_search_helpers.py`:
- Around line 453-472: Add a new test in the TestSearchCliSessions class that
seeds a minimal CLI session (create a session JSONL file and corresponding blob
store entries) under the tmp_workspace_root["cli_root"], then call
search_cli_sessions(...) with a query that should match the seeded session and
assert the returned results include the expected session ID/metadata; reference
the existing TestSearchCliSessions test methods and the search_cli_sessions
function to locate where to add the test and ensure the seeded files mimic the
real CLI session JSONL/blob structure used by the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5b02a724-2201-4902-8c90-8c5c93d9d723
📒 Files selected for processing (4)
api/search.pyservices/search.pytests/test_models_wired_at_read_sites.pytests/test_search_helpers.py
Should fix.github/workflows/tests.yml:117, tests/test_search_helpers.py — add tests/test_search_helpers.py to the CI pytest invocation (or convert tests to unittest.TestCase so unittest discover picks them up) (new search unit tests are never executed in the CI matrix, undermining the PR's verification goal) tests/test_search_helpers.py:453 — add a positive-match test for search_cli_sessions using mocked list_cli_projects / traverse_blobs / messages_to_bubbles (same pattern as tests/test_cli_tabs.py) (issue #102 acceptance criteria require at least one unit test per extracted function verifying core search logic; only empty-path cases exist today) Nice to haveservices/search.py:122 — remove unused workspace_path parameter from _build_ws_id_to_name (dead parameter adds noise; workspace_entries alone is sufficient) tests/test_search_helpers.py:25 — drop unused import _build_exclusion_searchable (imported but never referenced) |
|
@CodeRabbit. Except my finding, Can you also run deep review yourself? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Read the main new files in this PR
echo "=== services/search.py ==="
cat services/search.py
echo ""
echo "=== api/search.py ==="
cat api/search.py
echo ""
echo "=== tests/test_search_helpers.py ==="
cat tests/test_search_helpers.py
echo ""
echo "=== .github/workflows/tests.yml ==="
cat .github/workflows/tests.yml
echo ""
echo "=== tests/test_models_wired_at_read_sites.py (relevant section) ==="
grep -n "services.search\|api.search\|Bubble\|search" tests/test_models_wired_at_read_sites.py | head -60Length of output: 53857
Deep Review – PR
|
| # | Severity | Location | Issue |
|---|---|---|---|
| 1 | 🔴 Bug | rank_results |
epoch-ms vs epoch-s cross-source ranking is always wrong |
| 2 | 🔴 Bug | api/search.py |
get_cli_chats_path() → None → crash on platforms without CLI |
| 3 | 🟠 Design | search_legacy_workspaces |
datetime.now() fallback stamps missing timestamps as "just now" |
| 4 | 🟠 Design | search_global_storage |
outer exception returns [] with no parse_warnings signal |
| 5 | 🟡 Minor | _build_ws_id_to_name |
unused workspace_path param (also your finding) |
| 6 | 🟡 Minor | test_search_helpers.py:25 |
unused import _build_exclusion_searchable (also your finding) |
| 7 | 🟡 Minor | TestRankResults |
no mixed epoch-ms + ISO string test case (hides bug #1) |
| 8 | 🟡 Minor | tests.yml:117 |
test_search_helpers.py not in pytest invocation (also your finding) |
| 9 | 🟡 Minor | services/search.py |
no __all__ to mark public surface |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/search.py (1)
337-339:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftTop-level DB failures are silently swallowed.
When the outer
tryblock fails (e.g., workspace iteration or DB connection), an empty list is returned with no client-visible indication.parse_warningsis available but not used here, so callers cannot distinguish "no matches" from "search failed."Consider recording a search-level warning so the API can surface it:
except Exception: _logger.exception("Error searching global storage") + parse_warnings.record_source_failure("global_storage")This requires adding a
record_source_failuremethod toParseWarningCollector(or a similar mechanism).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/search.py` around lines 337 - 339, The outer exception handler in services/search.py currently swallows DB/workspace failures and only logs via _logger.exception; add a ParseWarningCollector.record_source_failure(exception, source="global_storage") method (or similarly named) to ParseWarningCollector and call it inside that except Exception block with the caught exception and a source identifier (e.g., "global_storage") so callers can see a search-level warning; keep the existing _logger.exception call, populate the warning with the exception message/trace, and ensure the function returns the same result shape but includes the updated ParseWarningCollector so the API can surface the failure.
🧹 Nitpick comments (2)
services/search.py (2)
571-572: 💤 Low valueUse ASCII
xinstead of Unicode×in comment.Static analysis flags the multiplication sign as ambiguous. Use the ASCII letter
xfor clarity.- # .timestamp() → epoch-seconds; ×1000 → epoch-ms to match ints + # .timestamp() → epoch-seconds; x1000 → epoch-ms to match ints🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/search.py` around lines 571 - 572, Update the inline comment next to the datetime conversion to replace the Unicode multiplication sign (×) with the ASCII letter "x" so static analysis isn't confused; locate the comment accompanying the return using datetime.fromisoformat(...).timestamp() * 1000 and change "×1000 → epoch-ms" to "x1000 → epoch-ms" (only the comment text needs modification).Source: Linters/SAST tools
28-33: 💤 Low valueSort
__all__to satisfy linter.Static analysis flags that
__all__is not sorted. Sorting improves readability and avoids churn when adding entries.Suggested fix
__all__ = [ + "rank_results", + "search_cli_sessions", "search_global_storage", "search_legacy_workspaces", - "search_cli_sessions", - "rank_results", ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/search.py` around lines 28 - 33, The __all__ export list is unsorted; reorder the entries in the __all__ list so they are alphabetically sorted by name (e.g. "rank_results", "search_cli_sessions", "search_global_storage", "search_legacy_workspaces") to satisfy the linter and keep exports stable; update the __all__ variable in the module (the current list containing search_global_storage, search_legacy_workspaces, search_cli_sessions, rank_results) to the sorted ordering.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@services/search.py`:
- Around line 337-339: The outer exception handler in services/search.py
currently swallows DB/workspace failures and only logs via _logger.exception;
add a ParseWarningCollector.record_source_failure(exception,
source="global_storage") method (or similarly named) to ParseWarningCollector
and call it inside that except Exception block with the caught exception and a
source identifier (e.g., "global_storage") so callers can see a search-level
warning; keep the existing _logger.exception call, populate the warning with the
exception message/trace, and ensure the function returns the same result shape
but includes the updated ParseWarningCollector so the API can surface the
failure.
---
Nitpick comments:
In `@services/search.py`:
- Around line 571-572: Update the inline comment next to the datetime conversion
to replace the Unicode multiplication sign (×) with the ASCII letter "x" so
static analysis isn't confused; locate the comment accompanying the return using
datetime.fromisoformat(...).timestamp() * 1000 and change "×1000 → epoch-ms" to
"x1000 → epoch-ms" (only the comment text needs modification).
- Around line 28-33: The __all__ export list is unsorted; reorder the entries in
the __all__ list so they are alphabetically sorted by name (e.g. "rank_results",
"search_cli_sessions", "search_global_storage", "search_legacy_workspaces") to
satisfy the linter and keep exports stable; update the __all__ variable in the
module (the current list containing search_global_storage,
search_legacy_workspaces, search_cli_sessions, rank_results) to the sorted
ordering.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b3a09388-2f3b-4e1d-bdf4-bc1b508e9008
📒 Files selected for processing (3)
.github/workflows/tests.ymlservices/search.pytests/test_search_helpers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_search_helpers.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/parse_warnings.py`:
- Line 35: record_source_failure currently stores str(exc) in
self.source_failures and later returns it to clients via to_api_list, which can
leak internals; change record_source_failure (and the similar block around the
to_api_list usage at lines ~79-84) to log the full exception details to the
server log (using the existing logger) but append a sanitized object to
self.source_failures containing the source identifier plus a safe client_message
or error_code (e.g., "source_processing_error") instead of str(exc); update
to_api_list to return that sanitized client_message/error_code and not the raw
exception, while preserving the full exc text only in server logs for debugging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e5fb3ef-0276-4e48-afb3-be61832f535d
📒 Files selected for processing (2)
models/parse_warnings.pyservices/search.py
🚧 Files skipped from review as they are similar to previous changes (1)
- services/search.py
|
I think @CodeRabbit. my all findings are addressed. also y our feedback are addressed. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== .github/workflows/tests.yml (pytest line) ==="
grep -n "pytest\|test_search" .github/workflows/tests.yml
echo ""
echo "=== services/search.py: __all__, _build_ws_id_to_name, datetime.now, rank_results/_ts, outer except ==="
grep -n "__all__\|_build_ws_id_to_name\|datetime\.now\|def _ts\|def rank_results\|record_source_failure\|except Exception" services/search.py | head -40
echo ""
echo "=== services/search.py: _ts function body ==="
grep -n -A 10 "def _ts" services/search.py
echo ""
echo "=== api/search.py: cli_chats_path guard ==="
grep -n "cli_chats_path\|get_cli_chats_path\|search_cli_sessions" api/search.py
echo ""
echo "=== tests/test_search_helpers.py: imports and cli positive test ==="
head -40 tests/test_search_helpers.py
echo ""
echo "=== tests/test_search_helpers.py: TestSearchCliSessions and TestRankResults ==="
grep -n "def test_\|class Test\|mixed\|epoch_ms\|iso\|ISO\|_build_exclusion" tests/test_search_helpers.pyLength of output: 6460 🏁 Script executed: #!/bin/bash
echo "=== api/search.py full content (cli guard context) ==="
grep -n -B3 -A3 "search_cli_sessions\|get_cli_chats_path" api/search.py
echo ""
echo "=== _build_ws_id_to_name signature ==="
grep -n -A 5 "def _build_ws_id_to_name" services/search.py
echo ""
echo "=== search_legacy_workspaces timestamp fallback ==="
grep -n "lastSendTime\|datetime\.now\|timestamp" services/search.py | grep -A2 -B2 "lastSendTime"
echo ""
echo "=== test_mixed_epoch_ms test body ==="
grep -n -A 20 "def test_mixed_epoch_ms" tests/test_search_helpers.pyLength of output: 3209
The current code: if search_type == "all":
results.extend(
search_cli_sessions(get_cli_chats_path(), query, query_lower, rules)
)On platforms without a Cursor CLI install, cli_path = get_cli_chats_path()
if search_type == "all" and cli_path:
results.extend(search_cli_sessions(cli_path, query, query_lower, rules))Everything else is confirmed fixed:
Just the one-line guard left to add. |
|
line 46 in api/search.py won’t cause a 500 just because CLI chats aren’t installed or the folder doesn’t exist. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== api/search.py full ==="
cat api/search.py
echo ""
echo "=== search_cli_sessions and list_cli_projects usage ==="
grep -n -A 20 "def search_cli_sessions" services/search.py
echo ""
echo "=== list_cli_projects implementation ==="
grep -rn -A 20 "def list_cli_projects"
echo ""
echo "=== outer try/except in search route ==="
grep -n "try\|except\|@.*route\|def search" api/search.pyLength of output: 4611 🏁 Script executed: #!/bin/bash
echo "=== search_cli_sessions full body ==="
grep -n -A 80 "def search_cli_sessions" services/search.py
echo ""
echo "=== iter_sessions implementation ==="
grep -n -A 30 "def iter_sessions" utils/cli_chat_reader.pyLength of output: 5085
if not os.path.isdir(chats_path):
return
My concern about Bug |
Summary
Decomposes the 450-line
search()route handler inapi/search.pyintothree independently testable, single-responsibility functions extracted into
services/search.py. Eliminates the manualconn.close()patterns flaggedin the baseline eval and removes duplicated workspace-mapping logic that was
re-implemented inline.
Changes
services/search.py— new moduleThree public functions, one per data source:
search_global_storage()cursorDiskKVin globalStorage (primary)search_legacy_workspaces()ItemTablechat key (legacy fallback)search_cli_sessions()~/.cursor/chats/rank_results()Private helpers extracted from the inline logic:
_extract_snippet()— context window around a query match (pure)_find_match()— checks title then bubble texts, returns(bool, snippet)(pure)_build_search_bubble_map()—bubbleId:*row loader with schema validation_build_ws_id_to_name()— workspace display-name map using canonicalget_workspace_display_nameapi/search.py— 522 → 50 linesRoute handler now delegates entirely to the service layer:
SQLite resource management
Both
conn = None; try: … finally: conn.close()patterns replaced:open_global_db()context manager fromservices/workspace_db.pywith closing(sqlite3.connect(…)) as conn:Duplication eliminated
Workspace-entry collection, composer-ID-to-workspace mapping, and folder
display-name extraction now call
collect_workspace_entries(),build_composer_id_to_workspace_id(), andget_workspace_display_name()from their canonical locations instead of re-implementing the logic inline.
tests/test_search_helpers.py— 27 new unit testsEach extracted function has its own test class:
TestExtractSnippet(5 cases) — pure snippet logicTestFindMatch(6 cases) — match detection, title-before-bubble priorityTestRankResults(4 cases) — int and ISO 8601 timestamps, empty list, missing keyTestSearchGlobalStorage(5 cases) — hit, miss, required keys, missing DB, display nameTestSearchLegacyWorkspaces(5 cases) — hit, miss, type guard, required keys, no legacy dataTestSearchCliSessions(2 cases) — empty dir, nonexistent pathtests/test_models_wired_at_read_sites.pyUpdated two spy tests to target
services.search.Bubbleand theservices.searchlogger, where those symbols now live after the split.Acceptance criteria
close()callsReferences
services/workspace_db.py(reusedcollect_workspace_entries,build_composer_id_to_workspace_id,open_global_db)Summary by CodeRabbit
Refactor
Bug Fixes / Improvements
Tests