Skip to content

refactor(search): decompose 520-line monolithic search handler into focused per-source functions (closes #99)#102

Merged
wpak-ai merged 6 commits into
masterfrom
feat/decompose-monolithic-search-handler
Jun 11, 2026
Merged

refactor(search): decompose 520-line monolithic search handler into focused per-source functions (closes #99)#102
wpak-ai merged 6 commits into
masterfrom
feat/decompose-monolithic-search-handler

Conversation

@bradjin8

@bradjin8 bradjin8 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Decomposes the 450-line search() route handler in api/search.py into
three independently testable, single-responsibility functions extracted into
services/search.py. Eliminates the manual conn.close() patterns flagged
in the baseline eval and removes duplicated workspace-mapping logic that was
re-implemented inline.

Changes

services/search.py — new module

Three public functions, one per data source:

Function Data source
search_global_storage() cursorDiskKV in globalStorage (primary)
search_legacy_workspaces() per-workspace ItemTable chat key (legacy fallback)
search_cli_sessions() JSONL blob files in ~/.cursor/chats/
rank_results() merges and sorts all results by timestamp

Private helpers extracted from the inline logic:

  • _extract_snippet() — context window around a query match (pure)
  • _find_match() — checks title then bubble texts, returns (bool, snippet) (pure)
  • _build_search_bubble_map()bubbleId:* row loader with schema validation
  • _build_ws_id_to_name() — workspace display-name map using canonical get_workspace_display_name

api/search.py — 522 → 50 lines

Route handler now delegates entirely to the service layer:

results.extend(search_global_storage(...))
results.extend(search_legacy_workspaces(...))
if search_type == "all":
    results.extend(search_cli_sessions(...))
return jsonify(parse_warnings.attach_to({"results": rank_results(results)}))

SQLite resource management

Both conn = None; try: … finally: conn.close() patterns replaced:

  • Global DB: uses existing open_global_db() context manager from services/workspace_db.py
  • Per-workspace legacy DBs: with closing(sqlite3.connect(…)) as conn:

Duplication eliminated

Workspace-entry collection, composer-ID-to-workspace mapping, and folder
display-name extraction now call collect_workspace_entries(),
build_composer_id_to_workspace_id(), and get_workspace_display_name()
from their canonical locations instead of re-implementing the logic inline.

tests/test_search_helpers.py — 27 new unit tests

Each extracted function has its own test class:

  • TestExtractSnippet (5 cases) — pure snippet logic
  • TestFindMatch (6 cases) — match detection, title-before-bubble priority
  • TestRankResults (4 cases) — int and ISO 8601 timestamps, empty list, missing key
  • TestSearchGlobalStorage (5 cases) — hit, miss, required keys, missing DB, display name
  • TestSearchLegacyWorkspaces (5 cases) — hit, miss, type guard, required keys, no legacy data
  • TestSearchCliSessions (2 cases) — empty dir, nonexistent path

tests/test_models_wired_at_read_sites.py

Updated two spy tests to target services.search.Bubble and the
services.search logger, where those symbols now live after the split.

Acceptance criteria

  • Search handler decomposed into 3+ focused functions (one per data source)
  • Each extracted function independently testable with clear input/output
  • SQLite connections use context managers — no manual close() calls
  • Search functionality unchanged (same API contract, same results)
  • At least one unit test per extracted function
  • 443 tests passing, 0 failing

References

Summary by CodeRabbit

  • Refactor

    • Search backend reorganized into dedicated readers for global storage, legacy workspaces, and CLI sessions with centralized result merging and ranking.
  • Bug Fixes / Improvements

    • More consistent exclusion/matching and snippet extraction, improved timestamp normalization and result ordering, and richer parse/source warnings surfaced in responses.
  • Tests

    • Expanded unit and integration tests for multi-source search, matching/snippets, and ranking; CI updated to run the new tests.

@bradjin8 bradjin8 self-assigned this Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Extracts search logic from api/search.py into services/search.py (three readers: global storage, legacy workspaces, CLI sessions), adds helpers and rank_results, simplifies the route to orchestrate these services, extends parse-warnings handling, and adds/updates tests and CI selection.

Changes

Search Handler Decomposition

Layer / File(s) Summary
Service layer helpers and search implementations
services/search.py
Implements search_global_storage, search_legacy_workspaces, search_cli_sessions, helper functions for exclusion/searchable text, _extract_snippet/_find_match, workspace display mapping, bubble map building, and rank_results for timestamp-descending ordering.
API route handler refactoring
api/search.py
Route handler now initializes ParseWarningCollector, computes query_lower, delegates to the three service readers (CLI reader called only when search_type == "all"), aggregates results, and returns jsonify(parse_warnings.attach_to({"results": rank_results(results)})). Imports were reduced accordingly.
ParseWarningCollector: source failure tracking
models/parse_warnings.py
Adds source_failures list, record_source_failure(exc, source) method, includes source_failure items in to_api_list(), and updates dataclass field initialization.
Test coverage for search layer
tests/test_search_helpers.py, tests/test_models_wired_at_read_sites.py
Adds unit tests for _extract_snippet, _find_match, and rank_results, integration-style tests that seed SQLite global/workspace/CLI DBs and validate each search reader's output shape/behavior, and updates an existing test to patch/log from services.search instead of api/search.
CI test selection
.github/workflows/tests.yml
Adds tests/test_search_helpers.py to the pytest job so the new tests run in CI.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant search_route as search() route
  participant svc_global as services.search.search_global_storage
  participant svc_legacy as services.search.search_legacy_workspaces
  participant svc_cli as services.search.search_cli_sessions
  participant rank as services.search.rank_results

  Client->>search_route: GET /api/search?query=Q&type=T
  search_route->>svc_global: (workspace_path, query, query_lower, rules, parse_warnings)
  svc_global-->>search_route: results
  search_route->>svc_legacy: (workspace_path, query, query_lower, search_type, rules)
  svc_legacy-->>search_route: results
  alt type == "all"
    search_route->>svc_cli: (cli_chats_path, query, query_lower, rules)
    svc_cli-->>search_route: results
  end
  search_route->>rank: rank_results(merged_results)
  rank-->>search_route: sorted results
  search_route->>Client: JSON response with results + parse_warnings
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • clean6378-max-it
  • wpak-ai
  • timon0305

Poem

🐰 I hopped through code, split one big task in two,

Global, legacy, CLI — each finds what is true.
Snippets and ranks, warnings kept in view,
Tests planted seeds, CI now runs them through.
A tidy burrow, and a carrot for you.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR addresses core objectives from #99 but has unresolved issues: timestamp normalization bug causes incorrect cross-source ordering, missing CLI path guard risks 500 errors, and new tests not executed by CI. Fix timestamp normalization to use epoch-ms consistently, add get_cli_chats_path() None check, include tests/test_search_helpers.py in .github/workflows/tests.yml, and add mixed-type timestamp test.
Docstring Coverage ⚠️ Warning Docstring coverage is 37.04% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main refactoring: decomposing a monolithic 520-line search handler into focused per-source functions, which aligns with the primary change in the changeset.
Out of Scope Changes check ✅ Passed All changes directly support the decomposition objective: new services/search.py module with extracted functions, refactored api/search.py route handler, updated tests, and CI workflow modifications are in-scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/decompose-monolithic-search-handler

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_search_helpers.py (1)

453-472: ⚡ Quick win

Consider adding a test with seeded CLI session data.

The CLI session tests only cover empty/nonexistent directory cases. Unlike TestSearchGlobalStorage and TestSearchLegacyWorkspaces, there are no tests that seed actual CLI session data (JSONL files + blob stores) and verify matching logic. Adding at least one test that creates a minimal session with a matching query would improve coverage and confidence in the CLI search path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_helpers.py` around lines 453 - 472, Add a new test in the
TestSearchCliSessions class that seeds a minimal CLI session (create a session
JSONL file and corresponding blob store entries) under the
tmp_workspace_root["cli_root"], then call search_cli_sessions(...) with a query
that should match the seeded session and assert the returned results include the
expected session ID/metadata; reference the existing TestSearchCliSessions test
methods and the search_cli_sessions function to locate where to add the test and
ensure the seeded files mimic the real CLI session JSONL/blob structure used by
the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_search_helpers.py`:
- Around line 453-472: Add a new test in the TestSearchCliSessions class that
seeds a minimal CLI session (create a session JSONL file and corresponding blob
store entries) under the tmp_workspace_root["cli_root"], then call
search_cli_sessions(...) with a query that should match the seeded session and
assert the returned results include the expected session ID/metadata; reference
the existing TestSearchCliSessions test methods and the search_cli_sessions
function to locate where to add the test and ensure the seeded files mimic the
real CLI session JSONL/blob structure used by the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b02a724-2201-4902-8c90-8c5c93d9d723

📥 Commits

Reviewing files that changed from the base of the PR and between 937dc91 and 6e29cf8.

📒 Files selected for processing (4)
  • api/search.py
  • services/search.py
  • tests/test_models_wired_at_read_sites.py
  • tests/test_search_helpers.py

@clean6378-max-it

Copy link
Copy Markdown
Collaborator

Should fix

.github/workflows/tests.yml:117, tests/test_search_helpers.py — add tests/test_search_helpers.py to the CI pytest invocation (or convert tests to unittest.TestCase so unittest discover picks them up) (new search unit tests are never executed in the CI matrix, undermining the PR's verification goal)

tests/test_search_helpers.py:453 — add a positive-match test for search_cli_sessions using mocked list_cli_projects / traverse_blobs / messages_to_bubbles (same pattern as tests/test_cli_tabs.py) (issue #102 acceptance criteria require at least one unit test per extracted function verifying core search logic; only empty-path cases exist today)

Nice to have

services/search.py:122 — remove unused workspace_path parameter from _build_ws_id_to_name (dead parameter adds noise; workspace_entries alone is sufficient)

tests/test_search_helpers.py:25 — drop unused import _build_exclusion_searchable (imported but never referenced)

@clean6378-max-it

Copy link
Copy Markdown
Collaborator

@CodeRabbit. Except my finding, Can you also run deep review yourself?

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main new files in this PR
echo "=== services/search.py ==="
cat services/search.py

echo ""
echo "=== api/search.py ==="
cat api/search.py

echo ""
echo "=== tests/test_search_helpers.py ==="
cat tests/test_search_helpers.py

echo ""
echo "=== .github/workflows/tests.yml ==="
cat .github/workflows/tests.yml

echo ""
echo "=== tests/test_models_wired_at_read_sites.py (relevant section) ==="
grep -n "services.search\|api.search\|Bubble\|search" tests/test_models_wired_at_read_sites.py | head -60

Length of output: 53857



@clean6378-max-it — sure, here is my own deep-dive review on top of your findings.


Deep Review – PR #102 (services/search.py + friends)

🔴 Bugs

1. rank_results — epoch-milliseconds vs epoch-seconds mismatch causes cross-source ordering to always be wrong

search_global_storage and search_cli_sessions both produce integer epoch-ms timestamps (1.715 × 10¹²). search_legacy_workspaces produces an ISO string ("2024-01-01T00:00:00Z"). The _ts helper converts ISO strings via .timestamp() to epoch-seconds (1.715 × 10⁹), but leaves integers untouched. Because 1.715e12 ≫ 1.715e9, every single composer/CLI result will always sort above every single legacy-chat result, regardless of actual recency.

# services/search.py – rank_results helper
def _ts(r: dict) -> float:
    t = r.get("timestamp", 0)
    if isinstance(t, str):
        try:
            return datetime.fromisoformat(t.replace("Z", "+00:00")).timestamp()
            #                                                         ^^^^^^^^^^
            #  epoch-seconds ≈ 1.7e9 — but composer ints are epoch-ms ≈ 1.7e12
        except Exception:
            return 0.0
    return float(t) if t else 0.0   # epoch-ms int passed through unchanged

Fix — normalise everything to epoch-seconds (or epoch-ms) before comparing:

def _ts(r: dict) -> float:
    t = r.get("timestamp", 0)
    if isinstance(t, str):
        try:
            return datetime.fromisoformat(t.replace("Z", "+00:00")).timestamp() * 1000
        except Exception:
            return 0.0
    return float(t) if t else 0.0   # already epoch-ms

The rank_results test only uses homogeneous-type inputs so this discrepancy is invisible in the test suite — a cross-type case should be added.


2. api/search.pyget_cli_chats_path() is not guarded for None

# api/search.py
if search_type == "all":
    results.extend(
        search_cli_sessions(get_cli_chats_path(), query, query_lower, rules)
        #                   ^^^^^^^^^^^^^^^^^^^^ may return None on platforms
        #                   without a CLI install; list_cli_projects(None) raises
    )

list_cli_projects will receive None and likely raise TypeError or FileNotFoundError, which is caught by the outer except Exception block and returns a 500. Adding a guard:

cli_path = get_cli_chats_path()
if search_type == "all" and cli_path:
    results.extend(search_cli_sessions(cli_path, query, query_lower, rules))

🟠 Design / Correctness Concerns

3. search_legacy_workspacesdatetime.now().isoformat() as a timestamp fallback is misleading

"timestamp": tab.get("lastSendTime") or datetime.now().isoformat(),

If lastSendTime is absent the conversation is stamped now, making it appear as the most-recently-active chat on every call. A None or 0 fallback (pinning it to the bottom of results) is more honest. Also datetime.now() returns a naïve datetime; datetime.now(tz=timezone.utc).isoformat() is safer for cross-platform consistency.


4. search_global_storage — outer exception silently converts any DB-level failure into an empty result set

except Exception:
    _logger.exception("Error searching global storage")
    # returns []  — caller receives no signal; parse_warnings is untouched

A corrupt or missing global DB returns [] with no API-level indication (no parse_warnings entry, no response field). Consider recording a warning so the client can distinguish "no results" from "search failed":

except Exception:
    _logger.exception("Error searching global storage")
    parse_warnings.record_source_failure("global_storage")   # or equivalent

5. _build_ws_id_to_nameworkspace_path parameter is dead (also flagged by you)

Confirmed: the parameter is declared but the body only reads entry["workspaceJsonPath"], which already contains the absolute path. The parameter can be removed and all call sites updated.


🟡 Code Quality / Minor

6. tests/test_search_helpers.py line 25 — unused import (also flagged by you)

from services.search import (
    _build_exclusion_searchable,   # ← never referenced in any test
    ...
)

7. rank_results test suite — no cross-type (int + ISO string) case

All TestRankResults tests use homogeneous timestamp types. The epoch-ms/epoch-s bug above would pass every existing test. A mixed-type test should be added:

def test_mixed_epoch_ms_and_iso_string_sort_by_recency(self):
    # composer result from 2024-05 (epoch-ms) vs chat result from 2025-01 (ISO)
    results = [
        {"timestamp": 1_715_000_000_000, "type": "composer"},  # 2024-05
        {"timestamp": "2025-01-01T00:00:00Z",  "type": "chat"},   # 2025-01
    ]
    ranked = rank_results(results)
    assert ranked[0]["type"] == "chat"   # 2025 is more recent

8. .github/workflows/tests.ymltest_search_helpers.py excluded from pytest (also flagged by you)

The pytest step only lists two files:

run: python -m pytest tests/test_api_endpoints.py tests/test_pdf_export.py -v --tb=short

tests/test_search_helpers.py uses pure-pytest classes (no unittest.TestCase), so python -m unittest discover won't collect it either. The 27 new unit tests are never executed in CI. Add the file to the pytest invocation:

run: python -m pytest tests/test_api_endpoints.py tests/test_pdf_export.py tests/test_search_helpers.py -v --tb=short

9. services/search.py — no __all__ to clearly declare the public surface

With four public functions and four private helpers all at module level, an explicit __all__ would clarify the contract and prevent accidental re-export:

__all__ = [
    "search_global_storage",
    "search_legacy_workspaces",
    "search_cli_sessions",
    "rank_results",
]

Summary Table

# Severity Location Issue
1 🔴 Bug rank_results epoch-ms vs epoch-s cross-source ranking is always wrong
2 🔴 Bug api/search.py get_cli_chats_path()None → crash on platforms without CLI
3 🟠 Design search_legacy_workspaces datetime.now() fallback stamps missing timestamps as "just now"
4 🟠 Design search_global_storage outer exception returns [] with no parse_warnings signal
5 🟡 Minor _build_ws_id_to_name unused workspace_path param (also your finding)
6 🟡 Minor test_search_helpers.py:25 unused import _build_exclusion_searchable (also your finding)
7 🟡 Minor TestRankResults no mixed epoch-ms + ISO string test case (hides bug #1)
8 🟡 Minor tests.yml:117 test_search_helpers.py not in pytest invocation (also your finding)
9 🟡 Minor services/search.py no __all__ to mark public surface

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/search.py (1)

337-339: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Top-level DB failures are silently swallowed.

When the outer try block fails (e.g., workspace iteration or DB connection), an empty list is returned with no client-visible indication. parse_warnings is available but not used here, so callers cannot distinguish "no matches" from "search failed."

Consider recording a search-level warning so the API can surface it:

     except Exception:
         _logger.exception("Error searching global storage")
+        parse_warnings.record_source_failure("global_storage")

This requires adding a record_source_failure method to ParseWarningCollector (or a similar mechanism).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 337 - 339, The outer exception handler in
services/search.py currently swallows DB/workspace failures and only logs via
_logger.exception; add a ParseWarningCollector.record_source_failure(exception,
source="global_storage") method (or similarly named) to ParseWarningCollector
and call it inside that except Exception block with the caught exception and a
source identifier (e.g., "global_storage") so callers can see a search-level
warning; keep the existing _logger.exception call, populate the warning with the
exception message/trace, and ensure the function returns the same result shape
but includes the updated ParseWarningCollector so the API can surface the
failure.
🧹 Nitpick comments (2)
services/search.py (2)

571-572: 💤 Low value

Use ASCII x instead of Unicode × in comment.

Static analysis flags the multiplication sign as ambiguous. Use the ASCII letter x for clarity.

-                # .timestamp() → epoch-seconds; ×1000 → epoch-ms to match ints
+                # .timestamp() → epoch-seconds; x1000 → epoch-ms to match ints
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 571 - 572, Update the inline comment next to
the datetime conversion to replace the Unicode multiplication sign (×) with the
ASCII letter "x" so static analysis isn't confused; locate the comment
accompanying the return using datetime.fromisoformat(...).timestamp() * 1000 and
change "×1000 → epoch-ms" to "x1000 → epoch-ms" (only the comment text needs
modification).

Source: Linters/SAST tools


28-33: 💤 Low value

Sort __all__ to satisfy linter.

Static analysis flags that __all__ is not sorted. Sorting improves readability and avoids churn when adding entries.

Suggested fix
 __all__ = [
+    "rank_results",
+    "search_cli_sessions",
     "search_global_storage",
     "search_legacy_workspaces",
-    "search_cli_sessions",
-    "rank_results",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 28 - 33, The __all__ export list is
unsorted; reorder the entries in the __all__ list so they are alphabetically
sorted by name (e.g. "rank_results", "search_cli_sessions",
"search_global_storage", "search_legacy_workspaces") to satisfy the linter and
keep exports stable; update the __all__ variable in the module (the current list
containing search_global_storage, search_legacy_workspaces, search_cli_sessions,
rank_results) to the sorted ordering.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@services/search.py`:
- Around line 337-339: The outer exception handler in services/search.py
currently swallows DB/workspace failures and only logs via _logger.exception;
add a ParseWarningCollector.record_source_failure(exception,
source="global_storage") method (or similarly named) to ParseWarningCollector
and call it inside that except Exception block with the caught exception and a
source identifier (e.g., "global_storage") so callers can see a search-level
warning; keep the existing _logger.exception call, populate the warning with the
exception message/trace, and ensure the function returns the same result shape
but includes the updated ParseWarningCollector so the API can surface the
failure.

---

Nitpick comments:
In `@services/search.py`:
- Around line 571-572: Update the inline comment next to the datetime conversion
to replace the Unicode multiplication sign (×) with the ASCII letter "x" so
static analysis isn't confused; locate the comment accompanying the return using
datetime.fromisoformat(...).timestamp() * 1000 and change "×1000 → epoch-ms" to
"x1000 → epoch-ms" (only the comment text needs modification).
- Around line 28-33: The __all__ export list is unsorted; reorder the entries in
the __all__ list so they are alphabetically sorted by name (e.g. "rank_results",
"search_cli_sessions", "search_global_storage", "search_legacy_workspaces") to
satisfy the linter and keep exports stable; update the __all__ variable in the
module (the current list containing search_global_storage,
search_legacy_workspaces, search_cli_sessions, rank_results) to the sorted
ordering.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3a09388-2f3b-4e1d-bdf4-bc1b508e9008

📥 Commits

Reviewing files that changed from the base of the PR and between d08ca43 and dd5a01b.

📒 Files selected for processing (3)
  • .github/workflows/tests.yml
  • services/search.py
  • tests/test_search_helpers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_search_helpers.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/parse_warnings.py`:
- Line 35: record_source_failure currently stores str(exc) in
self.source_failures and later returns it to clients via to_api_list, which can
leak internals; change record_source_failure (and the similar block around the
to_api_list usage at lines ~79-84) to log the full exception details to the
server log (using the existing logger) but append a sanitized object to
self.source_failures containing the source identifier plus a safe client_message
or error_code (e.g., "source_processing_error") instead of str(exc); update
to_api_list to return that sanitized client_message/error_code and not the raw
exception, while preserving the full exc text only in server logs for debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e5fb3ef-0276-4e48-afb3-be61832f535d

📥 Commits

Reviewing files that changed from the base of the PR and between dd5a01b and 5ce7702.

📒 Files selected for processing (2)
  • models/parse_warnings.py
  • services/search.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/search.py

Comment thread models/parse_warnings.py Outdated
@clean6378-max-it

Copy link
Copy Markdown
Collaborator

I think @CodeRabbit. my all findings are addressed. also y our feedback are addressed.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/tests.yml (pytest line) ==="
grep -n "pytest\|test_search" .github/workflows/tests.yml

echo ""
echo "=== services/search.py: __all__, _build_ws_id_to_name, datetime.now, rank_results/_ts, outer except ==="
grep -n "__all__\|_build_ws_id_to_name\|datetime\.now\|def _ts\|def rank_results\|record_source_failure\|except Exception" services/search.py | head -40

echo ""
echo "=== services/search.py: _ts function body ==="
grep -n -A 10 "def _ts" services/search.py

echo ""
echo "=== api/search.py: cli_chats_path guard ==="
grep -n "cli_chats_path\|get_cli_chats_path\|search_cli_sessions" api/search.py

echo ""
echo "=== tests/test_search_helpers.py: imports and cli positive test ==="
head -40 tests/test_search_helpers.py

echo ""
echo "=== tests/test_search_helpers.py: TestSearchCliSessions and TestRankResults ==="
grep -n "def test_\|class Test\|mixed\|epoch_ms\|iso\|ISO\|_build_exclusion" tests/test_search_helpers.py

Length of output: 6460


🏁 Script executed:

#!/bin/bash
echo "=== api/search.py full content (cli guard context) ==="
grep -n -B3 -A3 "search_cli_sessions\|get_cli_chats_path" api/search.py

echo ""
echo "=== _build_ws_id_to_name signature ==="
grep -n -A 5 "def _build_ws_id_to_name" services/search.py

echo ""
echo "=== search_legacy_workspaces timestamp fallback ==="
grep -n "lastSendTime\|datetime\.now\|timestamp" services/search.py | grep -A2 -B2 "lastSendTime"

echo ""
echo "=== test_mixed_epoch_ms test body ==="
grep -n -A 20 "def test_mixed_epoch_ms" tests/test_search_helpers.py

Length of output: 3209



@clean6378-max-it — nearly everything is addressed, great work! One item from Bug #2 remains open:

api/search.py Line 46 — get_cli_chats_path() is still not guarded for None

The current code:

if search_type == "all":
    results.extend(
        search_cli_sessions(get_cli_chats_path(), query, query_lower, rules)
    )

On platforms without a Cursor CLI install, get_cli_chats_path() returns None, which gets passed into search_cli_sessionslist_cli_projects(None) → crash → 500 response. The fix is one line:

cli_path = get_cli_chats_path()
if search_type == "all" and cli_path:
    results.extend(search_cli_sessions(cli_path, query, query_lower, rules))

Everything else is confirmed fixed:

# Item Status
1 CI pytest includes test_search_helpers.py
2 get_cli_chats_path() None guard ❌ still open
3 _build_ws_id_to_name dead workspace_path param removed
4 rank_results epoch-ms normalisation (× 1000)
5 search_global_storage outer exception calls record_source_failure
6 Unused _build_exclusion_searchable import removed
7 Mixed epoch-ms + ISO string TestRankResults test
8 Positive-match TestSearchCliSessions test
9 __all__ in services/search.py
10 search_legacy_workspaces fallback timestamp changed to 0

Just the one-line guard left to add.

@clean6378-max-it

Copy link
Copy Markdown
Collaborator

line 46 in api/search.py won’t cause a 500 just because CLI chats aren’t installed or the folder doesn’t exist.
@CodeRabbit

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== api/search.py full ==="
cat api/search.py

echo ""
echo "=== search_cli_sessions and list_cli_projects usage ==="
grep -n -A 20 "def search_cli_sessions" services/search.py

echo ""
echo "=== list_cli_projects implementation ==="
grep -rn -A 20 "def list_cli_projects"

echo ""
echo "=== outer try/except in search route ==="
grep -n "try\|except\|@.*route\|def search" api/search.py

Length of output: 4611


🏁 Script executed:

#!/bin/bash
echo "=== search_cli_sessions full body ==="
grep -n -A 80 "def search_cli_sessions" services/search.py

echo ""
echo "=== iter_sessions implementation ==="
grep -n -A 30 "def iter_sessions" utils/cli_chat_reader.py

Length of output: 5085



@clean6378-max-it — you're right, I stand corrected.

iter_sessions (line 370 in utils/cli_chat_reader.py) opens with:

if not os.path.isdir(chats_path):
    return

os.path.isdir(None) returns False in Python without raising, so the generator exits immediately and list_cli_projects(None) returns []. search_cli_sessions then returns an empty list cleanly — no exception, no 500.

My concern about Bug #2 was invalid. All items are confirmed addressed. 🐇

@clean6378-max-it clean6378-max-it requested a review from wpak-ai June 10, 2026 23:54
@wpak-ai wpak-ai merged commit fe3c728 into master Jun 11, 2026
16 checks passed
@wpak-ai wpak-ai deleted the feat/decompose-monolithic-search-handler branch June 11, 2026 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decompose 520-line monolithic search handler

3 participants