Promotes #1220 + #1221 to production:

- #1220: RemoteURL filter rewritten to use JSONB containment, GIN-indexable. Resolves the 10s validateNoDuplicateRemoteURLs path that caused yesterday's 17:08 UTC publish-latency alert.
- #1221: pgxpool MaxConns 30→60, MinConns 5→10. PG max_connections 100→200. Explicit PG resources block (was unset). Addresses the scraper-driven concurrency that caused yesterday's 17:35-17:40 UTC availability alert and 18:17 UTC re-fire.

Caveat: the CNPG spec change in v1.7.3 triggers a PG postmaster restart on the next prod Pulumi run (max_connections is a postmaster-level setting). Staging took ~30s of registry unavailability during the same restart pattern, with one pod bouncing once on its 8-attempt DB-retry budget before recovering. Same will happen on prod. Time the merge for a low-traffic UTC window. Recommended: very early UTC (02:00-04:00) per the alert history showing minimal scraper traffic in that window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Promotes v1.7.3 to production. Contents:
- #1220: RemoteURL filter SQL rewritten to use JSONB containment (@>) instead of EXISTS jsonb_array_elements. The new form is GIN-indexable; the old form forced a full table scan. Local benchmark: 40,398 → 294 buffer reads, 17 ms → 1.6 ms. This corresponds to the 10,005 ms cold-cache observation in prod.
- #1221: pgxpool MaxConns 30→60, MinConns 5→10. PG max_connections 100→200. Explicit PG resources: block (was unset).
What this addresses
- #1220: yesterday's 17:08 UTC Publish Endpoint Latency alert (dev.storage/mcp publish took 14.8s with remotes_ms=10980, pinpointed by the per-phase slog from #1215, "fix(list): use row-constructor cursor + restore per-phase publish timings + enable pg_stat_statements").
- #1221: yesterday's 17:35-17:40 UTC Availability dropped below 95% alert and the 18:17 UTC Publish Endpoint Latency re-fire. Both were scraper-driven concurrency on /v0/servers (~15 req/s sustained from ServiceNow + others). With the bumped pool, the queue at the Go HTTP layer should clear faster instead of blowing up to 20–35s nginx-level latencies.
Deployment caveat
The PG max_connections change is a postmaster-level setting, so CNPG triggers a PG restart on the next prod Pulumi run. With instances: 1 this is brief downtime: staging took ~30s during the equivalent restart, with one registry pod bouncing once on its 8-attempt DB-retry budget before recovering on the next kubelet restart.
Time the merge for a low-traffic UTC window. Alert history suggests very early UTC (02:00–04:00) is quietest.
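For readers unfamiliar with why a pod "bounces" during the restart, here is a minimal sketch of a bounded startup retry in Go. It is not the registry's actual code: the function name retryPing, the sentinel errDBUnavailable, and the delay between attempts are all hypothetical. Only the budget of 8 attempts comes from the description above. When the budget is spent the process exits non-zero, and the kubelet restarts the container.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// errDBUnavailable is a hypothetical stand-in for whatever error the real
// driver returns while the postmaster is restarting.
var errDBUnavailable = errors.New("database unavailable")

// retryPing calls ping up to attempts times, sleeping delay between failed
// attempts. It returns nil on the first success, or an error wrapping the
// last failure once the budget is spent.
func retryPing(attempts int, delay time.Duration, ping func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 1; i <= attempts; i++ {
		if lastErr = ping(); lastErr == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Simulate a database that comes back on the third attempt.
	// The 10 ms delay is illustrative only.
	calls := 0
	err := retryPing(8, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errDBUnavailable
		}
		return nil
	})
	if err != nil {
		// Exiting non-zero lets the kubelet restart the container.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("connected after %d attempts\n", calls)
}
```

If the restart outlasts the whole budget, as it did once on staging, the process exits and the next container start gets a fresh budget.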
Resource impact
PG node memory is currently 39% (~2.4 GiB / 6 GiB allocatable). Worst-case PG memory growth with max_connections=200 lands around 3–4 GiB, putting the node at ~65%, which fits with headroom. Empirically, prod PG has peaked at 413 MiB in the last 30h of incident data, so the proposed 4 GiB limit is ~10× the historical max: a guardrail, not a constraint.
Post-merge
CNPG handles pg_stat_statements extension creation automatically (no manual CREATE EXTENSION step needed; it was already done in v1.7.2's deploy).
Verify after deploy:
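One of the post-deploy checks is that publish complete events report remotes_ms under 10 ms. A minimal sketch of that check in Go follows. It assumes the registry emits JSON log lines with a msg field equal to "publish complete" and a numeric remotes_ms field. That log shape is an assumption, and the helper name remotesWithinBudget is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// publishEvent holds the two fields this check reads from a log line.
// The field names assume JSON-formatted slog output (an assumption).
type publishEvent struct {
	Msg       string   `json:"msg"`
	RemotesMS *float64 `json:"remotes_ms"`
}

// remotesWithinBudget reports whether a "publish complete" log line has
// remotes_ms strictly below budgetMS. checked is false for lines that are
// not publish-complete events or that lack the field.
func remotesWithinBudget(line string, budgetMS float64) (ok bool, checked bool, err error) {
	var ev publishEvent
	if uerr := json.Unmarshal([]byte(line), &ev); uerr != nil {
		return false, false, fmt.Errorf("parse log line: %w", uerr)
	}
	if ev.Msg != "publish complete" || ev.RemotesMS == nil {
		return false, false, nil
	}
	return *ev.RemotesMS < budgetMS, true, nil
}

func main() {
	// Illustrative lines only; 10980 is the remotes_ms value from the
	// 17:08 UTC incident, the other value is made up.
	lines := []string{
		`{"msg":"publish complete","remotes_ms":10980}`,
		`{"msg":"publish complete","remotes_ms":2}`,
	}
	for _, l := range lines {
		ok, checked, err := remotesWithinBudget(l, 10)
		fmt.Println(ok, checked, err)
	}
}
```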
Test plan
- ghcr.io/modelcontextprotocol/registry:1.7.3)
- max_connections=200; one staging pod bounced as expected
- SHOW max_connections returns 200 on prod
- publish complete events show remotes_ms < 10ms
🤖 Generated with Claude Code