
deploy: update prod to v1.7.3 #1223

Merged
rdimitrov merged 1 commit into main from promote-prod-v1.7.3 on Apr 28, 2026

Conversation

@rdimitrov
Member

Summary

Promotes v1.7.3 to production. Contents: #1220 (RemoteURL duplicate filter rewritten to use JSONB containment) and #1221 (pgxpool MaxConns 30→60, PG max_connections 100→200, explicit PG resources block).

Deployment caveat

The PG max_connections change is a postmaster-level setting, so CNPG triggers a PG restart on the next prod Pulumi run. With instances: 1 this means brief downtime: staging took ~30s during the equivalent restart, and one registry pod exhausted its 8-attempt DB-retry budget once, then recovered when the kubelet restarted it.

Time the merge for a low-traffic UTC window. Alert history suggests very early UTC (02:00–04:00) is quietest.

Resource impact

PG node memory is currently 39% (~2.4 GiB / 6 GiB allocatable). Worst-case PG memory growth with max_connections=200 lands around 3–4 GiB, putting the node at ~65% — fits with headroom. Empirically, prod PG has peaked at 413 MiB in the last 30h of incident data, so the proposed 4 GiB limit is ~10× the historical max — guardrail not constraint.
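The worst-case figure above can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes ~1 GiB of shared_buffers and ~12 MiB per backend (both assumed values, not measurements; real per-backend cost varies with work_mem and workload):

```go
package main

import "fmt"

// worstCaseMiB is a rough upper bound on PG memory: fixed shared memory plus
// per-connection backend overhead at the configured connection ceiling.
func worstCaseMiB(sharedBuffersMiB, maxConns, perBackendMiB int) int {
	return sharedBuffersMiB + maxConns*perBackendMiB
}

func main() {
	// Assumed inputs: 1 GiB shared_buffers, 200 connections, ~12 MiB/backend.
	est := worstCaseMiB(1024, 200, 12)
	fmt.Printf("worst case ≈ %.1f GiB\n", float64(est)/1024) // lands in the 3-4 GiB band
}
```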

Post-merge

CNPG handles pg_stat_statements extension creation automatically (no manual CREATE EXTENSION step needed — it was already done in v1.7.2's deploy).

Verify after deploy:

```shell
export PATH=/opt/homebrew/share/google-cloud-sdk/bin:$PATH

# max_connections actually changed
kubectl exec -i registry-pg-1 -c postgres \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -- psql -U postgres -tAc "SHOW max_connections"  # expect: 200

# resources block applied
kubectl get pod registry-pg-1 \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -o jsonpath='{.spec.containers[?(@.name=="postgres")].resources}{"\n"}'

# pgxpool MaxConns reflected (registry app uses 60 per pod after restart)
kubectl exec -i registry-pg-1 -c postgres \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -- psql -U postgres -tAc "SELECT count(*) FROM pg_stat_activity WHERE datname='app'"
```

Test plan

  • v1.7.3 release built and pushed (ghcr.io/modelcontextprotocol/registry:1.7.3)
  • Staging deployed cleanly; PG restarted and came back with max_connections=200; one staging pod bounced as expected
  • Prod Pulumi run applies cleanly; brief PG restart
  • Confirm SHOW max_connections returns 200 on prod
  • Confirm publish complete events show remotes_ms < 10ms
  • Watch for any "too many connections" errors during the rollout window (none expected — Pulumi orders CNPG cluster before Deployment, so PG accepts the new conn limit before pgxpool tries to use it)

🤖 Generated with Claude Code

Promotes #1220 + #1221 to production:

- #1220: RemoteURL filter rewritten to use JSONB containment, GIN-indexable.
  Resolves the 10s validateNoDuplicateRemoteURLs path that caused
  yesterday's 17:08 UTC publish-latency alert.
- #1221: pgxpool MaxConns 30→60, MinConns 5→10. PG max_connections
  100→200. Explicit PG resources block (was unset). Addresses the
  scraper-driven concurrency that caused yesterday's 17:35-17:40 UTC
  availability alert and 18:17 UTC re-fire.
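The pgxpool sizing above can also be carried in the connection string, since pgxpool reads pool_max_conns / pool_min_conns from DSN query parameters. A minimal sketch; the service host and database name are placeholders, not the registry's real values:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a Postgres connection string carrying pgxpool's pool
// sizing knobs as query parameters.
func buildDSN(host, db string, maxConns, minConns int) string {
	q := url.Values{}
	q.Set("pool_max_conns", fmt.Sprint(maxConns))
	q.Set("pool_min_conns", fmt.Sprint(minConns))
	u := url.URL{
		Scheme:   "postgres",
		Host:     host,
		Path:     "/" + db,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// v1.7.3 values: MaxConns 30→60, MinConns 5→10. With max_connections=200
	// and a single registry pod, 60 pooled conns leave ample headroom for
	// CNPG's own connections and superuser sessions.
	fmt.Println(buildDSN("registry-pg-rw:5432", "app", 60, 10))
}
```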

Caveat: the CNPG spec change in v1.7.3 triggers a PG postmaster restart
on the next prod Pulumi run (max_connections is a postmaster-level
setting). Staging took ~30s of registry unavailability during the same
restart pattern, with one pod bouncing once on its 8-attempt DB-retry
budget before recovering. Same will happen on prod.

Time the merge for a low-traffic UTC window. Recommended: very early
UTC (02:00-04:00) per the alert history showing minimal scraper traffic
in that window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov merged commit 0a97e59 into main Apr 28, 2026
5 checks passed
@rdimitrov rdimitrov deleted the promote-prod-v1.7.3 branch April 28, 2026 21:50