
deploy: update prod to v1.7.3 #1223

Merged
rdimitrov merged 1 commit into main from promote-prod-v1.7.3 on Apr 28, 2026

Conversation

@rdimitrov
Member

Summary

Promotes v1.7.3 to production. Contents: #1220 (RemoteURL duplicate filter rewritten to use JSONB containment) and #1221 (pgxpool MaxConns 30→60, PG max_connections 100→200, explicit PG resources block).

Deployment caveat

The PG max_connections change is a postmaster-level setting, so CNPG triggers a PG restart on the next prod Pulumi run. With instances: 1 this means brief downtime: staging took ~30s during the equivalent restart, and one registry pod exhausted its 8-attempt DB-retry budget once, then recovered when the kubelet restarted it.

Time the merge for a low-traffic UTC window. Alert history suggests very early UTC (02:00–04:00) is quietest.

Resource impact

PG node memory is currently 39% (~2.4 GiB / 6 GiB allocatable). Worst-case PG memory growth with max_connections=200 lands around 3–4 GiB, putting the node at ~65% — fits with headroom. Empirically, prod PG has peaked at 413 MiB in the last 30h of incident data, so the proposed 4 GiB limit is ~10× the historical max — guardrail not constraint.
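The worst-case figure above can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes ~1 GiB of shared_buffers and ~12 MiB per backend (both assumed values, not measurements; real per-backend cost varies with work_mem and workload):

```go
package main

import "fmt"

// worstCaseMiB is a rough upper bound on PG memory: fixed shared memory plus
// per-connection backend overhead at the configured connection ceiling.
func worstCaseMiB(sharedBuffersMiB, maxConns, perBackendMiB int) int {
	return sharedBuffersMiB + maxConns*perBackendMiB
}

func main() {
	// Assumed inputs: 1 GiB shared_buffers, 200 connections, ~12 MiB/backend.
	est := worstCaseMiB(1024, 200, 12)
	fmt.Printf("worst case ≈ %.1f GiB\n", float64(est)/1024) // lands in the 3-4 GiB band
}
```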

Post-merge

CNPG handles pg_stat_statements extension creation automatically (no manual CREATE EXTENSION step needed — it was already done in v1.7.2's deploy).

Verify after deploy:

```shell
export PATH=/opt/homebrew/share/google-cloud-sdk/bin:$PATH

# max_connections actually changed
kubectl exec -i registry-pg-1 -c postgres \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -- psql -U postgres -tAc "SHOW max_connections"  # expect: 200

# resources block applied
kubectl get pod registry-pg-1 \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -o jsonpath='{.spec.containers[?(@.name=="postgres")].resources}{"\n"}'

# pgxpool MaxConns reflected (registry app uses 60 per pod after restart)
kubectl exec -i registry-pg-1 -c postgres \
  --context gke_mcp-registry-prod_us-central1-b_mcp-registry-prod \
  -- psql -U postgres -tAc "SELECT count(*) FROM pg_stat_activity WHERE datname='app'"
```

Test plan

  • v1.7.3 release built and pushed (ghcr.io/modelcontextprotocol/registry:1.7.3)
  • Staging deployed cleanly; PG restarted and came back with max_connections=200; one staging pod bounced as expected
  • Prod Pulumi run applies cleanly; brief PG restart
  • Confirm SHOW max_connections returns 200 on prod
  • Confirm publish complete events show remotes_ms < 10ms
  • Watch for any "too many connections" errors during the rollout window (none expected — Pulumi orders CNPG cluster before Deployment, so PG accepts the new conn limit before pgxpool tries to use it)

🤖 Generated with Claude Code

Promotes #1220 + #1221 to production:

- #1220: RemoteURL filter rewritten to use JSONB containment, GIN-indexable.
  Resolves the 10s validateNoDuplicateRemoteURLs path that caused
  yesterday's 17:08 UTC publish-latency alert.
- #1221: pgxpool MaxConns 30→60, MinConns 5→10. PG max_connections
  100→200. Explicit PG resources block (was unset). Addresses the
  scraper-driven concurrency that caused yesterday's 17:35-17:40 UTC
  availability alert and 18:17 UTC re-fire.
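The pgxpool sizing above can also be carried in the connection string, since pgxpool reads pool_max_conns / pool_min_conns from DSN query parameters. A minimal sketch; the service host and database name are placeholders, not the registry's real values:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a Postgres connection string carrying pgxpool's pool
// sizing knobs as query parameters.
func buildDSN(host, db string, maxConns, minConns int) string {
	q := url.Values{}
	q.Set("pool_max_conns", fmt.Sprint(maxConns))
	q.Set("pool_min_conns", fmt.Sprint(minConns))
	u := url.URL{
		Scheme:   "postgres",
		Host:     host,
		Path:     "/" + db,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// v1.7.3 values: MaxConns 30→60, MinConns 5→10. With max_connections=200
	// and a single registry pod, 60 pooled conns leave ample headroom for
	// CNPG's own connections and superuser sessions.
	fmt.Println(buildDSN("registry-pg-rw:5432", "app", 60, 10))
}
```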

Caveat: the CNPG spec change in v1.7.3 triggers a PG postmaster restart
on the next prod Pulumi run (max_connections is a postmaster-level
setting). Staging took ~30s of registry unavailability during the same
restart pattern, with one pod bouncing once on its 8-attempt DB-retry
budget before recovering. Same will happen on prod.

Time the merge for a low-traffic UTC window. Recommended: very early
UTC (02:00-04:00) per the alert history showing minimal scraper traffic
in that window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov merged commit 0a97e59 into main Apr 28, 2026
5 checks passed
@rdimitrov rdimitrov deleted the promote-prod-v1.7.3 branch April 28, 2026 21:50