Neon Runbook
Service: Neon (canonical cloud Postgres)
Trigger: Connection failure, slow queries, or connection pool exhausted
Severity: P0 critical (data unavailable) / P1 high (degraded performance)
Steps
Connection failure
- Check the Neon dashboard at https://console.neon.tech for service status or active incidents
- Verify the connection string credentials in Infisical EU (eu.infisical.com)
- Confirm you are connecting from the host (not via `docker exec postgres`): Docker has no outbound internet
- Test with:
  `infisical run --env=dev -- python -c "from ops.db_connect import get_connection; c=get_connection(neon=True); print('OK')"`
- If Neon is down, fall back to local Postgres (staging); scripts using `db_connect` do this automatically
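The automatic fallback mentioned in the last step can be pictured as an ordered list of connection attempts. The sketch below is only an illustration of that idea, not the actual `ops.db_connect` implementation, which this runbook does not show. The function name `connect_with_fallback` and the target labels are hypothetical.

```python
import logging
from typing import Callable, Sequence, Tuple, TypeVar

log = logging.getLogger("db_fallback")
T = TypeVar("T")


def connect_with_fallback(
    candidates: Sequence[Tuple[str, Callable[[], T]]]
) -> Tuple[str, T]:
    """Try each (label, connect) pair in order and return the first success.

    Hypothetical helper: each `connect` is a zero-argument callable that
    returns an open connection or raises on failure.
    """
    errors = []
    for label, connect in candidates:
        try:
            conn = connect()
        except Exception as exc:  # broad on purpose: driver errors vary
            log.warning("connection to %s failed: %s", label, exc)
            errors.append(f"{label}: {exc}")
            continue
        return label, conn
    # Every target failed: surface all collected reasons at once.
    raise ConnectionError("all targets failed: " + "; ".join(errors))
```

Assuming `get_connection` also accepts `neon=False` for local Postgres (an assumption; only `neon=True` appears above), a call might look like `connect_with_fallback([("neon", lambda: get_connection(neon=True)), ("local", lambda: get_connection(neon=False))])`. Returning the label lets the caller log which database actually served the request.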
Slow queries
- Connect to Neon via psycopg2 from the host
- Check active queries:
  `SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;`
- Identify long-running queries and assess whether to terminate them, substituting the pid reported by the query above:
  `SELECT pg_terminate_backend(<pid>);`
- Check for missing indexes on frequently queried columns
- Review Windmill scheduled jobs — bulk ETL jobs can cause temporary slowdowns
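For repeated checks during an incident, the active-query inspection above could be wrapped in a small helper. This is a minimal sketch, assuming `conn` is a psycopg2-style DB-API connection (for example one returned by `get_connection(neon=True)`). The name `long_running_queries` and the 60-second default threshold are illustrative assumptions, not part of the existing tooling.

```python
from typing import Any, List, Tuple

# Same idea as the runbook query, with a minimum duration filter added.
# %s is the psycopg2 parameter placeholder.
LONG_RUNNING_SQL = """
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid <> pg_backend_pid()
  AND query_start < now() - %s * interval '1 second'
ORDER BY duration DESC
"""


def long_running_queries(conn: Any, min_seconds: float = 60.0) -> List[Tuple]:
    """Return (pid, duration, query) rows for non-idle sessions whose
    current query started more than `min_seconds` ago.

    Hypothetical helper; it only reads pg_stat_activity and never
    terminates anything.
    """
    cur = conn.cursor()
    try:
        cur.execute(LONG_RUNNING_SQL, (min_seconds,))
        return cur.fetchall()
    finally:
        cur.close()  # always release the cursor, even if the query fails
```

Keeping the helper read-only leaves the decision to call `pg_terminate_backend` with a human, as the steps above intend.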
Connection pool exhausted
- Check current connection count:
  `SELECT count(*) FROM pg_stat_activity;`
- Check the Neon connection limit in the dashboard (free tier = 100 connections)
- Identify connections held by idle scripts: filter `state = 'idle'` in pg_stat_activity
- Terminate connections that have been idle for more than 10 minutes:
  `SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND state_change < now() - interval '10 minutes';`
- Review scripts for connection leaks: ensure `get_connection()` contexts are closed after use
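One common source of leaks is worth spelling out: with psycopg2, `with conn:` commits or rolls back the transaction but does not close the connection. The sketch below shows one way a script might guarantee the close. It assumes `get_connection()` returns a plain DB-API connection, which the runbook does not confirm, and `managed_connection` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_connection(connect: Callable[[], Any]) -> Iterator[Any]:
    """Open a connection, commit on success, roll back on error,
    and close it in every case so it cannot linger as an idle session.

    Hypothetical wrapper; `connect` is any zero-argument callable that
    returns a DB-API connection.
    """
    conn = connect()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # runs on both the success and the failure path
```

A script would then use `with managed_connection(get_connection) as conn:` and the connection is returned to Neon as soon as the block exits.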
Rollback
Neon supports point-in-time restore from the dashboard. For accidental data deletion, restore to a branch from before the incident and copy the affected rows back.
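The "copy the affected rows back" step could be scripted roughly as follows. This is a sketch under several assumptions: `src` is a connection to the restored branch and `dst` a connection to the main database, both DB-API style. The target table has a primary key or unique constraint so that `ON CONFLICT DO NOTHING` skips rows that still exist. The helper name `copy_rows_back` and its parameters are hypothetical.

```python
import re
from typing import Any, Sequence

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_ident(name: str) -> str:
    """Allow only plain identifiers, since they are interpolated into SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def copy_rows_back(src: Any, dst: Any, table: str, key_column: str,
                   keys: Sequence[Any], placeholder: str = "%s") -> int:
    """Copy rows whose key is in `keys` from `src` (restored branch) to
    `dst` (main database), skipping rows that already exist in `dst`.

    Returns the number of rows read from `src`. `placeholder` defaults to
    the psycopg2 style; other drivers may need a different marker.
    """
    if not keys:
        return 0
    table = _check_ident(table)
    key_column = _check_ident(key_column)
    marks = ", ".join([placeholder] * len(keys))

    read = src.cursor()
    read.execute(
        f"SELECT * FROM {table} WHERE {key_column} IN ({marks})", tuple(keys)
    )
    columns = [_check_ident(col[0]) for col in read.description]
    rows = read.fetchall()
    read.close()
    if not rows:
        return 0

    insert_sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({', '.join([placeholder] * len(columns))}) "
        "ON CONFLICT DO NOTHING"
    )
    write = dst.cursor()
    write.executemany(insert_sql, rows)
    write.close()
    dst.commit()
    return len(rows)
```

Running this against a small, explicit list of keys keeps the restore auditable. For large deletions a dump and reload of the affected table from the branch is likely more practical.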
Contacts
- On-call: hello@liflode.com
- Neon dashboard: https://console.neon.tech
- Credentials: Infisical EU, key NEON_DATABASE_URL