Neon Runbook

Service: Neon (canonical cloud Postgres)
Trigger: Connection failure, slow queries, or connection pool exhausted
Severity: P0 critical (data unavailable) / P1 high (degraded performance)

Steps

Connection failure

Check the Neon dashboard at https://console.neon.tech for service status or active incidents
Verify the connection string credentials in Infisical EU (eu.infisical.com)
Confirm you are connecting from the host, not from inside the container (docker exec postgres): the Docker container has no outbound internet access
Test with: infisical run --env=dev -- python -c "from ops.db_connect import get_connection; c=get_connection(neon=True); print('OK')"
If Neon is down, fall back to local Postgres (staging) — scripts using db_connect do this automatically
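
The fallback in the last step can be pictured with the sketch below. It does not reproduce the real ops.db_connect module, whose internals are not shown in this runbook; connect_with_fallback and the two factory callables are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def connect_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
) -> Tuple[T, str]:
    """Try the primary connection factory, then the fallback.

    Returns the connection plus a label saying which factory succeeded.
    Both factories are hypothetical stand-ins, for example one callable
    that opens Neon and one that opens local Postgres (staging).
    """
    try:
        return primary(), "primary"
    except Exception as exc:  # broad on purpose: any connect error triggers the fallback
        print(f"primary connection failed ({exc!r}); using fallback")
        return fallback(), "fallback"
```

Returning the label alongside the connection makes it easy for a script to log which database it actually reached, which matters when the fallback points at staging data.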

Slow queries

Connect to Neon via psycopg2 from the host
Check active queries: SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;
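Identify long-running queries and assess whether to terminate them: SELECT pg_terminate_backend(<pid>); (substitute the pid from the previous query; pg_cancel_backend(<pid>) cancels only the running query and keeps the connection open)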
Check for missing indexes on frequently queried columns
Review Windmill scheduled jobs — bulk ETL jobs can cause temporary slowdowns
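
The active-query check can be scripted so that only queries above a duration threshold are returned. This is a minimal sketch, assuming conn is an open psycopg2 connection to Neon; the name long_running_queries and the 60-second default are illustrative and not part of the existing tooling.

```python
# Same columns as the manual check, restricted to queries running longer
# than a threshold and excluding the session that runs this check.
LONG_RUNNING_SQL = """
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid <> pg_backend_pid()
  AND now() - query_start > %s * interval '1 second'
ORDER BY duration DESC;
"""


def long_running_queries(conn, min_seconds=60):
    """Return (pid, duration, query) rows for queries older than min_seconds.

    `conn` is assumed to be a psycopg2-style connection whose cursors
    work as context managers and accept %s placeholders.
    """
    with conn.cursor() as cur:
        cur.execute(LONG_RUNNING_SQL, (min_seconds,))
        return cur.fetchall()
```

The threshold is passed as a bound parameter rather than formatted into the SQL string, so the same statement can be reused with different limits.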

Connection pool exhausted

Check current connection count: SELECT count(*) FROM pg_stat_activity;
Check Neon connection limit in dashboard (free tier = 100 connections)
Identify connections held by idle scripts: filter state = 'idle' in pg_stat_activity
Terminate connections that have been idle for more than 10 minutes: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND state_change < now() - interval '10 minutes' AND pid <> pg_backend_pid();
Review scripts for connection leaks — ensure get_connection() contexts are closed after use
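
One common source of leaks with psycopg2 is relying on "with conn:" alone: that block commits or rolls back the transaction but leaves the connection open. The sketch below wraps the connection in contextlib.closing so it is closed even when the query raises. It assumes get_connection() returns a psycopg2-style connection; run_query is a hypothetical helper name.

```python
from contextlib import closing


def run_query(connect, sql, params=None):
    """Open a connection, run one query, and always close the connection.

    `connect` is any zero-argument callable returning a DB-API connection,
    for example the project's get_connection (assumed here).
    """
    with closing(connect()) as conn:   # guarantees conn.close() on exit
        with conn.cursor() as cur:     # closes the cursor on exit
            cur.execute(sql, params)
            rows = cur.fetchall()
        conn.commit()                  # end the transaction explicitly
        return rows
```

A script would call it as run_query(get_connection, "SELECT 1"), passing the factory itself rather than an already opened connection, so the helper owns the full open and close cycle.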

Rollback

Neon supports point-in-time restore from the dashboard. For accidental data deletion, restore to a branch from before the incident and copy the affected rows back.

Contacts

On-call: hello@liflode.com
Neon dashboard: https://console.neon.tech
Credentials: Infisical EU, key NEON_DATABASE_URL