Windmill Runbook

Service: Windmill (app.windmill.dev, workspace: liflode)
Trigger: Job failure, schedule miss, or flow error
Severity: P1 high (scheduled ETL/ops jobs) / P2 medium (non-critical enrichment)

Steps

Job failure

  1. Log in to Windmill at https://app.windmill.dev
  2. Navigate to Runs and filter by status = Failed
  3. Open the failing run and check the full execution log for the error message
  4. Common causes: Infisical secret unavailable, Neon connection timeout, upstream API rate limit
  5. Fix the underlying cause and trigger a manual run to verify
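As a triage aid for step 4, the following is a minimal sketch that matches a run's log text against the common causes listed above. The regex patterns and cause names are assumptions chosen for illustration, not strings taken from real Windmill, Infisical, or Neon output; adjust them to the error text actually seen in the execution log.

```python
import re

# Hypothetical patterns: tune these to the real error messages in your run logs.
CAUSE_PATTERNS = {
    "infisical_secret_unavailable": re.compile(r"infisical|secret", re.IGNORECASE),
    "neon_connection_timeout": re.compile(r"neon|connection timed out|timeout", re.IGNORECASE),
    "upstream_rate_limit": re.compile(r"rate limit|\b429\b|too many requests", re.IGNORECASE),
}


def classify_failure(log_text: str) -> list[str]:
    """Return the names of all common causes whose pattern appears in the log."""
    return [name for name, pattern in CAUSE_PATTERNS.items() if pattern.search(log_text)]
```

Paste the execution log from the failed run into `classify_failure` to get a first guess at the cause. An empty list means none of the common causes matched and the log needs a manual read.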

Schedule miss (job did not run at expected time)

  1. Navigate to the flow/script in Windmill
  2. Check the Schedule tab and verify the cron expression is correct for the intended local time (AEST = UTC+10, AEDT = UTC+11)
  3. Verify the schedule is enabled (toggle active)
  4. Trigger a manual run to confirm the job still executes successfully
  5. If the schedule was disabled automatically after repeated failures, fix the error first, then re-enable the schedule
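To sanity-check a cron expression against the intended local run time (step 2), the sketch below converts an Australian Eastern wall-clock time to the UTC minute and hour fields. It assumes the cron expression is evaluated in UTC; if the schedule has its own timezone setting, no conversion is needed. The function name and the fixed-offset constants are illustrative, and the caller must choose AEST or AEDT according to whether daylight saving is in effect.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Australian Eastern time; daylight saving decides which applies.
AEST = timezone(timedelta(hours=10), "AEST")
AEDT = timezone(timedelta(hours=11), "AEDT")


def local_to_utc_cron_fields(hour: int, minute: int, tz: timezone) -> tuple[int, int, int]:
    """Convert a local wall-clock time to (utc_minute, utc_hour, day_shift).

    day_shift is -1 when the UTC time falls on the previous calendar day,
    which matters for day-of-week or day-of-month cron fields.
    """
    local = datetime(2024, 1, 15, hour, minute, tzinfo=tz)  # arbitrary reference date
    utc = local.astimezone(timezone.utc)
    day_shift = (utc.date() - local.date()).days
    return utc.minute, utc.hour, day_shift
```

For example, a job meant to run at 06:00 AEST corresponds to minute 0, hour 20 UTC on the previous day, so a UTC cron of `0 20 * * *` would be expected. During AEDT the same local time maps to hour 19.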

Flow error (step-by-step execution issue)

  1. Open the failed flow run in Windmill
  2. Use the step-by-step execution view to identify which step failed
  3. Check the step's input/output values to diagnose the issue
  4. Fix the script for that step and re-test with a manual trigger
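The step-by-step inspection above can also be sketched in code. The record shape below (`id`, `success`, `args`, `error`) is hypothetical and chosen for illustration only; it is not the Windmill flow status format. The idea is to locate the first failing step and look at its inputs and error together.

```python
from typing import Optional


def first_failed_step(steps: list[dict]) -> Optional[dict]:
    """Return the first step record whose 'success' flag is False, or None."""
    for step in steps:
        # Steps without a 'success' key are treated as not failed.
        if not step.get("success", True):
            return step
    return None


# Hypothetical step records, shaped for illustration only.
example_steps = [
    {"id": "a", "success": True, "result": {"rows": 120}},
    {"id": "b", "success": False, "args": {"table": "events"}, "error": "connection timed out"},
]

failed = first_failed_step(example_steps)
if failed is not None:
    print(f"step {failed['id']} failed with args={failed.get('args')} error={failed.get('error')}")
```

Comparing the failed step's `args` with the previous step's `result` usually shows whether the fault is in the step's own script or in the data handed to it.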

Rollback

Windmill scripts are versioned: revert to a previous version via the script editor's version history. For data side effects, check the relevant database table and correct the affected rows manually.

Contacts

  • On-call: hello@liflode.com
  • Windmill dashboard: https://app.windmill.dev
  • Windmill schedule reference: liflode-docs/rules/services.md
  • ADR-043: Windmill as scheduled job runner
  • ADR-062: Windmill scheduling conventions