Windmill Runbook
Service: Windmill (app.windmill.dev, workspace: liflode)
Trigger: Job failure, schedule miss, or flow error
Severity: P1 high (scheduled ETL/ops jobs) / P2 medium (non-critical enrichment)
Steps
Job failure
- Log in to Windmill at https://app.windmill.dev
- Navigate to Runs and filter by status = Failed
- Open the failing run and check the full execution log for the error message
- Common causes: Infisical secret unavailable, Neon connection timeout, upstream API rate limit
- Fix the underlying cause and trigger a manual run to verify
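Two of the common causes above (Neon connection timeouts and upstream rate limits) are often transient, so one possible fix is to wrap the failing call in a retry with exponential backoff. The following is a minimal sketch, not the project's actual code: the helper name `retry_with_backoff`, its parameters, and the choice of `TimeoutError`/`ConnectionError` as retryable errors are all assumptions for illustration. Adjust `retry_on` to whatever exception your client library really raises.

```python
import random
import time


def retry_with_backoff(fn, attempts=4, base_delay=1.0, max_delay=30.0,
                       retry_on=(TimeoutError, ConnectionError),
                       sleep=time.sleep):
    """Call fn(); on a transient error, wait and try again.

    The delay doubles after each failed attempt (capped at max_delay),
    plus up to 10% random jitter. The final failure is re-raised so it
    still shows up in the run's execution log.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the job fail visibly
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call site might look like `rows = retry_with_backoff(lambda: fetch_rows(conn))`, where `fetch_rows` is a hypothetical function. Retrying does not help with a missing Infisical secret; that cause needs the secret restored first.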
Schedule miss (job did not run at expected time)
- Navigate to the flow/script in Windmill
- Check the Schedule tab and verify the cron expression is correct for the intended local time (AEST = UTC+10, AEDT = UTC+11)
- Verify the schedule is enabled (toggle active)
- Trigger a manual run to confirm the job still executes successfully
- If the schedule was disabled automatically after repeated failures, fix the error first, then re-enable it
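When checking a cron expression against the intended local run time, it can help to compute the UTC equivalent explicitly. This is a small sketch under the assumption that the schedule is evaluated in UTC; if the schedule is configured with its own timezone, no conversion is needed. The function name `local_to_utc` is hypothetical, and fixed offsets are used (AEST = UTC+10, AEDT = UTC+11) rather than a timezone database.

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10), "AEST")  # standard time
AEDT = timezone(timedelta(hours=11), "AEDT")  # daylight saving time


def local_to_utc(hour, minute, tz):
    """Return (utc_hour, utc_minute, day_shift) for a local wall-clock time.

    day_shift is -1 when the UTC time falls on the previous calendar day,
    which matters if the cron expression restricts day-of-week or day-of-month.
    """
    local = datetime(2024, 1, 15, hour, minute, tzinfo=tz)  # arbitrary date
    utc = local.astimezone(timezone.utc)
    day_shift = (utc.date() - local.date()).days
    return utc.hour, utc.minute, day_shift


print(local_to_utc(6, 30, AEST))  # (20, 30, -1)
print(local_to_utc(6, 30, AEDT))  # (19, 30, -1)
print(local_to_utc(14, 0, AEST))  # (4, 0, 0)
```

A job meant to run at 06:30 AEST therefore needs hour 20 and minute 30 in a UTC-based cron expression, on the previous day. A fixed UTC expression will drift by one hour against local time when daylight saving starts or ends.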
Flow error (step-by-step execution issue)
- Open the failed flow run in Windmill
- Use the step-by-step execution view to identify which step failed
- Check the step's input/output values to diagnose the issue
- Fix the script for that step and re-test with a manual trigger
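Before re-running the whole flow, the failing step can sometimes be reproduced locally by feeding its captured input back into the script's `main` entrypoint. The sketch below assumes the step is a Python script and that its input was copied from the run view as a JSON object; the `replay` helper and the example `main` body are hypothetical stand-ins, not the real step.

```python
import json


def main(customer_id: int, dry_run: bool = True):
    # Hypothetical stand-in for the failing step's script body.
    return {"customer_id": customer_id, "dry_run": dry_run}


def replay(entrypoint, captured_input_json):
    """Call the step's entrypoint with the input copied from the failed run."""
    args = json.loads(captured_input_json)
    if not isinstance(args, dict):
        raise ValueError("expected a JSON object of named arguments")
    return entrypoint(**args)


captured = '{"customer_id": 42, "dry_run": true}'
print(replay(main, captured))  # {'customer_id': 42, 'dry_run': True}
```

If the replay fails with the same error, the problem is in the step's script or its input; if it succeeds, the cause is more likely environmental (secrets, network, or upstream services).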
Rollback
Windmill scripts are versioned: revert to a previous version via the script editor's version history. For data side effects, check the relevant database table and correct the affected rows manually.
Contacts
- On-call: hello@liflode.com
- Windmill dashboard: https://app.windmill.dev
- Windmill schedule reference: liflode-docs/rules/services.md
Related ADRs
- ADR-043: Windmill as scheduled job runner
- ADR-062: Windmill scheduling conventions