Platform admin
Site health & failed jobs
The platform admin header runs seven health checks on every page load.
A green dot means the platform is fully configured; amber or red means
something needs attention. The /admin/jobs/failed page
surfaces the failed-job side of the same picture.
The seven checks
| Check | Severity | What it verifies |
|---|---|---|
| Failed jobs | Critical | Count of rows in failed_jobs. Anything > 0 turns the pill amber; > 50 turns it red. |
| Stripe | Warning | cashier.secret is set. Without it, plans don't sync and checkout doesn't work. |
| LLM provider | Warning | At least one of: Cloudflare account + token, OpenAI key, OpenRouter key. |
| Vector store | Warning | Cloudflare Vectorize index name OR Qdrant URL. |
| Mail | Warning | Mail driver + sender configured. Needed for invitations, password resets, and lead notifications. |
| Reverb | Warning | Reverb app key + secret set. Without it, the inbox doesn't update live. |
| Cache | Warning | Redis cache reachable. The hot path caches retrieval results here; a slow cache means slower responses. |
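To make the table concrete, here is a minimal sketch of the seven checks expressed as predicates over configuration, plus the failed-jobs tiering from the first row. Apart from `cashier.secret`, every config key below (`openai.key`, `qdrant.url`, `reverb.key`, `failed_jobs_count`, `redis_reachable`, and so on) is a hypothetical placeholder, not the platform's real setting name. A real Cache check would presumably ping Redis rather than read a flag.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical flat config: keys other than "cashier.secret" are placeholders
# for whatever settings your deployment actually uses.
Config = Mapping[str, object]


@dataclass(frozen=True)
class Check:
    name: str
    severity: str  # "critical" or "warning", as in the table
    passes: Callable[[Config], bool]


def is_set(cfg: Config, *keys: str) -> bool:
    """True when every key is present with a non-empty value."""
    return all(bool(cfg.get(key)) for key in keys)


CHECKS = [
    Check("Failed jobs", "critical", lambda c: int(c.get("failed_jobs_count", 0)) == 0),
    Check("Stripe", "warning", lambda c: is_set(c, "cashier.secret")),
    # Any one provider is enough: Cloudflare needs both account and token.
    Check("LLM provider", "warning", lambda c: is_set(c, "cloudflare.account_id", "cloudflare.api_token")
          or is_set(c, "openai.key")
          or is_set(c, "openrouter.key")),
    Check("Vector store", "warning", lambda c: is_set(c, "vectorize.index") or is_set(c, "qdrant.url")),
    Check("Mail", "warning", lambda c: is_set(c, "mail.driver", "mail.from_address")),
    Check("Reverb", "warning", lambda c: is_set(c, "reverb.key", "reverb.secret")),
    # Placeholder: a real check would ping Redis instead of reading a flag.
    Check("Cache", "warning", lambda c: bool(c.get("redis_reachable"))),
]


def failed_jobs_tier(count: int) -> str:
    """Pill tier from the failed-jobs count: 0 green, 1-50 amber, over 50 red."""
    if count > 50:
        return "red"
    if count > 0:
        return "amber"
    return "green"


def run_checks(cfg: Config) -> dict[str, bool]:
    """Evaluate every check and return {check name: passed}."""
    return {check.name: check.passes(cfg) for check in CHECKS}
```

For example, a config with three rows in failed_jobs, no Reverb keys, and everything else set yields five of seven checks passing.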
Score & label
The header pill aggregates the checks into a score: the percentage of checks passing.
The label goes from Strong (≥ 90%) to Stable (≥ 70%) to Watchlist (≥ 50%)
to Critical (anything below 50%). The pill color follows the label: green for
Strong, amber for Stable / Watchlist, red for Critical.
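The scoring rule is easy to check by hand. A short sketch, assuming the seven checks above are the only inputs to the score:

```python
def score(passed: int, total: int = 7) -> float:
    """Percentage of checks passing (0-100)."""
    return 100.0 * passed / total


def label(pct: float) -> str:
    """Map a score to the header label using the documented thresholds."""
    if pct >= 90:
        return "Strong"
    if pct >= 70:
        return "Stable"
    if pct >= 50:
        return "Watchlist"
    return "Critical"


PILL_COLOR = {"Strong": "green", "Stable": "amber", "Watchlist": "amber", "Critical": "red"}

# Walk from all seven passing down to three passing.
for passed in range(7, 2, -1):
    pct = score(passed)
    lbl = label(pct)
    print(f"{passed}/7 passing -> {pct:.1f}% -> {lbl} ({PILL_COLOR[lbl]})")
```

With seven checks, a single failure already drops the score to 85.7% (Stable, amber), so green effectively means all seven pass. Two failures give 71.4% (still Stable), three give 57.1% (Watchlist), and it takes four failures (42.9%) to reach Critical. The sketch ignores the separate rule that a critical-severity check such as a failed-jobs surge can turn the pill red on its own.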
Each notification in the dropdown has a deep link to the relevant config page so you can fix it in two clicks.
Failed jobs
Open /admin/jobs/failed for the full list. Each row shows:
- Job name: the queue and job class.
- Connection: the queue connection the job ran on.
- Failed at: when the last attempt blew up.
- Exception: the first line of the trace, expandable.
- Retry: re-queue the job with the same payload.
- Forget: drop the row.
The page also has Retry all and Flush all buttons. Use Retry all after fixing a transient outage (for example, the LLM provider went down, jobs failed, and the provider is now back). Use Flush all only when you've decided the failures are unrecoverable, since it drops every row and leaves nothing to retry.
Common failure shapes
By queue:
- crawl — usually a 4xx/5xx from the upstream site or a Browserless rate-limit. Retry once; if it persists, the source URL is dead.
- index — usually an LLM embedding rate-limit or a Vectorize quota issue. Retry after the rate window resets.
- default — anything else (usage events, gap detection, webhook delivery). Look at the exception.
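If you'd rather triage from the database than the UI, the guidance above can be sketched as a lookup keyed on the `queue` column of Laravel's `failed_jobs` table (for instance, rows from `SELECT queue FROM failed_jobs`). The hint strings and the choice to send unknown queues to the `default` advice are assumptions of this sketch, not platform behavior.

```python
from collections import Counter
from typing import Iterable, Mapping

# Suggested first step per queue, paraphrasing the list above.
QUEUE_HINTS = {
    "crawl": "Upstream 4xx/5xx or Browserless rate-limit: retry once; if it fails again, the source URL is dead.",
    "index": "Embedding rate-limit or Vectorize quota: retry after the rate window resets.",
    "default": "Usage events, gap detection, webhook delivery: read the exception.",
}


def triage(rows: Iterable[Mapping[str, str]]) -> list[tuple[str, int, str]]:
    """Count failed jobs per queue (busiest first) with a suggested next step."""
    counts = Counter(row.get("queue", "default") for row in rows)
    return [
        # Queues not listed above fall back to the generic "default" advice.
        (queue, n, QUEUE_HINTS.get(queue, QUEUE_HINTS["default"]))
        for queue, n in counts.most_common()
    ]
```

Sorting busiest-first keeps the attention on whichever queue is driving the failed-jobs count, which is the number the Failed jobs check reacts to.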
Webhook delivery failures
The lead-captured dispatcher (SignedDispatcher) is
single-attempt — the lead is already persisted, so a failed
delivery surfaces in the workflow run log rather than blocking the
visitor. Workflow-step webhooks (DispatchWebhookJob)
retry up to 3 times via Laravel's queue retry mechanism; after the
third failure the job lands in failed_jobs with the
destination URL + signature in the payload, so you can replay
manually after the receiver is fixed.
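The difference between the two webhook paths can be modeled in a few lines. `send` is a hypothetical stand-in for the HTTP call, and `FailedJob` is a simplified stand-in for a failed_jobs row. The retry loop runs in-process purely for illustration; Laravel actually re-queues the job between attempts, so treat this as a model of the semantics (one attempt versus three attempts, then a failed-job record carrying URL and signature), not the implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sender: raises on any delivery failure.
Sender = Callable[[str, str, str], None]


@dataclass
class FailedJob:
    url: str
    signature: str
    error: str


def deliver_once(send: Sender, url: str, body: str, signature: str, run_log: list[str]) -> bool:
    """Lead-captured path: a single attempt; failures are logged, never raised."""
    try:
        send(url, body, signature)
        return True
    except Exception as exc:
        run_log.append(f"webhook to {url} failed: {exc}")
        return False


def deliver_with_retries(send: Sender, url: str, body: str, signature: str,
                         failed_jobs: list[FailedJob], tries: int = 3) -> bool:
    """Workflow-step path: up to `tries` attempts, then record a failed job."""
    last_error = ""
    for _ in range(tries):  # simplification: Laravel re-queues between attempts
        try:
            send(url, body, signature)
            return True
        except Exception as exc:
            last_error = str(exc)
    # Keep URL + signature so the delivery can be replayed by hand later.
    failed_jobs.append(FailedJob(url, signature, last_error))
    return False
```

The lead path never blocks the visitor because the lead is already saved; the workflow path trades latency for durability by retrying and then parking the payload where it can be replayed.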
Horizon
If you've enabled Laravel Horizon, /horizon shows real-time
queue throughput, wait times, failed-job rates, and per-job-class
histograms. We recommend keeping it open in a tab during deploys.
Site Health pill — what each color means
| Color | Score | Meaning | Action |
|---|---|---|---|
| Green | ≥ 90% (Strong) | Everything healthy. | Nothing. |
| Amber | 50% – 89% (Stable / Watchlist) | One or more warnings; product still functional. | Click the pill, fix what's broken when you have time. |
| Red | < 50% (Critical) | Multiple checks failing or a critical-severity check (e.g. failed-jobs surge). | Drop everything. The product may be visibly broken for some customers. |