5 · Observability
Objective — make every failure loud: extend the basic Sentry DSN into cron monitors, performance, and release health; add rotated + structured logs; expose a health endpoint with external uptime monitoring; and define alert escalation backed by a quarterly restore drill.
Background
With the app hardened and backups landing off-server, make every failure loud. Each section here is a “should,” not a “must” — but a failed backup that screams beats one that fails silently for weeks. These extend the basic Sentry DSN + incident runbook from the Phase 4 deploy pipeline; they don’t recreate them.
1. Full Sentry — crons, performance, release health
Confirm the SDK + DSN are already in place (Phase 3 installed the SDK, Phase 4 set the production DSN). Everything below is what comes after that — do not re-create the project or re-install the SDK.
- Wrap the single most critical scheduled command in a cron monitor. Without a heartbeat, a failed `backup:run` fails silently. In `app/Console/Kernel.php` (inside `schedule()`):

  ```php
  $schedule->command('backup:run --only-db')
      ->dailyAt('02:00')
      ->sentryMonitor('daily-db-backup');

  // Sentry's free tier = ONLY 1 cron monitor (each extra bills ~$0.78/mo).
  // Wrap just the single most-critical job (the daily DB backup) and leave
  // the rest unmonitored here:
  $schedule->command('backup:run')->weeklyOn(0, '02:30');
  $schedule->command('backup:monitor')->dailyAt('04:00');
  $schedule->command('queue:prune-batches --hours=48')->daily();
  ```

  - ✅ The one monitor auto-registers in Sentry → Crons on its first run; expect a green checkmark within 24 h. The free tier allows 1 monitor, so wrap only the daily DB backup (wrapping all four jobs silently bills ~$28/yr). MonSpark (section 3) can monitor the remaining jobs; watch each job with one cron monitor, not both.
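The ~$28/yr figure is just free-tier arithmetic: four jobs minus one free monitor leaves three billable monitors at roughly $0.78/mo each. A minimal PHP sketch of that calculation, assuming the approximate prices quoted on this page (the function name is hypothetical, and real Sentry pricing may change):

```php
<?php
// Hypothetical helper: yearly Sentry cron-monitor cost, using the approximate
// prices quoted on this page (1 free monitor, ~$0.78/mo for each extra one).
function cronMonitorYearlyCost(int $monitors, int $freeMonitors = 1, float $perExtraMonthly = 0.78): float
{
    // Only monitors beyond the free allowance are billed.
    $billable = max(0, $monitors - $freeMonitors);
    return round($billable * $perExtraMonthly * 12, 2);
}

echo cronMonitorYearlyCost(1), PHP_EOL; // daily DB backup only → 0
echo cronMonitorYearlyCost(4), PHP_EOL; // all four scheduled jobs → 28.08
```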
- Set performance sampling + PII stripping in `.env`:

  ```env
  SENTRY_TRACES_SAMPLE_RATE=0.1    # 10% of requests
  SENTRY_PROFILES_SAMPLE_RATE=0.1  # 10% profiling on traced requests
  SENTRY_SEND_DEFAULT_PII=false    # GDPR: strip IP + email by default
  ```

  - ✅ Traces/profiles sample at 10% and default PII is stripped.
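Because profiling is sampled only among requests that are already traced, the two rates multiply: 10% × 10% means about 1% of all requests get profiled. A small PHP sketch of the expected daily volumes, assuming a hypothetical 100,000 requests/day (the helper name is illustrative):

```php
<?php
// Sketch: expected daily sample counts for the rates above. Profiles are
// drawn only from traced requests, so the effective profile rate is the product.
function expectedSamples(int $requestsPerDay, float $tracesRate, float $profilesRate): array
{
    $traced   = $requestsPerDay * $tracesRate;
    $profiled = $traced * $profilesRate;
    return ['traced' => (int) round($traced), 'profiled' => (int) round($profiled)];
}

// Hypothetical traffic of 100,000 requests/day:
$s = expectedSamples(100_000, 0.1, 0.1);
printf("traced=%d profiled=%d\n", $s['traced'], $s['profiled']); // → traced=10000 profiled=1000
```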
- Tag each release with the git SHA so a regression alert points at the commit that caused it. In `config/sentry.php` set `'release' => env('SENTRY_RELEASE')` and `'environment' => env('APP_ENV')`, then set `SENTRY_RELEASE` in a Deployer task (`git rev-parse --short HEAD`) and finalize the release with `sentry-cli` (`releases new` → `set-commits --auto` → `finalize`). Generate the auth token (👤) at Sentry → Settings → Auth Tokens (scope `project:releases`); store `SENTRY_ORG` / `SENTRY_PROJECT` / `SENTRY_AUTH_TOKEN` in the server `shared/.env`.
  - ✅ Releases are tagged with the SHA and finalized via `sentry-cli`.
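The three `sentry-cli` calls have to run in order against the same version string. A minimal sketch of a deploy-side helper, assuming you call it from your Deployer task with the short SHA; the function name and the SHA-format check are hypothetical additions, and it relies on `sentry-cli` picking up `SENTRY_AUTH_TOKEN`, `SENTRY_ORG`, and `SENTRY_PROJECT` from the environment:

```php
<?php
// Hypothetical deploy helper: builds the sentry-cli release sequence for one
// short git SHA. Credentials come from the environment, not from this code.
function sentryReleaseCommands(string $sha): array
{
    // Reject anything that is not a hex SHA before it goes near a shell.
    if (!preg_match('/^[0-9a-f]{7,40}$/', $sha)) {
        throw new InvalidArgumentException("Not a git SHA: {$sha}");
    }
    $version = escapeshellarg($sha);
    return [
        "sentry-cli releases new {$version}",
        "sentry-cli releases set-commits {$version} --auto",
        "sentry-cli releases finalize {$version}",
    ];
}

// In a real deploy you would read the SHA first, e.g.:
//   $sha = trim((string) shell_exec('git rev-parse --short HEAD'));
foreach (sentryReleaseCommands('a1b2c3d') as $cmd) {
    echo $cmd, PHP_EOL; // first line → sentry-cli releases new 'a1b2c3d'
}
```

Validating the SHA before interpolating it keeps a malformed value (for example a branch name with shell metacharacters) out of the command line, on top of the quoting `escapeshellarg()` already applies.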
- Upload JS source maps (only if you ship a bundle). If the frontend has a build step (Vite/Webpack), add `sentry-cli sourcemaps upload` to the production `npm run build` flow with `SENTRY_AUTH_TOKEN` set in CI. Skip this entirely for Blade-only apps.
  - ✅ JS errors show readable stack traces (or this step is skipped for a Blade-only app).
2. Log aggregation + rotation
Keep a rotated daily channel plus a sentry_logs channel so warnings/errors flow to Sentry.
- Confirm the log channels exist in `config/logging.php` (the vendor or Phase 3 may already have them):

  ```php
  'daily' => [
      'driver' => 'daily',
      'path'   => storage_path('logs/laravel.log'),
      'level'  => env('LOG_LEVEL', 'debug'),
      'days'   => env('LOG_DAILY_DAYS', 14),
  ],
  'security' => ['driver' => 'daily', 'path' => storage_path('logs/security.log'), 'level' => 'info', 'days' => 30],
  'payments' => ['driver' => 'daily', 'path' => storage_path('logs/payments.log'), 'level' => 'info', 'days' => 90],
  'sentry_logs' => ['driver' => 'sentry', 'level' => env('SENTRY_LOG_LEVEL', 'warning'), 'bubble' => true],
  ```

  - ✅ The `daily`, `security`, `payments`, and `sentry_logs` channels are defined.
- Set the per-environment log stack + level:

  | Env | `LOG_STACK` | `LOG_LEVEL` |
  | --- | --- | --- |
  | Local | `single` | `debug` |
  | Production / staging | `daily,sentry_logs` | `warning` |

  - ✅ Production runs `LOG_STACK=daily,sentry_logs` at `warning`, so warnings reach Sentry.
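Each channel in the stack applies its own minimum level, so in production an `info` entry is dropped by both channels while an `error` reaches both the daily file and Sentry. The sketch below models that filtering in plain PHP using the RFC 5424 level order; it illustrates the behavior rather than reproducing Monolog's implementation, and the function names are hypothetical.

```php
<?php
// RFC 5424 / PSR-3 levels, lowest to highest severity.
const LEVELS = ['debug', 'info', 'notice', 'warning', 'error', 'critical', 'alert', 'emergency'];

// True if an entry at $level meets a channel's $minimum level.
function meetsLevel(string $level, string $minimum): bool
{
    return array_search($level, LEVELS, true) >= array_search($minimum, LEVELS, true);
}

// Which channels in a LOG_STACK-style list would write an entry at $level.
function channelsReceiving(string $level, string $logStack, array $channelLevels): array
{
    $channels = explode(',', $logStack); // LOG_STACK is comma-separated
    return array_values(array_filter(
        $channels,
        fn (string $ch) => meetsLevel($level, $channelLevels[$ch]),
    ));
}

// Production values: daily uses LOG_LEVEL=warning; sentry_logs uses
// SENTRY_LOG_LEVEL (default warning).
$levels = ['daily' => 'warning', 'sentry_logs' => 'warning'];
echo implode(',', channelsReceiving('info', 'daily,sentry_logs', $levels)) ?: '(none)', PHP_EOL;  // → (none)
echo implode(',', channelsReceiving('error', 'daily,sentry_logs', $levels)) ?: '(none)', PHP_EOL; // → daily,sentry_logs
```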
Prefer structured logging — Log::channel('payments')->info('Stripe webhook', ['event' => $event->id]) — so fields stay searchable. On servers with sudo, add a logrotate.d rule (daily, rotate 14, compress); on shared hosting, Laravel’s daily driver handles rotation. For centralized shipping, the sentry_logs channel (free with your existing Sentry plan) is the zero-new-vendor default; add Better Stack / Papertrail / Axiom only if volume exceeds Sentry’s quota.
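Laravel's `daily` driver writes one file per day (`laravel-YYYY-MM-DD.log`) and keeps the newest `days` files. A hypothetical PHP sketch of which files that retention rule would prune, assuming a directory listing with 16 daily files and `days = 14`; it illustrates the rule and is not Monolog's actual code:

```php
<?php
// Illustration of daily-driver retention: files named <base>-YYYY-MM-DD.log,
// keep the newest $days of them; other channels' files are left alone.
function filesToPrune(array $files, string $base, int $days): array
{
    $pattern = '/^' . preg_quote($base, '/') . '-\d{4}-\d{2}-\d{2}\.log$/';
    $dated = array_values(preg_grep($pattern, $files));
    rsort($dated); // ISO dates sort correctly as strings, so this is newest-first
    return array_slice($dated, $days);
}

// Hypothetical listing: 16 days of laravel-*.log plus one security log.
$files = array_map(fn (int $d) => sprintf('laravel-2024-05-%02d.log', $d), range(1, 16));
$files[] = 'security-2024-05-01.log';
echo implode(', ', filesToPrune($files, 'laravel', 14)), PHP_EOL;
// → laravel-2024-05-02.log, laravel-2024-05-01.log
```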
3. Health endpoint + uptime monitoring
Expose a health check (or use Laravel 11+’s built-in `/up`), then point an external monitor at it.
- Add a `/health` route to `routes/web.php` if missing:

  ```php
  // Merge with the file's existing imports: web.php usually already imports
  // Route, and importing the same name twice is a fatal error.
  use Illuminate\Http\Request;
  use Illuminate\Support\Facades\{Cache, DB, Route};

  Route::get('/health', function (Request $request) {
      $checks = ['app' => true, 'db' => false, 'cache' => false];
      try { DB::connection()->getPdo(); $checks['db'] = true; } catch (\Throwable $e) {}
      try { Cache::put('hc', '1', 10); $checks['cache'] = Cache::get('hc') === '1'; } catch (\Throwable $e) {}
      $healthy = !in_array(false, $checks, true);

      // Detailed subsystem breakdown only with the monitor token; anonymous
      // callers get a bare status so /health can't fingerprint internals.
      // Assumes config/services.php has 'healthcheck' => ['token' => env('HEALTHCHECK_TOKEN')].
      $token  = config('services.healthcheck.token');
      $authed = $token && hash_equals($token, (string) $request->query('token'));
      $body   = $authed
          ? ['status' => $healthy ? 'healthy' : 'unhealthy', 'checks' => $checks]
          : ['status' => $healthy ? 'healthy' : 'unhealthy'];

      return response()->json($body, $healthy ? 200 : 503);
  })->middleware('throttle:30,1'); // 30 req/min/IP: a route that hits the DB must be rate-limited
  ```

  - ✅ `/health` returns 200 (bare `{status}`) to anonymous callers, 503 when down, the full `checks` only with `?token=` matching `HEALTHCHECK_TOKEN`, and is throttled to 30/min so it can’t be used as a DB-DoS amplifier.
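Pulling the route's status/token logic into a pure function makes it unit-testable without a database or HTTP kernel. A minimal sketch that mirrors the closure above (the function name and signature are hypothetical; the route would still run the DB and cache probes and pass their results in):

```php
<?php
// Hypothetical pure version of the /health decision logic.
// Returns [HTTP status code, JSON body array].
function healthResponse(array $checks, ?string $configuredToken, ?string $suppliedToken): array
{
    $healthy = !in_array(false, $checks, true);
    $status  = $healthy ? 'healthy' : 'unhealthy';
    // Constant-time comparison; an unset or empty configured token never authenticates.
    $authed = $configuredToken !== null && $configuredToken !== ''
        && hash_equals($configuredToken, (string) $suppliedToken);
    $body = $authed ? ['status' => $status, 'checks' => $checks] : ['status' => $status];
    return [$healthy ? 200 : 503, $body];
}

[$code, $body] = healthResponse(['app' => true, 'db' => false, 'cache' => true], 's3cret', null);
echo $code, ' ', json_encode($body), PHP_EOL; // → 503 {"status":"unhealthy"}
```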
- Pick one external monitor (👤) and point it at the app every 1–5 min (defer this to production):

  | Provider | Free tier | Interval | Best for |
  | --- | --- | --- | --- |
  | UptimeRobot | 50 monitors | 5 min | Most monitors for free |
  | HetrixTools | 15 monitors | 1 min | Uptime + server RAM/CPU agent |
  | MonSpark | 2–4 monitors | 1 min | All-in-one: status page, cron monitor, phone calls |
  | Better Stack | 10 monitors | 3 min | On-call rotation + incident timeline |

  - ✅ External monitors watch the homepage + `/health` (P0, 1–5 min), the login path (P1, 5 min), and SSL-cert + DNS expiry (P1, daily). Optionally publish a public status page at `status.[DOMAIN]`.
4. Alert escalation + restore drills
Define who gets paged, how, and when — routed through one shared webhook config (the same one the backup and Sentry alerts use), never hardcoded URLs.
```mermaid
flowchart TD
  EV["Alert fires"] --> SEV{Severity}
  SEV -->|P0| P0["SMS + phone + Slack #alerts-p0<br/>5 min · 24/7"]
  SEV -->|P1| P1["Slack #alerts-p1 + email<br/>30 min · business hours"]
  SEV -->|P2| P2["Slack team channel<br/>4 hr"]
  SEV -->|P3| P3["Weekly email digest"]
```

- Document the severity → channel matrix, routed via the shared webhook config:

  | Severity | Example trigger | Channel |
  | --- | --- | --- |
  | P0 | Cron monitor fails · error rate > 10× baseline · homepage/health down > 2 min | SMS + phone + Slack |
  | P1 | New unresolved production issue · warning-level log pattern | Slack + email |
  | P2 | P95 latency regression > 2× baseline | Slack team channel |
  | P3 | Crash-free sessions < 99% | Email digest |

  - ✅ Each severity maps to a channel, routed through one shared webhook config.
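Routing through one shared config means every alert source looks up the same severity table instead of hardcoding destinations. A minimal PHP sketch of that lookup, assuming hypothetical channel identifiers and reading the flowchart's 5 min / 30 min / 4 hr figures as response targets; real webhook URLs would come from the shared config or `.env`, never from code:

```php
<?php
// Hypothetical single source of truth for the severity matrix. Channel names
// are placeholders; the webhook URLs behind them live in shared config/env.
const ALERT_ROUTES = [
    'P0' => ['channels' => ['sms', 'phone', 'slack:#alerts-p0'], 'respond_within_min' => 5],
    'P1' => ['channels' => ['slack:#alerts-p1', 'email'], 'respond_within_min' => 30],
    'P2' => ['channels' => ['slack:team'], 'respond_within_min' => 240],
    'P3' => ['channels' => ['email:weekly-digest'], 'respond_within_min' => null], // digest only, no response target
];

function routeAlert(string $severity): array
{
    if (!array_key_exists($severity, ALERT_ROUTES)) {
        throw new InvalidArgumentException("Unknown severity: {$severity}");
    }
    return ALERT_ROUTES[$severity];
}

echo implode(', ', routeAlert('P1')['channels']), PHP_EOL; // → slack:#alerts-p1, email
```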
- Create these five rules in Sentry → Alerts (👤). The generic matrix above defines who gets paged; these are the concrete Sentry rules that fire the P0–P3 signals:

  | Severity | Trigger (create in Sentry → Alerts) | Channel |
  | --- | --- | --- |
  | P0 | Cron monitor fails (any) | SMS + phone + Slack `#alerts-p0` |
  | P0 | Error rate > 10× the 1 h baseline | Slack `#alerts-p0` + email |
  | P1 | Any new unresolved issue in `production` | Slack `#alerts-p1` + email |
  | P2 | P95 transaction duration regression > 2× baseline | Slack team channel |
  | P3 | Release health: crash-free sessions < 99% | Email digest |

  - ✅ All five Sentry alert rules are active and routed through the shared webhook config.
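To sanity-check the thresholds before clicking them into Sentry, here are the five rules written as plain comparisons. This is an illustration only: Sentry evaluates the real rules server-side, and the metric keys below are hypothetical.

```php
<?php
// Illustration only: the five Sentry alert rules as threshold checks.
// Returns a list of [severity, rule description] pairs that would fire.
function triggeredRules(array $m): array
{
    $fired = [];
    if ($m['cron_failed']) {
        $fired[] = ['P0', 'cron monitor failed'];
    }
    if ($m['error_rate'] > 10 * $m['error_rate_baseline_1h']) {
        $fired[] = ['P0', 'error rate > 10x the 1 h baseline'];
    }
    if ($m['new_unresolved_issue']) {
        $fired[] = ['P1', 'new unresolved issue in production'];
    }
    if ($m['p95_ms'] > 2 * $m['p95_baseline_ms']) {
        $fired[] = ['P2', 'P95 duration > 2x baseline'];
    }
    if ($m['crash_free_sessions'] < 0.99) {
        $fired[] = ['P3', 'crash-free sessions < 99%'];
    }
    return $fired;
}

$fired = triggeredRules([
    'cron_failed' => false,
    'error_rate' => 12.0, 'error_rate_baseline_1h' => 1.0, // errors per minute
    'new_unresolved_issue' => false,
    'p95_ms' => 450, 'p95_baseline_ms' => 400,
    'crash_free_sessions' => 0.995,
]);
foreach ($fired as [$severity, $rule]) {
    echo "{$severity}: {$rule}", PHP_EOL; // → P0: error rate > 10x the 1 h baseline
}
```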
- Schedule the quarterly restore drill and capture the runbook.
  - ✅ A quarterly restore drill is scheduled, the first drill passed, and the provider + bucket, Sentry URL + cron slugs, log retention, uptime monitors, on-call, and next drill date are captured in a `monitoring-state.md` bus-factor document, written as if someone else must use it without your help.
Whatever the escalation path, the backups behind it still have to be provably restorable.
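What “the first drill passed” means is left open above. One hypothetical pass criterion is that every table from the backup snapshot exists in the restored copy with the same row count. A minimal PHP sketch, assuming you have already collected the counts from both databases (for example with `SELECT COUNT(*)` per table); the function name is illustrative:

```php
<?php
// Hypothetical restore-drill check: compare per-table row counts from the
// backup snapshot ($expected) with the restored database ($restored).
function restoreMismatches(array $expected, array $restored): array
{
    $problems = [];
    foreach ($expected as $table => $rows) {
        if (!array_key_exists($table, $restored)) {
            $problems[$table] = 'missing';
        } elseif ($restored[$table] !== $rows) {
            $problems[$table] = "expected {$rows}, got {$restored[$table]}";
        }
    }
    return $problems; // empty array = drill passed
}

$problems = restoreMismatches(
    ['users' => 1200, 'orders' => 5400, 'payments' => 5100],
    ['users' => 1200, 'orders' => 5400],
);
echo $problems === [] ? 'PASS' : 'FAIL: ' . json_encode($problems), PHP_EOL; // → FAIL: {"payments":"missing"}
```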
Checklist
Do not mark this step done until every box below is checked.
- 🔀 Sentry extended — cron monitors green; performance data visible; releases tagged with the git SHA (👤 auth token generated).
- 🤖 Logs aggregated — `LOG_STACK=daily,sentry_logs` + `LOG_LEVEL=warning` on the server; warnings reach Sentry.
- 🔀 Health + uptime wired — `/health` returns 200 with JSON (or Laravel 11+’s `/up` returns 200); external uptime monitors wired (👤, production).
- 🤖 Escalation documented — severity → channel matrix routed via the shared webhook config.
- 🔀 Restore drill done — quarterly drill scheduled; first drill passed; `monitoring-state.md` current.