Latency Budgets That Actually Stick

§ 01 — Why SLOs Fail

Most SLOs are aspirations, not budgets.

A budget you can't meet isn't a budget; it's a wish. We tracked SLO survival across 10,000 services for 12 months and found one dominant pattern: when the SLO was set far beyond what the team was already shipping, it was quietly abandoned within a quarter. When it was set slightly tighter than the historical P95, it became a real operational tool.

§ 02 — Realistic SLO Bands

Targets that hold, by service type.

TBL · 01 · REALISTIC P95 / P99 SLO BANDS · MS · N = 10,000 · 90-day adherence ≥ 95%

Service Class	P95 Target	P99 Target	Recovery Window
Edge / CDN read	120 ms	350 ms	5 min
Gateway / Auth	200 ms	600 ms	10 min
Read API (cached)	180 ms	500 ms	10 min
Write API (idempotent)	450 ms	1.2 s	30 min
Reporting / Analytics	1.5 s	4 s	1 hr
Background job (P95 of completion)	30 s	120 s	1 hr

The biggest mistake teams make is setting one SLO across the whole platform. A 200ms P95 makes sense for a gateway and is suicidal for a reporting service. Class your services first, set bands second.
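The class-first, band-second rule can be sketched as a lookup keyed by service class. This is a minimal illustration in Python that assumes the band values from the table above, converted to milliseconds. The names `Band`, `SLO_BANDS_MS`, and `within_band` are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """P95 / P99 latency targets for one service class, in milliseconds."""
    p95_ms: float
    p99_ms: float


# Values copied from TBL · 01 and converted to milliseconds.
# The short class keys are an assumption made for this sketch.
SLO_BANDS_MS = {
    "edge": Band(120, 350),
    "auth": Band(200, 600),
    "read": Band(180, 500),
    "write": Band(450, 1_200),
    "reporting": Band(1_500, 4_000),
    "async": Band(30_000, 120_000),
}


def within_band(service_class: str, p95_ms: float, p99_ms: float) -> bool:
    """Return True when both measured percentiles sit inside the class band."""
    band = SLO_BANDS_MS[service_class]
    return p95_ms <= band.p95_ms and p99_ms <= band.p99_ms
```

The same measurement (say a 900 ms P95 and a 2.5 s P99) passes for `"reporting"` and fails for `"auth"`, which is why one platform-wide number does not work.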

§ 03 — Adherence by Stage

SLO adherence by company stage.

TBL · 02 · P95 SLO ADHERENCE OVER 90 DAYS · DARKER = HIGHER · ADHERENCE %

Segment	EDGE	AUTH	READ	WRITE	ASYNC
Seed / Series A	●	●	◐	◐	◐
Series B / Growth	■	●	●	◐	◐
Late stage	■	■	●	●	●
Public / Scale	■	■	■	●	●

The biggest jump in adherence happens between Series A and Series B — typically when the org adds a dedicated SRE function and starts treating SLOs as a product surface, not a metric.

§ 04 — Error Budgets

The numbers behind the budget.

Error budget · 99.9% · 43.2 min

of allowed downtime per month at three 9s. Most teams overrun this by their first incident, not because of the failure itself, but because planned changes had already consumed budget unnoticed.
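The 43.2-minute figure follows from simple arithmetic: the unavailable fraction (1 minus the SLO) multiplied by the minutes in the period. Here is a small sketch, assuming a 30-day month; the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over `days` days."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    minutes_in_period = days * 24 * 60
    return (1.0 - slo) * minutes_in_period


# Three 9s over a 30-day month: 0.001 * 43,200 minutes, about 43.2 minutes.
budget = round(error_budget_minutes(0.999), 1)
```

A planned change that takes the service down for 10 minutes spends roughly a quarter of that budget, whether or not anyone records it as an incident.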

Aspirational SLO survival · 8% ↓

of services with SLOs set well beyond their last-quarter baseline still met them after 90 days. The other 92% silently dropped them.

Realistic SLO survival · 84% ↑

of services with SLOs set 10–15% tighter than historical P95 maintained adherence after 90 days — and improved measurably the following quarter.
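As a rough sketch of what "10–15% tighter than historical P95" means in practice, the snippet below computes a nearest-rank P95 from raw latency samples and scales it down. It assumes that "tighter" means a lower latency target and that the samples are in milliseconds. The helper names `nearest_rank_percentile` and `tightened_target` are hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def tightened_target(samples_ms: Sequence[float], tighten_by: float = 0.10) -> float:
    """Historical P95 reduced by `tighten_by` (0.10 to 0.15 matches the guidance above)."""
    if not 0.0 < tighten_by < 1.0:
        raise ValueError("tighten_by must be a fraction between 0 and 1")
    baseline_p95 = nearest_rank_percentile(samples_ms, 95)
    return baseline_p95 * (1.0 - tighten_by)
```

For a service whose historical P95 is 200 ms, this yields a target between 170 ms and 180 ms rather than an aspirational 100 ms.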

§ 05 — Top Drivers

What moves adherence.

TBL · 03 · TOP DRIVERS OF SLO ADHERENCE LIFT · RANKED BY 90-DAY ADHERENCE %

Lever	Median Lift	Implementation Note
Tighten by 10–15%, not 2×	+44%	Realistic ceilings hold
Class services before banding	+31%	Auth ≠ Reporting; band separately
Burn-rate alerts (multi-window)	+22%	Catches drift before exhaustion
SLO as a release-gate	+18%	Block deploys when budget is < 20% remaining
Quarterly SLO review ritual	+12%	Keeps SLOs honest as the system evolves
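The release-gate lever in the table can be illustrated with a short check. This sketch assumes a request-based SLO (good events over total events) and uses the 20% threshold from the table. The function names `budget_remaining_fraction` and `deploy_allowed` are hypothetical, and a real gate would read its counts from whatever metrics store the team already uses.

```python
def budget_remaining_fraction(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative when overspent."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def deploy_allowed(remaining_fraction: float, min_remaining: float = 0.20) -> bool:
    """Block deploys once less than `min_remaining` of the budget is left."""
    return remaining_fraction >= min_remaining
```

With a 99.9% SLO and 1,000,000 requests, 500 failures leave about half the budget and the deploy proceeds; 900 failures leave about 10% and the gate blocks it.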

§ 06 — Playbook

Do this. Don't do that.

✓ DO

Set SLOs 10–15% tighter than last quarter's P95
Class services before assigning bands
Use multi-window burn-rate alerts (1h + 6h)
Treat SLO budget exhaustion as a release-blocking event
Run a quarterly SLO review with the team that owns the service

✗ DON'T

Aim for four-9s on a service that's never broken three
Apply a single SLO across the whole platform
Use only threshold alerts on raw metrics
Treat error budget as 'free downtime'
Set SLOs from the top down without team buy-in

§ 07 — How to Audit

A six-step SLO audit.

List all customer-facing services

Anything outside this list is internal — different rules apply. Document the list, refresh quarterly.

Class each service

Edge / Auth / Read / Write / Reporting / Async. Use the table above as a starting matrix.

Pull last-quarter P95 / P99

If you don't have 90 days of data, your SLO is a guess. Wait until you do.

Set the band

10–15% tighter than baseline. If the team can't defend the number, loosen it until they can.

Wire burn-rate alerts

Two windows minimum: 1h fast burn, 6h slow burn. Page only on fast burn; route slow burn to on-call as a ticket, not a page.

Schedule the review

Quarterly retrospective: did the SLO hold, why, what changed. Adjust the band, don't drop it.
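The burn-rate step of the audit can be sketched as follows. Burn rate here is the observed error ratio divided by the error ratio the SLO allows, so a value of 1.0 spends the budget exactly over the SLO period. The thresholds (14.4 for the 1h window, 6 for the 6h window) are commonly cited defaults for a 30-day budget and are assumptions in this sketch, as is the routing of slow burn to a ticket. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Window:
    name: str
    hours: int
    threshold: float  # burn-rate multiple that triggers the alert (assumed value)
    action: str       # where the alert is routed (assumed routing)


# Two windows minimum: 1h fast burn, 6h slow burn.
WINDOWS = (
    Window("fast", 1, 14.4, "page"),
    Window("slow", 6, 6.0, "ticket"),
)


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return error_ratio / (1.0 - slo)


def evaluate(error_ratio_by_hours: Mapping[int, float], slo: float) -> List[str]:
    """Return the actions triggered by the observed error ratio in each window."""
    actions = []
    for window in WINDOWS:
        observed = error_ratio_by_hours[window.hours]
        if burn_rate(observed, slo) >= window.threshold:
            actions.append(window.action)
    return actions
```

For a 99.9% SLO, a 2% error ratio over the last hour is a burn rate of about 20 and triggers the page, while a steady 0.7% over six hours is a burn rate of about 7 and only opens a ticket.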

▸ SRE · CHECKLIST

Customer-facing service list current
Each service is classed
Bands documented in repo, not in a wiki
P95 + P99 alerts wired with burn-rate windows
SLO budget feeds release-gate logic
Quarterly SLO review on the calendar
On-call playbook references SLO targets
Customer-facing status page reflects SLO posture


