Most MSP SLAs are marketing documents. They have numbers in them, but they're written so vaguely that the MSP can almost never be held to them. Here's how to require SLA terms that actually mean something.
What SLA Levels Should Cover
A business-grade MSP SLA should define four severity levels with specific response and resolution targets:
- P1 — Critical: Complete system outage, active security incident (ransomware, breach), or multiple users unable to work. Response: 15–30 minutes. Work begins immediately and continues without stopping until resolved. Available 24/7/365.
- P2 — High: Major function unavailable, many users affected, significant productivity impact. Response: 1–2 hours. Same-business-day resolution target.
- P3 — Medium: Single user affected, workaround available, moderate impact. Response: 4 hours. Next-business-day resolution.
- P4 — Low: Minor issue, no productivity impact, general request. Response: 1 business day. 3–5 business day resolution.
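If you want to check tickets against these targets yourself, the four levels can be written down as a small lookup table. The following is a minimal sketch, not a contract template: the names `SeverityTarget`, `SLA_TARGETS`, and `response_met` are hypothetical, the response limits use the upper bound of each range above, and P4's "1 business day" is approximated as 24 wall-clock hours because business-calendar math is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SeverityTarget:
    label: str
    response_limit: timedelta  # upper bound of the response window
    resolution_target: str     # kept as text; business-day math is not modeled


# Upper bounds taken from the severity list above.
# Assumption: P4's "1 business day" is treated as 24 wall-clock hours.
SLA_TARGETS = {
    "P1": SeverityTarget("Critical", timedelta(minutes=30), "continuous work until resolved"),
    "P2": SeverityTarget("High", timedelta(hours=2), "same business day"),
    "P3": SeverityTarget("Medium", timedelta(hours=4), "next business day"),
    "P4": SeverityTarget("Low", timedelta(hours=24), "3 to 5 business days"),
}


def response_met(severity: str, opened: datetime, first_contact: datetime) -> bool:
    """Return True if first technician contact fell within the response limit."""
    return first_contact - opened <= SLA_TARGETS[severity].response_limit
```

For example, a P1 ticket opened at 09:00 with first technician contact at 09:25 passes, while contact at 09:31 does not.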
How "Response" Should Be Defined
This is where SLAs get manipulated. "Response" means a technician contacts you with a plan of action — not an automated ticket acknowledgment email, not a "we're looking into it" message, not being placed on hold. Require the SLA to define response as: "a qualified technician contacts the reporting user via phone, chat, or email with an initial diagnosis and expected resolution timeline."
How "Resolution" Should Be Defined
Resolution means the issue is fixed, not closed. Some MSPs close tickets with "user reports issue resolved" when the issue isn't actually resolved — just not currently active. Require resolution to mean: the root cause has been identified and addressed, or the workaround is documented and a permanent fix is scheduled with a committed date.
After-Hours Coverage Requirements
Ask specifically: "What is your after-hours coverage for P1 issues?" Acceptable answers:
- Dedicated on-call staff, reachable at a pager or cell number that connects directly to the technician on duty
- 24/7 security operations center (SOC) monitoring with escalation to on-call for P1
Not acceptable:
- A voicemail box that someone checks in the morning
- "We monitor alerts and someone will call you back" (with no defined response time)
- Email-only after-hours contact
SLA Measurement Methodology
How will SLA performance be measured, and who measures it? The MSP shouldn't be grading their own homework. Require:
- Monthly SLA reports you receive automatically — not on request
- Ticket timestamps that you can verify independently through the ticketing system
- Definition of excluded time (planned maintenance windows, issues caused by third parties, client-caused delays) — and a limit on how often exclusions can be claimed
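To see how excluded time changes the numbers in a monthly report, it helps to compute compliance yourself from the ticket timestamps. This is a rough sketch under stated assumptions: the `TicketTiming` record, the `RESPONSE_LIMITS` table (upper bounds of the P1 to P3 response targets, in minutes), and both function names are illustrative, and real contracts may define exclusions differently.

```python
from dataclasses import dataclass


@dataclass
class TicketTiming:
    severity: str
    response_minutes: float        # minutes from ticket open to first technician contact
    excluded_minutes: float = 0.0  # time the contract allows the MSP to exclude


# Hypothetical limits in minutes, using the upper bound of each response target.
RESPONSE_LIMITS = {"P1": 30, "P2": 120, "P3": 240}


def compliance_rate(tickets):
    """Percentage of tickets whose response time, net of exclusions, met the limit."""
    if not tickets:
        raise ValueError("no tickets to measure")
    met = sum(
        1
        for t in tickets
        if t.response_minutes - t.excluded_minutes <= RESPONSE_LIMITS[t.severity]
    )
    return 100.0 * met / len(tickets)


def exclusion_share(tickets):
    """Fraction of tickets on which any excluded time was claimed."""
    if not tickets:
        raise ValueError("no tickets to measure")
    return sum(1 for t in tickets if t.excluded_minutes > 0) / len(tickets)
```

Running both functions on the same month shows why a cap on exclusions matters: if the exclusion share is high, the reported compliance rate says little about what users actually experienced.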
What the Remedy Should Be
An SLA without a remedy is a marketing statement. Require a specific remedy for SLA misses:
- P1 SLA miss (failure to respond within 30 minutes to a critical outage): service credit equal to one day's share of the monthly fee (roughly 1/30 of it) per incident
- Pattern of P2/P3 misses (e.g., SLA compliance below 95% in any month): service credit of 5% of monthly fee per month below threshold
- Repeated SLA failures (two consecutive months below threshold): right to terminate with 30 days' notice and no early termination penalty
If an MSP refuses to add remedy language, ask why. The usual answer is "we don't miss SLAs." Push back: if they never miss them, the remedy clause costs them nothing. The resistance to including it tells you something about their confidence in their own delivery.
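The remedy terms listed above are simple enough to work out by hand, but a short calculation makes the amounts concrete. The sketch below assumes a 30-day month for the per-day credit and a 95% compliance threshold; `monthly_credit` and `may_terminate` are hypothetical helper names, and the actual formula is whatever your contract says.

```python
def monthly_credit(monthly_fee, p1_misses, compliance_pct,
                   threshold=95.0, days_in_month=30):
    """Service credit owed for one month under the example remedy terms."""
    # One day's share of the monthly fee for each missed P1 response.
    p1_credit = p1_misses * monthly_fee / days_in_month
    # 5% of the monthly fee if overall compliance fell below the threshold.
    pattern_credit = 0.05 * monthly_fee if compliance_pct < threshold else 0.0
    return round(p1_credit + pattern_credit, 2)


def may_terminate(monthly_compliance, threshold=95.0):
    """True if any two consecutive months fell below the threshold."""
    return any(
        earlier < threshold and later < threshold
        for earlier, later in zip(monthly_compliance, monthly_compliance[1:])
    )
```

With a $3,000 monthly fee, two missed P1 responses and 93% compliance in the same month work out to a credit of $350: $200 for the P1 misses plus $150 for the pattern miss.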
How to Verify SLA Performance After You Sign
Track every ticket yourself for the first 90 days: when it was opened, when you received first contact from a technician, and when it was resolved. At 90 days you'll have real data on their actual performance vs. their contracted SLA. This is also the information you need if you ever want to invoke the remedy clause.
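A spreadsheet is enough for this log, but if you export it as CSV, a few lines of code can summarize it. The following sketch assumes a log with the columns `severity`, `opened`, `first_contact`, and `resolved`, and timestamps written as `YYYY-MM-DD HH:MM`; the column names, the format, and the function names are all illustrative choices, not a standard.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import median

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format for the log


def read_log(csv_text):
    """Parse CSV text with columns: severity, opened, first_contact, resolved."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def median_response_minutes(rows):
    """Median minutes from ticket open to first technician contact, per severity."""
    by_severity = defaultdict(list)
    for row in rows:
        opened = datetime.strptime(row["opened"], TIME_FORMAT)
        contact = datetime.strptime(row["first_contact"], TIME_FORMAT)
        by_severity[row["severity"]].append((contact - opened).total_seconds() / 60)
    return {severity: median(values) for severity, values in by_severity.items()}
```

Comparing these medians with the contracted response targets gives you a quick read on whether the MSP's own monthly reports match what you recorded.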