It’s time we had a talk… a talk about Time to Resolve and MTTR! Time to Resolve, usually reported as Mean Time to Resolve (MTTR), is a metric that has been around for as long as I can remember. It isn’t new, but it’s time we took a closer look and unpacked what time to resolve really means.
Why now? Because time to resolve matters… A lot! Time to resolve is, at its core, an indicator of usability and user experience. It also happens to be one of those metrics that is widely misunderstood and misinterpreted. So, let’s get into the juice!
What is Time to Resolve?
What do you mean by “time to resolve”?
Mean Time to Resolve (MTTR) gives us the average time it takes to resolve a case or incident, typically measured across incidents that share the same symptom during an outage or problem resolution scenario:
Time to Resolve = time of resolution − time of incident
MTTR = total Time to Resolve across all incidents ÷ number of incidents
In essence, time to resolve is an indicator that tells us how long users wait for a specific error to be fixed. What matters is not whether every user hit exactly the same issue, but whether the symptom is one they have encountered before and how long it takes them to get back up and running once a solution is found.
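To make the definition concrete, here is a minimal sketch of the calculation in Python. The incident timestamps and the function name `mean_time_to_resolve` are made up for illustration; in practice the data would come from your ticketing or incident tool.

```python
from datetime import datetime, timedelta

# Hypothetical incidents sharing one symptom: (time of incident, time of resolution).
incidents = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 45)),     # 45 minutes
    (datetime(2024, 1, 15, 14, 0), datetime(2024, 1, 15, 15, 30)),  # 90 minutes
    (datetime(2024, 1, 22, 11, 0), datetime(2024, 1, 22, 11, 15)),  # 15 minutes
]


def mean_time_to_resolve(incidents):
    """Average of (time of resolution - time of incident) over all incidents."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - occurred for occurred, resolved in incidents), timedelta())
    return total / len(incidents)


mttr = mean_time_to_resolve(incidents)
print(mttr)  # 0:50:00
```

Only resolved incidents go into the average here; how to treat incidents that are still open is a reporting decision this sketch leaves out.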
MTTD and MTTR: What’s the diff?
MTTD and MTTR are cut from the same cloth. But they’re not the same! MTTD (Mean Time to Detect) tells us how long it takes to notice that an incident has occurred, while MTTR tells us how long it takes from when the incident occurred until it is resolved and the system is back up and running. For simplicity, we’re going to focus on time to resolve in this post, but I wanted you to be aware of the difference between MTTD and MTTR.
MTTD refers to the time from when an incident occurs until it is detected.
MTTR refers to the time from the incident until the time of resolution.
Example: Time Of Incident (TOI) and Time Of Resolution (TOR)
It’s helpful to think about MTTR like this: a problem occurs at time X (the TOI), user A takes Y amount of time to find a solution, and the problem is resolved at time Z (the TOR), so Y = Z − X. MTTR is the average of Y across incidents.
Also, keep in mind that MTTD means the time from when an incident occurred until it was detected, while MTTR means the time from when the problem occurred until the issue was solved. That might seem trivial, but I think we’ve all run into this confusion more than once, so I wanted to make sure we were crystal clear.
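The difference is easiest to see when both metrics are computed from the same records. The sketch below assumes each record carries three timestamps (occurred, detected, resolved); the field names and the sample data are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; the field names are assumptions for this sketch.
records = [
    {
        "occurred": datetime(2024, 3, 1, 10, 0),
        "detected": datetime(2024, 3, 1, 10, 10),
        "resolved": datetime(2024, 3, 1, 11, 0),
    },
    {
        "occurred": datetime(2024, 3, 2, 8, 0),
        "detected": datetime(2024, 3, 2, 8, 20),
        "resolved": datetime(2024, 3, 2, 8, 40),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average number of minutes between two timestamps across all records."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 60 for r in records)


mttd = mean_minutes(records, "occurred", "detected")  # occurrence -> detection
mttr = mean_minutes(records, "occurred", "resolved")  # occurrence -> resolution
print(mttd, mttr)  # 15.0 50.0
```

Same incidents, same starting point, different end point: that is the whole difference between the two numbers.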
When should I be measuring Time to Resolve?
Time To Resolve provides us with great data when we need to understand user satisfaction for recurring errors – those with similar symptoms. For example:
– Why was my site down again? This just happened last week! I can’t believe you guys can’t keep this from happening over and over again…
– After updating my browser, I keep getting this message. Could you please have someone take a look at it?
– The instructions you sent were not clear. Can someone take another look at them?
If time to resolve is something that you can measure (and measuring it is possible for everyone), it will help you understand how long it takes your users to get back up and running after an outage. Basically, how long did it take before users could get their work done again after experiencing the issue?
After all, “time” is money! So let’s see how we can make sure time to resolve data is always available for use …
Tracking time to resolve allows us to calculate how long your users are waiting for an issue to be resolved. We can then compare that data over time and across different teams. Several key performance indicators (KPIs) are used to determine if there are problems in this area, and time to resolve is one of them!
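As a rough illustration of comparing across teams, the following sketch groups resolved tickets by team and averages the minutes to resolve. The team names, the numbers, and the function `mttr_by_team` are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical resolved tickets: (team, minutes to resolve).
tickets = [
    ("network", 30),
    ("network", 90),
    ("database", 45),
    ("database", 15),
    ("database", 60),
]


def mttr_by_team(tickets):
    """Group minutes-to-resolve by team and average each group."""
    grouped = defaultdict(list)
    for team, minutes in tickets:
        grouped[team].append(minutes)
    return {team: mean(values) for team, values in grouped.items()}


per_team = mttr_by_team(tickets)
print(per_team)  # {'network': 60, 'database': 40}
```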
How do I measure Time to Resolve?
First of all, time to resolve needs to be measured in order to be used. It can be measured in one of three ways: (1) time from incident to time of resolution, (2) time from incident to time of closure, and (3) time from problem identification to time of closure. Each method has advantages and disadvantages, and each may be reported differently, so keep those things in mind when choosing a method for your team. Let’s look at the three below.
1.) Time from incident to time of resolution: This is a great metric for teams who want a quick turnaround, because you can pull the data quickly after an outage has occurred. One downside is that you lose some historical data when something has not been resolved within 72 hours (a window that was once considered good practice), although this tends to even out over longer reporting periods. Time to resolution measured during the outage is one of our most popular time to resolve metrics!
2.) Time from incident to time of closure: For example, a ticket might show 100 minutes between when the problem started and when the user on that ticket was back up and running. This method takes longer than pulling data right after an outage, but it gives us much more historical data over a longer period of time. That data can be analyzed across different teams to drill down into what takes the longest (and why).
3.) Time from problem identification to time of closure: In another example, it took 50 minutes for the user to get their issue resolved, with the data pulled from several days back rather than just one. Remember, this is harder to pull than the other methods, because issues tend to be updated as they progress through your workflow. Keep in mind that we need at least 24 hours between updates before we capture any time to resolution data.
In terms of when the data is collected, time to resolution measurements typically fall into one of these three categories:
– during the outage (the data collected includes downtime and the time it takes staff to address the problem)
– after the outage (the data is collected days or weeks afterward)
– time in queue (from the start time in the system until the time of closure in the system).
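The three measurement methods above differ only in which pair of timestamps they subtract. Here is a small sketch on a single hypothetical ticket; the field names (`incident_at`, `identified_at`, `resolved_at`, `closed_at`) are assumptions, and the times were chosen so the closure-based numbers line up with the 100-minute and 50-minute examples.

```python
from datetime import datetime

# Hypothetical ticket; real systems will name and populate these fields differently.
ticket = {
    "incident_at": datetime(2024, 5, 6, 9, 0),     # problem started
    "identified_at": datetime(2024, 5, 6, 9, 50),  # problem identified
    "resolved_at": datetime(2024, 5, 6, 10, 20),   # fix applied
    "closed_at": datetime(2024, 5, 6, 10, 40),     # ticket closed
}


def minutes_between(ticket, start, end):
    """Minutes elapsed between two timestamp fields on one ticket."""
    return (ticket[end] - ticket[start]).total_seconds() / 60


measurements = {
    "incident_to_resolution": minutes_between(ticket, "incident_at", "resolved_at"),
    "incident_to_closure": minutes_between(ticket, "incident_at", "closed_at"),
    "identification_to_closure": minutes_between(ticket, "identified_at", "closed_at"),
}
print(measurements)
# {'incident_to_resolution': 80.0, 'incident_to_closure': 100.0, 'identification_to_closure': 50.0}
```

The same ticket produces three different numbers, which is why a team should pick one method and report it consistently.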
Your users are waiting, what now?
Here’s where time to resolution gets tricky! Remember that “time” is money, so it’s important to take your time to resolution data and turn it into something useful. Some questions for you might include:
– How big is this problem, really?
– What does this problem look like over time? See any trends developing?
– Is my team getting better or worse at resolving issues quickly?
Turning time to resolution data into something useful can be hard. Sometimes teams have plenty of information but still can’t justify changes, because they lack time to resolution data. By pulling that data, you will be able not only to compare yourself to your industry peers but also to see how efficiently your time is being used in this area!
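One simple way to answer the trend questions is to bucket resolved tickets by week and watch the weekly average. The sketch below uses invented data and ISO calendar weeks; `weekly_mttr` is a hypothetical helper, not part of any particular tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical resolved tickets: (date resolved, minutes to resolve).
resolved = [
    (date(2024, 4, 1), 80), (date(2024, 4, 3), 60),
    (date(2024, 4, 8), 50), (date(2024, 4, 11), 70),
    (date(2024, 4, 16), 40), (date(2024, 4, 18), 30),
]


def weekly_mttr(resolved):
    """Average minutes to resolve per (ISO year, ISO week), in chronological order."""
    buckets = defaultdict(list)
    for day, minutes in resolved:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(minutes)
    return {week: mean(values) for week, values in sorted(buckets.items())}


trend = weekly_mttr(resolved)
values = list(trend.values())
improving = all(later < earlier for earlier, later in zip(values, values[1:]))
print(trend)      # {(2024, 14): 70, (2024, 15): 60, (2024, 16): 35}
print(improving)  # True
```

A strictly falling weekly average is a very strict test of "improving"; with real data you would probably compare longer windows so that one bad week does not flip the answer.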
Tips for decreasing Time to Resolution and MTTR
So, you know what Time to Resolution is and how to measure and track it. Now you want to decrease this all-important metric. Here are our favorite tips!
– Do not be afraid to stop doing something that is time-consuming but unimportant. If your time is being spent on a lot of things that are simply adding time and not adding value then it may be time to reassess those activities.
– Start using (or keep using) an observability platform or an incident management system such as PagerDuty or one of its alternatives, where there is less room for error than with email or spreadsheets, if that is how you currently track incidents. Email can get lost or forgotten, spreadsheets become messy and confusing, and ultimately time will continue to tick away as you go back and forth trying to resolve issues.
– Always prioritize your work: start with the most important tasks, since they often take the longest, and then order the remaining work by how long each task takes so that quick wins save time.
– Other ways to improve time to resolution include having a clear process that is easy to follow, making sure your team is trained appropriately at every level, and always knowing what tickets are in the queue. That way you know exactly how many users are waiting on you!
Time to Resolution and MTTR final thoughts
Time to Resolution is a great indicator of the quality of IT work. It shows whether problems are being solved correctly and with time management in mind. An efficient time to resolution means issues are being resolved quickly enough to keep users working, but not so quickly that the team is just “fire fighting” instead of getting down to the root cause of the problem. A good time to resolve KPI helps teams answer questions about customer service, efficiency, and time spent on tasks, among others, so it’s important not just to “focus” on time to resolution but also to understand what it means for your team!