As you walk the DevOps transformation journey, you build up success stories, establish metrics and start to energise teams towards continuous improvement. But to quantify the end-user experience, I always look to the MTTR (Mean Time To Recover) metric.
MTTR is defined as the average time required to repair a failed component or device. ITIL definitions go into more detail.
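As a minimal sketch of that definition, MTTR is the total time spent restoring service divided by the number of incidents. The incident timestamps below are made-up illustration data, and the `mttr` helper is a hypothetical name, not part of any particular tool.

```python
from datetime import datetime, timedelta

# Hypothetical incidents for illustration: (service went down, service restored).
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),     # 1 hour
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 16, 0)),  # 2 hours
    (datetime(2024, 1, 20, 22, 0), datetime(2024, 1, 21, 1, 0)),   # 3 hours
]


def mttr(incidents):
    """Mean time to recover: total downtime divided by the incident count."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum((restored - down for down, restored in incidents), timedelta())
    return total / len(incidents)


print(mttr(incidents))  # 2:00:00
```

Keeping the result as a `timedelta` means it can be reported in hours or days without changing the calculation.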
Why is MTTR so useful, and why is it my favourite metric?
Here are a few of my reasons:
- MTTR captures the end-user EXPERIENCE, by recording when a service goes down and when it is restored.
- It shows the SPEED at which your team or organisation works, including how quickly the team:
  - Acknowledges the problem
  - Solves the problem
  - Communicates the resolution to the end user
- MTTR encapsulates the internal dynamics of the teams and the organisation.
- It is a simple, easy-to-understand metric, without any ambiguity.
- It can be measured in familiar units (hours or days) that everyone understands, including both Dev and Ops.
- MTTR can be captured easily, automated, and shown on a dashboard with trends over time.
- It is applicable across all systems, of varying complexity and size.
- MTTR is technology agnostic, and can be understood by everyone: management, executives, support, operations and developers.
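To make the points about speed and dashboard trends concrete, here is a rough sketch that splits each incident into the acknowledge, solve and communicate phases and then computes MTTR per month. The record fields, the sample timestamps and the helper names are all assumptions for illustration, not the schema of any real incident tool. The sketch also assumes recovery ends when the resolution is communicated to the end user.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are assumed for this sketch.
incidents = [
    {"down": datetime(2024, 1, 5, 9, 0), "acknowledged": datetime(2024, 1, 5, 9, 30),
     "fixed": datetime(2024, 1, 5, 12, 0), "communicated": datetime(2024, 1, 5, 13, 0)},
    {"down": datetime(2024, 1, 20, 14, 0), "acknowledged": datetime(2024, 1, 20, 14, 15),
     "fixed": datetime(2024, 1, 20, 15, 30), "communicated": datetime(2024, 1, 20, 16, 0)},
    {"down": datetime(2024, 2, 8, 10, 0), "acknowledged": datetime(2024, 2, 8, 10, 15),
     "fixed": datetime(2024, 2, 8, 11, 0), "communicated": datetime(2024, 2, 8, 11, 30)},
]


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def phase_breakdown(incident):
    """Hours spent acknowledging, solving and communicating one incident."""
    return {
        "acknowledge": hours(incident["acknowledged"] - incident["down"]),
        "solve": hours(incident["fixed"] - incident["acknowledged"]),
        "communicate": hours(incident["communicated"] - incident["fixed"]),
    }


def monthly_mttr(incidents):
    """MTTR in hours per calendar month, keyed by 'YYYY-MM' of the outage start."""
    by_month = defaultdict(list)
    for incident in incidents:
        month = incident["down"].strftime("%Y-%m")
        by_month[month].append(hours(incident["communicated"] - incident["down"]))
    return {month: mean(durations) for month, durations in sorted(by_month.items())}


print(phase_breakdown(incidents[0]))
print(monthly_mttr(incidents))
```

The per-month figures are what you would plot on a dashboard, while the phase breakdown shows where the time actually goes when the trend moves in the wrong direction.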
You do not want to measure anything unless it helps the teams and stakeholders, yet it is just as easy to get carried away to the other extreme and measure everything. MTTR is a simple metric that is easy to understand and easy to capture, and it serves the purpose of exposing inefficiencies and reminding teams of the end-user experience every time!
So what has been your favourite metric? Feel free to share your feedback in the comments below.
If you like what you read here, then do share this article, and subscribe to my future articles.