...rants by Asheesh Mehdiratta on Coaching, Transformation and Change

Category: devops

Why MTTR is my favourite metric?

As you walk the DevOps transformation journey, you build out success stories, build metrics and start to energise teams towards continuous improvement. But to quantify the end user experience, I always look towards the MTTR (Mean Time To Recover) metric.

MTTR is defined as – Average time required to repair a failed component or device. ITIL definitions can be more expressive.

Why MTTR is so useful and is my favourite metric?

Here are a few of my reasons:

  1. MTTR captures the end user EXPERIENCE, by capturing when a service goes down and when it is restored.
  2. It shows the SPEED at which your team/organisation works, including how quickly the team –
    1. Acknowledges the problem
    2. Solves the problem
    3. Communicates the Resolution to the end user.
  3. MTTR encapsulates the internal dynamics of the teams/organisation.
  4. It is a simple, easy to understand metric, without any ambiguity.
  5. It can be measured in any unit (hours/days), which everyone can understand, including Dev and Ops.
  6. MTTR can be captured easily, automated and put across in the dashboard showing trends.
  7. It is applicable across all systems, of varying complexity and size.
  8. MTTR is technology agnostic, and can be understood by everyone – management, executives, support, operations and developers.
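
As a rough illustration of how MTTR could be captured and automated, here is a minimal Python sketch. The `Incident` record, its field names (`down_at`, `acknowledged_at`, `restored_at`) and the sample timestamps are assumptions made for this example, not the output of any specific monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions for illustration."""
    down_at: datetime          # when the service went down
    acknowledged_at: datetime  # when the team acknowledged the problem
    restored_at: datetime      # when the service was restored


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a list of durations; an empty list is rejected."""
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(durations, timedelta()) / len(durations)


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from 'service down' to 'service restored'."""
    return mean_duration([i.restored_at - i.down_at for i in incidents])


def mean_time_to_acknowledge(incidents: list[Incident]) -> timedelta:
    """Mean time from 'service down' to 'problem acknowledged'."""
    return mean_duration([i.acknowledged_at - i.down_at for i in incidents])


# Made-up sample data: one 60-minute outage and one 180-minute outage.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10), datetime(2024, 1, 1, 11, 0)),
    Incident(datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 20), datetime(2024, 1, 5, 17, 0)),
]

print("MTTR:", mttr(incidents))                                   # MTTR: 2:00:00
print("Mean time to acknowledge:", mean_time_to_acknowledge(incidents))  # 0:15:00
```

Feeding a function like this from your incident tracker on a schedule is one way to get the trend line onto a dashboard.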
Conclusion

You do not want to measure anything unless it helps the teams/stakeholders, yet it is easy to get carried away to the other extreme and measure everything. MTTR is a simple, easy to understand, easy to capture metric, which serves the purpose of showing the inefficiencies and reminding the teams of the end user experience every time!

So what has been your favourite metric? Feel free to share your feedback in the comments below.

If you like what you read here, then do share this article, and subscribe to my future articles. 

Takeaways from the DevOps++ Global Summit Conference

 

It was a pleasure to be invited as a Speaker at the DevOps++ Global Summit, and present my views on the Key Success (and Failure) modes for Large Scale DevOps Transformation, based on the patterns that I have seen across multiple organizations. My session slides can be viewed here.

It was also great to be able to hear other speakers and capture some Key Takeaways below:

  1. DevOps is all about positive economic outcome and benefit to the people, though wrapped up in too much technical tooling jargon by eager vendors.
  2. DevOps is more about the mindset, the human aspects and relationships, and only then about methodologies and frameworks.
  3. Automation should be about automating the right thing, the right way, at the right time.
  4. Micro services architecture enables the DevOps ways of working and complements the Cloud architecture, compared to monolith applications.
  5. Lean principles, especially Value Stream Maps, are a key element in identifying the process debt and wastes, which slow down E2E delivery.
  6. Continuous Deployment/Monitoring workshops covered popular tooling such as Docker, Prometheus and Ansible.
  7. Salesforce stack development and deployments can be aided with Jenkins, GitLab and Qualitia.

If you attended this conference, do drop in a link to your notes below in the comments.

Subscribe to my blog for more, and feel free to share your feedback here.

5 Tips to integrate Change management in DevOps

 

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes, which satisfy audit and compliance requirements. These CONTROLS create friction among the team. To minimize this friction, let us look at 5 Tips to integrate Change management in your DevOps journey.

TIP #1 – TALK TO YOUR AUDIT/COMPLIANCE TEAM

START a conversation with your Audit/Compliance team members now, and try to understand their needs. These conversations will help your team to empathize and see the world from the ‘audit’/’security’ lens. You can then move forward to provide the ‘solution’ instead of jumping in with precooked notions. Read more here on how to start these conversations and ASK the right questions.

TIP #2 – CREATE TRACEABILITY AND CONTEXT FOR YOUR CHANGE SET

START providing the traceability and context for your change set to the operations teams. The goal for your team should be to provide evidence of quality test results for the proposed change set, which will provide the required CONFIDENCE for the Operations team. Read more here on how to start providing this traceability and have a deeper engagement with your Operations teams.

TIP #3 – RE-CLASSIFY YOUR CHANGE SETS

Start to reclassify your change sets, and build agreements, which allow you to auto-deploy to production. Building “standard” change sets, with pre-defined risk profile (low risk first!), you can move towards building a culture of trust with change and operations teams. Read more here on how to reclassify your change sets and increase transparency across the team.

TIP #4 – USE TELEMETRY TO SHOW EVIDENCE

Start to build out your telemetry systems. These systems capture errors, warnings, events and trigger points, and log this data to central/distributed stores. Use this evidence to show the CONTROLS required for the audit and change management processes. Read more here on how to build these telemetry systems in your DevOps journey.

TIP #5 – AUTOMATE, AUTOMATE, AND AGAIN AUTOMATE !

Stop doing manual steps in your change management teams, STOP! Start to automate your workflows: builds, automated tests, deployments and reports. Read more here on how to increase the automation across the life cycle and increase transparency across the team.

So go ahead, kick start and integrate the change management in your DevOps with these tips. Subscribe to my blog for more, and feel free to share your feedback here.

Tip #5: Effective Change management in DevOps

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes. To minimize the friction in your DevOps journey, and building on my previous Tip #4, let us look at Tip #5 for effective change management.

TIP #5 – Automate, Automate, and AGAIN Automate !

Change management typically includes a CAB (Change Advisory Board) meeting. This CAB meeting reviews the list of change sets, which the operations team filters, and either accepts or rejects each one before it can move into production.

The "Advisory" Board just became the "Gate Keeper", if you noticed!

Large enterprises often work with a traditional mindset. This mindset assumes a Large Batch of changes, which may have suited a BIG CAB meeting in the past. But now with smaller change-sets (read as micro services) becoming the norm, imagine going through the rigor of a 2-hour CAB meeting for each one. It will not be a very pleasant experience!!

Therefore we need to re-imagine the CAB meeting and ask the simple but difficult question: WHY DO I NEED A CAB?

Purpose of the CAB

Typically the purpose of the CAB meeting is to verify all the artifacts, by asking questions such as:

  • Have you tested your changes?
  • Have you integrated the security practices?
  • Have you tested the migration, rollback and can provide evidence?
  • Is the change-set linked to the business need and do you have approvals?
  • and the list can go on and on…

Good news?

But there is good news now! All these questions can be answered easily by automating your change management workflows. The tooling across the application development life cycle has converged, so the evidence required, along with the artifacts, can be built into your automated build and deployment pipelines and integrated with the change management workflows.
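
To make this concrete, the CAB questions listed earlier could be answered by a small automated gate in the pipeline. The following is only a sketch under stated assumptions: the `ChangeSetEvidence` class and its fields are hypothetical names, and in practice each flag would be filled in from your own build, test and ticketing tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeSetEvidence:
    """Hypothetical evidence a pipeline gathers for one change set."""
    tests_passed: bool
    security_scan_passed: bool
    rollback_tested: bool
    linked_work_item: Optional[str]  # e.g. a ticket id, or None if missing
    business_approved: bool


def cab_checklist(evidence: ChangeSetEvidence) -> dict[str, bool]:
    """Answer each CAB question from the collected evidence."""
    return {
        "changes tested": evidence.tests_passed,
        "security practices integrated": evidence.security_scan_passed,
        "migration and rollback tested": evidence.rollback_tested,
        "linked to business need with approval": (
            bool(evidence.linked_work_item) and evidence.business_approved
        ),
    }


def failed_checks(evidence: ChangeSetEvidence) -> list[str]:
    """Return the questions that cannot yet be answered with 'yes'."""
    return [question for question, ok in cab_checklist(evidence).items() if not ok]


# Example: everything is in place except the rollback evidence.
evidence = ChangeSetEvidence(True, True, False, "TICKET-1", True)
print(failed_checks(evidence))
```

An empty list from `failed_checks` would mean the change set carries all the evidence the meeting used to ask for.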

Thus the CAB can be eliminated by automating our workflows!

These automated workflows make it possible for the Operations teams to trace the requirements as they are implemented, and improve their ability to see changes, the effects of changes and approvals, and to gather the evidence in a self-service model using telemetry.

Many teams start with an intermediate manual approval step, till they start to trust the teams, and their change sets. But this manual approval step also goes away, over a period of time depending on the maturity of your teams.

As the teams implement a pre-approved change strategy, and keep looking for opportunities to widen its definition, it is a win-win for both sides.

So if you are still doing manual reviews in your change management teams, STOP! Start to automate your workflows, your build process and your deployments to production. Add automated security tests to your code, add telemetry and automate the production reports. Complete the feedback loop, and in the end increase the Transparency across the team.

So go ahead – Automate, Automate and again Automate !

Subscribe to my blog for more learnings, and feel free to share your feedback here.

Tip #4: Effective Change management in DevOps

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes. To minimize the friction in your DevOps journey, and building on my previous Tip #3, let us look at Tip #4 for effective change management.

TIP #4 – Use TELEMETRY to show evidence

Auditors with a traditional mindset look for evidence, and will typically ask for screenshots, configuration logs, settings etc. If you manage thousands of servers, this itself is a cumbersome activity, especially if you are launching and shutting down servers in the cloud all day.

Imagine the activities needed to manage these expectations: you would simply need an army to satisfy the audit needs!

But auditors and compliance personnel typically do not read code, and hence need all the help they can get to satisfy the regulatory bodies. So you can help them by providing evidence using the following options.

 1. Create alternate data sources to present evidence 

Applications which are ‘operationally-aware’ include telemetry: they capture errors, warnings, events and trigger points, and log this data to central/distributed stores. Typical telemetry systems (MS Insights / ELK stack (Elasticsearch, Logstash, Kibana) / Splunk etc.) can capture all this information and present it in visual dashboards. These dashboards can be customized, based on the needs of the users, and present the data at multiple levels of detail.
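
As a minimal sketch of what ‘operationally-aware’ could look like in code, the example below emits each event as one JSON line using only Python's standard `logging` module, a shape that log shippers can usually forward to a central store. The `change_set` field and the `CS-42` value are hypothetical, added here only to show how an event could be tied back to a change.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
            # `change_set` is a hypothetical field, attached via `extra=` below.
            "change_set": getattr(record, "change_set", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "telemetry") -> logging.Logger:
    """Create a logger that writes JSON lines to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = build_logger()
logger.warning("deployment finished with warnings", extra={"change_set": "CS-42"})
```

Because every line is structured, a dashboard can filter by level or by change set without anyone taking screenshots.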

Auditors can slice and dice this data, and ‘self serve’ their audit needs !

2. Use Iterative approach to building CONTROLS evidence

As part of early engagement with the auditors, successful teams invite audit teams to their sprint planning and sprint reviews. This conversation can kick start rich discussions on how to build controls evidence in every sprint, instead of the end stage.

Teams can start to build controls right from the beginning! 

Sometimes the solutions to meeting the audit controls could be as simple as maintaining version control for all the artifacts. Other solutions could be simply linking all the artifacts across the complete application development life cycle. This allows traceability for each change set put in production.

To help you explore further, read the DevOps Audit Defense Toolkit. It uses a fictitious narrative to provide some real life examples and link it all together.

So go ahead and start to build out your telemetry systems. Start doing early engagements with the change teams. Start to build out the controls iteratively, thereby building Trust and Transparency across the team.

Subscribe for more tips in my next post, and feel free to share your feedback here.

Tip #3: Effective Change management in DevOps

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes. To minimize the friction in your DevOps journey, and building on my previous Tip #2, let us look at Tip #3 for effective change management.

TIP #3 – re-classify your change sets

Enterprises will have multiple change requests being pushed to production, of varying size and complexity, and with different risk profiles. But existing change management processes today often do not distinguish between these variations!

In reality, different change sets allow us to build different risk profiles.

So let us try to understand these variations in the Change sets, which can typically be classified into one of these 3 categories –

 1. STANDARD Change sets

These change sets are very low risk, and operations are familiar with these. These change sets have an established approval process in place. Examples – web style changes / data table updates etc.

2. NORMAL Change sets

These change sets are high risk, and operations are not familiar with them. They typically go through a CAB (Change Advisory Board) process to approve/reject the changes. This process requires submitting change forms, with schedule, impacts, risks etc. Examples: new feature/product etc.

3. HIGH Urgency Change sets

These change sets are emergency changes, with potentially high risk, and may need approvals from senior management. Examples: security patch, service fix patch etc.

Now with the above classification, we can aim to align with the operations and change management teams and ask for an agreement.

Agreement: Can the STANDARD Change set be Pre-APPROVED?

As the standard change sets are low risk, Operations teams do not need to approve each one. This agreement immediately gives us the ability to define a pre-approval process, which allows us to deploy our change sets automatically (using our automated deployment pipelines).

I am sure that this agreement itself will allow you to breathe more freely !
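
The three categories and the pre-approval agreement above could be encoded directly in a deployment pipeline. This is a sketch only: the `ChangeType` enum and the route descriptions are hypothetical, and the real routing rules would come from the agreement you reach with your own change management team.

```python
from enum import Enum


class ChangeType(Enum):
    """The three change set categories described above."""
    STANDARD = "standard"
    NORMAL = "normal"
    HIGH_URGENCY = "high urgency"


def approval_route(change_type: ChangeType) -> str:
    """Map a change set category to its (assumed) approval route."""
    if change_type is ChangeType.STANDARD:
        # Low risk and familiar to operations: covered by the agreement.
        return "pre-approved: auto-deploy via pipeline"
    if change_type is ChangeType.HIGH_URGENCY:
        return "senior management approval"
    return "CAB review"


for change_type in ChangeType:
    print(f"{change_type.value}: {approval_route(change_type)}")
```

Widening the definition of a standard change then becomes a small, reviewable code change rather than a new meeting.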

So go ahead and start working with your change management teams. Start to build these agreements, which allow you to auto-deploy to production, with complete Trust and Transparency across the team.

Subscribe for more tips in my next post, and feel free to share your feedback here.

Tip #2: Effective Change management in DevOps

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes. To minimize the friction in your DevOps journey, and building on my previous Tip #1, let us look at Tip #2 for effective change management.

TIP #2 – create traceability and context for your change set

Operations do not want to be Surprised by any change!! When you are working in operations every weekend and pulling multiple late nights, supporting servers going down or applications crashing, you really want to know what the next patch upgrade is going to do and how well it will work on the production box. Operations team members are also human and need to have the same LIFE as development team members. ASK your Operations team for their OPS PAIN INDEX:

OPS PAIN INDEX = #EXTRA HOURS WORKED x  EXTRA NIGHTS

A higher value for this Ops Pain Index will give you a better understanding of why Ops needs to learn more about every new change being proposed, and its impact on the existing, stable running system. Talk about building TRUST in the development change set!!
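
The index is simple enough to compute in a few lines. In this small sketch the function name, parameter names and sample numbers are my own illustration of the formula above.

```python
def ops_pain_index(extra_hours_worked: float, extra_nights: int) -> float:
    """OPS PAIN INDEX = extra hours worked x extra nights."""
    if extra_hours_worked < 0 or extra_nights < 0:
        raise ValueError("inputs must be non-negative")
    return extra_hours_worked * extra_nights


# Made-up example: 12 extra hours spread over 3 extra nights in a release cycle.
print(ops_pain_index(12, 3))  # 36
```

Tracking this number per release is one way to see whether the pain is going down as trust goes up.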

Thus, the key is to INCREASE THE TRUST IN YOUR CHANGE SET, by creating traceability and providing context.

With today’s tools and deployment pipelines, it is easy to link your work items in the planning tools (say JIRA, TFS, Rally etc.). These typically include features/stories/defects, along with the ticket number, version control check-ins, comments and release notes. This input can easily be fed into the deployment pipeline tools (Jenkins, Chef, Puppet etc.). This integrated view provides complete traceability across all stages, from requirements to deployment.

The linkage of the work items to the deployment artifacts describes the CHANGE SET and provides the CONTEXT for the Operations team.
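
One lightweight way to build this linkage is to pull ticket numbers out of commit messages. The sketch below assumes ticket keys that look like `PAY-12` (upper-case project key, hyphen, number); that pattern, the function names and the sample commits are all assumptions for illustration, so adjust them to your own planning tool.

```python
import re
from collections import defaultdict

# Assumed ticket key format, e.g. "PAY-12". Adapt to your planning tool.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def link_commits_to_tickets(commit_messages: dict[str, str]) -> dict[str, list[str]]:
    """Map each ticket key to the commit ids whose message mentions it."""
    links: dict[str, list[str]] = defaultdict(list)
    for commit_id, message in commit_messages.items():
        for ticket in TICKET_PATTERN.findall(message):
            links[ticket].append(commit_id)
    return dict(links)


def untraced_commits(commit_messages: dict[str, str]) -> list[str]:
    """Commit ids with no ticket reference, i.e. changes without context."""
    return [cid for cid, msg in commit_messages.items() if not TICKET_PATTERN.search(msg)]


# Made-up commits for one change set.
commits = {
    "a1": "PAY-12 fix rounding in invoice totals",
    "b2": "PAY-12 PAY-15 add retries to payment client",
    "c3": "tidy up whitespace",
}
print(link_commits_to_tickets(commits))
print(untraced_commits(commits))
```

The list of untraced commits is exactly the kind of surprise the Operations team would want flagged before a deployment.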

The additional evidence from a Quality standpoint is typically available from various channels. This includes automated build results and automated testing results, showing the test cases executed, pass/fail ratio etc. across the various test stages (unit, integration, regression, performance, security tests). All these provide the additional confidence to the operations teams that the development team has really tested the application.

When the development team says it works, they really mean it!

Aim to provide evidence of quality test results for the proposed change set, which will provide the required CONFIDENCE for the Operations team.
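
As a small sketch of such evidence, the pass ratio per test stage can be summarised from raw counts. The stage names and numbers below are made up, and `summarize` is a hypothetical helper rather than a feature of any test tool.

```python
def pass_ratio(passed: int, failed: int) -> float:
    """Share of executed test cases that passed."""
    total = passed + failed
    if total == 0:
        raise ValueError("no test cases were executed")
    return passed / total


def summarize(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each test stage to its pass ratio, given (passed, failed) counts."""
    return {
        stage: round(pass_ratio(passed, failed), 3)
        for stage, (passed, failed) in results.items()
    }


# Made-up counts for one change set: (passed, failed) per stage.
results = {"unit": (98, 2), "integration": (45, 5)}
print(summarize(results))  # {'unit': 0.98, 'integration': 0.9}
```

Attaching a summary like this to the change set gives Operations the numbers behind "it works".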

So go ahead and start providing the traceability and context for your change set to the operations teams and you will be on your way to building some new OPS friends 🙂

Subscribe for more tips in my next post, and feel free to share your feedback here.

Tip #1: Effective Change management in DevOps

In order to reduce operational risks, organizations put in CONTROLS, typically via Change Management processes. The outputs typically feed into the compliance/audit personnel needs, and satisfy them, but the legacy audit mindset CONFLICTS with the DevOps team mindset.

Therefore to minimize this friction, see my #1 tip in this post on how to work with the change management processes, and teams. I will be sharing more tips in my next posts.

TIP #1 – TALK to your Audit/Compliance team

  1. ASK – Why does your audit team need the Change information? 
  2. ASK – What will they do with the Change information? 
  3. ASK – What level of granularity of data about the Change is required? 
  4. ASK – Are there alternate sources of the same Change data?
  5. ASK – When do they need this Change information?

Speaking the same language (audit-speak) and asking them questions will give you, as an IT team, a better understanding of the Audit/Compliance process. You may be surprised by the technical nature of the various ACTS (Financial/Healthcare etc.) and even start to appreciate them.

So just go ahead and START a conversation with your Audit/Compliance team members now, and you might be pleasantly surprised.

Subscribe for more tips in my next post, and feel free to share your feedback here.

Try these 3 strategies to FIX your DevOps problems!

In my previous post, we saw the Top 3 DevOps challenges faced by organizations today. So let us review how organizations can address these challenges by leveraging the power of systems thinking, feedback loops and cultural transformation at the core, to claim the real promise of ‘agility’ for the customers and stakeholders.





TOP 3 SOLUTIONS


  • Build Ownership


The goal is to foster win-win relationships, where the dev and ops team start thinking as a SINGLE UNIT, responsible for end customer delight! This requires the organizations to align the Goals for both the groups and provide the ‘right’ environment for collaboration.

Organizations which understand systems thinking, can help the Dev and Ops teams visualize the FLOW (from concept to cash), and are able to articulate the importance of cycle time, while error proofing and preventing downstream defects (aka. operational headaches).

These teams typically use Value Stream Maps, to share the areas that slow them down (or identify bottlenecks), while building a shared understanding of the complete end to end system. These exercises allow the teams to build empathy for each other’s roles and share the pains, thereby allowing the silo’d groups to start to trust each other and build better relationships over time.
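
A Value Stream Map can also be reduced to a few numbers. The sketch below is a minimal Python example, assuming each step records hands-on time and waiting time; the `Step` class, its fields and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in a hypothetical value stream, with times in hours."""
    name: str
    process_hours: float  # hands-on work time
    wait_hours: float     # time the work sits idle before this step starts


def total_lead_time(steps: list[Step]) -> float:
    """End to end time from concept to production, including waiting."""
    return sum(s.process_hours + s.wait_hours for s in steps)


def flow_efficiency(steps: list[Step]) -> float:
    """Share of the lead time spent on hands-on work."""
    return sum(s.process_hours for s in steps) / total_lead_time(steps)


def biggest_wait(steps: list[Step]) -> Step:
    """The step where work waits longest, a likely bottleneck."""
    return max(steps, key=lambda s: s.wait_hours)


# Made-up value stream for one feature.
steps = [Step("develop", 16, 4), Step("test", 8, 24), Step("deploy", 1, 47)]
print(total_lead_time(steps))     # 100
print(flow_efficiency(steps))     # 0.25
print(biggest_wait(steps).name)   # deploy
```

In this made-up example most of the lead time is waiting before deployment, which is exactly the kind of finding Dev and Ops can discuss together.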

  • Build Shared Practices


The long divide between Dev and Ops can be bridged by amplifying the feedback loops at every step in the end to end delivery cycle, sharing the knowledge base, and increasing transparency across both worlds.

Organizations typically start this journey by treating Infrastructure as Code, where there is a single repository of truth and everything is version controlled. The teams start thinking about making each step of the highest quality and incorporating feedback from multiple levels – application data, process data, infrastructure dashboards, and business metrics – to highlight pain points early and design shared solutions around the problems. Refer to the diagram below, which highlights the areas for embedding and/or extending the teams and crossing the systemic boundaries.

Organizations can be seen experimenting with embedding Ops and Dev team members across each other’s groups, which allows for increased empathy (example: Design for Operations), learning and collaboration.

Source: DevOps Patterns Distilled (Velocity London 2012)

  • Build a Learning Culture


One of the best ways to bridge the cultural gap between dev and ops is to build a learning culture. Organizations which embrace a learning culture are good at communicating a compelling reason for the change (primarily business outcomes), measuring the new behaviours and giving feedback, creating “triggers” in the work environment that remind teams what needs to be done, and building communities (CoPs) that support this shared learning.

The leadership encourages learning from failures, and is happy to conduct experiments and take risks, promoting a healthy culture of constant innovation while aligning team goals and changing human resources policies.


In the end

DevOps is a long journey and it begins with building a “we” culture among the development and operations teams, with shared goals and shared incentives. The improved communication and collective ownership foster an environment of trust, leading to sharing of ideas, tools and processes, with everyone focussed on delivering business value at the end of the day.

Let me know what other solutions you have practiced with your DevOps teams.

Top 3 challenges in your DevOps journey

The 2014 State of DevOps survey report clearly shows higher organizational performance linked to the performance of the IT group and its DevOps practices. But most organizations are still struggling in their IT-DevOps journey – “only 21% of those familiar with it are using it”. In the DevOps journey, the main objective of “Collaboration between the Dev and Ops” faces many challenges! Let me attempt to highlight the Top 3 challenges faced by most organizations.



TOP 3 CHALLENGES 

  • No Shared Ownership


Most programs typically have Development and Operations as separate teams, with conflicting goals.

The top-down goals for development teams are to build features (potentially shippable increments) at short, regular intervals so that they can be deployed, with all incentives promoting a ‘faster’ build cycle. The operations team’s goals, in contrast, favor operational stability with changes minimized, in order to maintain existing system reliability and high availability, with incentives for reducing operational costs.


These conflicting goals lead to development teams “handing off” the code to operations after development, and operations “pushing back” almost every time.

The overall impact is that the feature ‘go live’ date is delayed, with both the groups lacking “shared ownership” for reducing the overall feature delivery cycle time from an end customer view point.


  • Physical separation


Development and Operations teams are separated by distance, and mostly do not share the same physical location or work area. Most organizations will have centralized operations teams, possibly across time zones for larger enterprises.


The silo’d physical structure is mirrored in silo’d organizational structures, with different reporting heads for the two teams, thus ensuring that local optimizations rule the day. Operations team members manage and run multiple applications in closely guarded areas, with restricted access and few opportunities to interact with the Development teams.


How can you relate to someone whom you have never met face to face and never talked to? Bye bye collaboration!!


  • Cultural differences


Cultural differences are visible in the behavior and actions of both the development team and the operations teams. 

The lack of trust and transparency on both sides manifests in communication gaps. The development team has minimal visibility into deployment activities, feedback on production systems (read infrastructure metrics) and the real business metrics. Similarly, the operations team has minimal visibility into the expectations for the features with respect to scalability, run books or reliability, which they should care about to maximize the application’s potential and operate it as expected by the development team.

The lack of shared evidence and the missing shared ownership clearly come out, creating a sense of mistrust and resulting in overall delivery delays.


“The developer and operations divide in IT is almost like humidity at times. You can’t see it, but you feel it.” This quote from the Starbucks DevOps post sums up the challenges.

What challenges do you see in your DevOps journey?


Look out for my next post which will try to address possible solutions for these challenges…

© 2025 agile journeys
