Visibility of Work and the Second Way

The Second Way of DevOps focuses on creating and improving feedback loops. One valuable benefit of better feedback loops is better visibility of work. If each part of the pipeline is more responsive, teams in both development and operations can react and implement changes more quickly. A key element of Agile software development is its adaptability and its capacity to respond to change. The Second Way makes this facet of Agile the top priority, and thus results in a more reactive process. Greater visibility of work, in turn, improves each individual area and component of the pipeline.

Creating Telemetry

Telemetry is the process of observing and recording the behaviour of a product or system. In short, it is a type of monitoring. In any DevOps environment, telemetry is valuable for a number of reasons. First, telemetry is more consistent and reliable than human observation at recognising unusual behaviours. Team members may not always notice when a product’s behaviour deviates from what is normal or expected. Because telemetry is configured for a specific product, it records every behaviour that falls outside the range the monitoring is set to accept. Programmed monitoring does not tire of repetitive tasks, and will catch mistakes just as accurately even after a product has been running for hours. Telemetry checks behaviour against defined metrics and values, and thus operates with a black-and-white definition of what counts as unusual.
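As a minimal sketch of that black-and-white check, the following Python compares recorded samples against fixed limits. The metric names, the limits, and the `Threshold` and `find_anomalies` helpers are hypothetical and chosen only for illustration; real monitoring tools offer far richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Allowed range for one metric (hypothetical names and limits)."""
    metric: str
    low: float
    high: float


def find_anomalies(samples, thresholds):
    """Return the (metric, value) pairs that fall outside their allowed range."""
    limits = {t.metric: t for t in thresholds}
    anomalies = []
    for metric, value in samples:
        limit = limits.get(metric)
        # Metrics without a configured threshold are ignored, not flagged.
        if limit is not None and not (limit.low <= value <= limit.high):
            anomalies.append((metric, value))
    return anomalies


thresholds = [
    Threshold("response_ms", 0, 500),
    Threshold("error_rate", 0.0, 0.01),
]
samples = [
    ("response_ms", 120),
    ("response_ms", 870),
    ("error_rate", 0.002),
    ("error_rate", 0.05),
]
anomalies = find_anomalies(samples, thresholds)
```

Here `anomalies` holds only the two samples that break their limits. The check gives the same answer on the first sample as on the millionth, which is the consistency the paragraph above describes.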

In addition to reliability, telemetry also makes problems more visible. Human monitoring requires the user to notice the problem when it happens, and to remember the steps they took to produce it. Monitoring routines can record the variables and state of a product when the problem occurs, and maintain a log of the steps that led to the issue. This trail of steps allows team members to reproduce the issue later. Furthermore, it gives the development team more clues to determine exactly what caused the issue. Instead of relying on a user's description, proper telemetry provides a recording of exactly what happened leading up to the error. If the problem occurs multiple times in different scenarios, these logs allow the team to find common threads across the instances and identify more quickly where the problem lies.
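One way to keep such a trail is a small bounded buffer of recent steps that is attached to the error when it occurs. The sketch below is illustrative only: the `Breadcrumbs` class and the step names are hypothetical and do not come from any particular monitoring product.

```python
from collections import deque


class Breadcrumbs:
    """Keeps the most recent steps and the state recorded with each one."""

    def __init__(self, max_steps=50):
        # A bounded deque silently drops the oldest step once it is full.
        self._steps = deque(maxlen=max_steps)

    def record(self, step, **state):
        self._steps.append({"step": step, "state": dict(state)})

    def snapshot(self, error):
        """Bundle the error with the trail of steps that led up to it."""
        return {"error": repr(error), "trail": list(self._steps)}


trail = Breadcrumbs(max_steps=3)
trail.record("open_cart", items=2)
trail.record("apply_coupon", code="SPRING")
trail.record("checkout", total=0)
trail.record("charge_card", amount=0)

try:
    raise ValueError("amount must be positive")
except ValueError as exc:
    report = trail.snapshot(exc)
```

Because the buffer is capped at three steps in this example, `report["trail"]` contains only the last three steps before the failure, together with the state recorded at each one. Comparing several such reports is one way to find the common threads mentioned above.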

Application Logging

A popular and useful implementation of telemetry is application logging. While application logging can be incorporated in a number of different ways, the general practice is for developers to write code into a product that records context information in a log, in order to gain more insight when problems occur. The log can be a local file on a customer’s computer, a web-based recording tool, or a bug report captured during program execution. Whatever the technique, application logging gives the DevOps team more insight into a product while it runs in a live environment, and by extension improves the production pipeline.
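A minimal sketch of this practice using Python's standard `logging` module is shown below. The logger name `orders`, the `ship_order` function, and the message format are assumptions made for the example. The log is written to an in-memory stream so the example is self-contained; a `logging.FileHandler` could be used instead to write to a local file.

```python
import io
import logging


def build_logger(stream):
    """Create a logger that writes formatted records to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def ship_order(logger, order_id, quantity):
    """Hypothetical business routine that logs context as it runs."""
    logger.info("shipping order_id=%s quantity=%s", order_id, quantity)
    if quantity <= 0:
        logger.error("rejected order_id=%s reason=non_positive_quantity", order_id)
        return False
    return True


log_output = io.StringIO()
logger = build_logger(log_output)
ship_order(logger, "A-17", 3)
ship_order(logger, "A-18", 0)
log_lines = log_output.getvalue().splitlines()
```

Each call leaves a record of the order it handled, so when the second order is rejected the log already holds the identifier and quantity that explain why.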

One important rule for useful application logging is to record enough valuable information without recording so much that it overwhelms the team or degrades product performance. For some teams, logging the call stack may be sufficient. If the team can see which routines the application accessed before and during the error, they at least know where the problem occurred. This may not indicate definitively what caused the issue, but the sequence of calls leading up to the error gives valuable context. Furthermore, the call stack might show that the application runs a routine that developers did not anticipate. Such unexpected information might allow developers to improve efficiency by removing unnecessary calls, or to identify wasteful code that might pose a risk to the integrity of the software product.
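As a rough illustration of call-stack logging, the sketch below uses Python's standard `traceback` module to extract the chain of function names that led to a failure. The routines `import_rows`, `load_item`, and `parse_price`, and the helper `call_path`, are hypothetical and exist only for this example.

```python
import traceback


def parse_price(text):
    return float(text)  # raises ValueError for non-numeric text


def load_item(row):
    return {"name": row[0], "price": parse_price(row[1])}


def import_rows(rows):
    items = []
    for row in rows:
        items.append(load_item(row))
    return items


def call_path(exc):
    """Return the function names from the handler down to the failing call."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


try:
    import_rows([("pen", "1.50"), ("ink", "n/a")])
except ValueError as exc:
    path = call_path(exc)
```

The last entries of `path` name the routines that were active when the error was raised, ending with the one that failed. That is a compact record of where the problem occurred, without logging every variable along the way.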
