This talk will be held in English.
Given enough time and a topic, one has the rare chance to make every mistake out there. This is precisely what I experienced during more than two decades of working with "software" and "observability" in primarily distributed systems. I want to use this chance to compile and present the worst ones, rank them in a tier list (for the chronically online), and provide some insight into how to avoid these pitfalls.
First, we will look at the more technical factors: over- and under-instrumentation and rather toxic alerting practices. We will ask ourselves why we ever used uncorrelated data and why we should probably call an operations person more often. The other angle is the organizational one: there are certainly beneficial practices for DevX and OpX; we just sometimes ignore them completely. Who owns, and subsequently pays for, all this data? Sometimes it feels like there are only undesirable decisions to make here.
This talk will rank at least 20 very bad practices and attempt to suggest remedies, better practices, and solutions to "make visible what happens on these computers."
