Bring network and time together using Linux tracing


Alexander Aring


Nuts and Bolts

Session Type




This talk is about visualizing a distributed network protocol using the time-synchronized tracepoints feature of trace-cmd [0]. As an example, we use the Linux Distributed Lock Manager (DLM) [1] protocol and visualize its lock states over time in the jumpshot [2] viewer.

trace-cmd is the user-space utility that controls the Linux in-kernel tracing subsystem. A recently introduced feature records tracing events on multiple Linux machines with their timestamps synchronized across those machines.

The Linux Distributed Lock Manager (DLM) subsystem implements a distributed network protocol that Linux clusters use to coordinate access to shared resources. Current DLM debugging methods are limited to dumping lock states via command-line interfaces, e.g. debugfs. Those dumps can only be taken sequentially and are not time synchronized, so they do not represent all lock states at a single point in time. Additionally, the dumps have to be merged by hand to see the connections between them.
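To illustrate why synchronized timestamps matter, here is a minimal Python sketch of the merge step that otherwise has to be done by hand. It assumes each node has already produced a list of events sorted by a timestamp that is comparable across nodes. The `Event` type, the event names, and the sample values are hypothetical and chosen only for illustration; they are not the output format of trace-cmd or of the DLM tracepoints.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    ts: float    # seconds; assumed to be synchronized across all nodes
    node: str    # which cluster node recorded the event
    name: str    # hypothetical event name, not a real tracepoint name
    detail: str  # free-form payload


def merge_traces(*per_node: Iterable[Event]) -> List[Event]:
    """Merge per-node streams (each already sorted by ts) into one timeline."""
    return list(heapq.merge(*per_node, key=lambda e: e.ts))


# Hypothetical sample data for two nodes contending for resource "A".
node1 = [
    Event(1.000, "node1", "lock_request", "res=A mode=EX"),
    Event(1.030, "node1", "lock_granted", "res=A mode=EX"),
]
node2 = [
    Event(1.010, "node2", "lock_request", "res=A mode=EX"),
    Event(1.050, "node2", "lock_blocked", "res=A mode=EX"),
]

timeline = merge_traces(node1, node2)
for ev in timeline:
    print(f"{ev.ts:.3f} {ev.node} {ev.name} {ev.detail}")
```

Without a common time base, the interleaving of the two nodes' events in `timeline` would be meaningless, which is the limitation of sequential, unsynchronized dumps.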

The slog2sdk [3], which contains the jumpshot viewer, is used to display the DLM protocol lock states over time as a Gantt chart [4]. To bridge trace-cmd and the Linux tracing subsystem on one side and slog2sdk on the other, a trace converter, dlm2slog2 [5], was developed.
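The core idea of turning state-change events into Gantt bars can be sketched in a few lines of Python. This is not how dlm2slog2 is implemented; it is only an illustration under the assumption that each event records a node, a lock resource, and the new state at a synchronized timestamp. All type names, state names, and sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class StateChange(NamedTuple):
    ts: float       # synchronized timestamp in seconds (assumed)
    node: str
    resource: str
    state: str      # hypothetical state name


class Interval(NamedTuple):
    node: str
    resource: str
    state: str
    start: float
    end: float


def to_intervals(changes: List[StateChange], trace_end: float) -> List[Interval]:
    """Turn point-in-time state changes into (start, end) bars for a Gantt chart."""
    open_states: Dict[Tuple[str, str], Tuple[str, float]] = {}
    intervals: List[Interval] = []
    for ch in sorted(changes, key=lambda c: c.ts):
        key = (ch.node, ch.resource)
        if key in open_states:
            # The previous state for this node/resource ends where the new one begins.
            state, start = open_states[key]
            intervals.append(Interval(ch.node, ch.resource, state, start, ch.ts))
        open_states[key] = (ch.state, ch.ts)
    # States still open when the trace stops are closed at trace_end.
    for (node, resource), (state, start) in open_states.items():
        intervals.append(Interval(node, resource, state, start, trace_end))
    return intervals


# Hypothetical sample: two nodes take turns holding resource "A".
changes = [
    StateChange(1.00, "node1", "A", "requested"),
    StateChange(1.03, "node1", "A", "granted"),
    StateChange(1.01, "node2", "A", "requested"),
    StateChange(1.08, "node1", "A", "unlocked"),
    StateChange(1.09, "node2", "A", "granted"),
]
intervals = to_intervals(changes, trace_end=1.20)
for iv in intervals:
    print(f"{iv.node} {iv.resource} {iv.state:<9} {iv.start:.2f} -> {iv.end:.2f}")
```

Each resulting `Interval` corresponds to one bar on a per-node row of the chart, so a viewer can show at a glance which node held or waited for a lock at any moment.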

In this talk I will show the steps needed to record a time-synchronized DLM trace with trace-cmd and how the trace is converted for visualization in jumpshot. This approach can be adapted to other distributed network protocols and is not limited to debugging use cases. We will also look at possible new ideas for using time-synchronized tracing events in a distributed network.

[0] [1] [2] [3] [4] [5]