When recurring 10–15 second outages made the database behind #LinkedIn’s user feed briefly unavailable, leaving no useful logs, engineers turned to off-CPU profiling with #eBPF to find the root cause.
The failures were ephemeral, patternless, and had no clear external trigger.
Here is how they solved it ⇨ https://bit.ly/3PS1RtN
#InfoQ #SoftwareArchitecture #Profilers #Monitoring #Database
How LinkedIn Identified a Kernel Lock Contention Issue Causing Recurring System Freezes
When LinkedIn engineers encountered short-lived, recurring outages where the database powering their user feed became unavailable and then recovered without leaving helpful traces, they had to devise a novel approach to uncover the root cause using off-CPU profiling with eBPF.