Stragglers vs. Failures - Do you know the difference?

➤ A failure is a request that doesn’t complete.
➤ A straggler is a request that completes but takes too long - often caused by a backend garbage collection (GC) pause, a hot partition, or a kernel scheduling blip.

From the caller’s perspective, both damage p99. However, they require fundamentally different architectural solutions.
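To make the contrast concrete: a failure is usually handled by retrying after an error, while a straggler never raises an error, so one common approach is to hedge, meaning a second copy of the request is sent once the first has been outstanding too long. The sketch below is illustrative and is not taken from the article; `hedged_call`, `make_request`, and `hedge_delay` are hypothetical names.

```python
import asyncio


async def hedged_call(make_request, hedge_delay):
    """Start one request; if it is still running after hedge_delay seconds,
    start a second copy and return whichever finishes first.

    make_request: hypothetical zero-argument coroutine function (assumption).
    hedge_delay:  seconds to wait before sending the backup request.
    """
    primary = asyncio.create_task(make_request())

    # Give the primary request a head start.
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        # Finished in time (or failed fast): no hedge was needed.
        return primary.result()

    # The primary is straggling, so send one backup copy.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )

    # Cancel the loser so it stops consuming resources on the caller side.
    for task in pending:
        task.cancel()

    # If the winner raised, the exception propagates to the caller here.
    return done.pop().result()
```

Note that this only helps with stragglers: if the request fails outright, the exception still propagates, and retry logic would have to be layered on separately.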

Read Prathamesh Bhope’s #InfoQ article for a deeper dive: https://bit.ly/4uDWKg1

#DistributedSystems #CloudComputing #SoftwareEngineering #Performance

Stragglers, Not Failures: How Adaptive Hedged Requests Reduce p99 Latency by 74 Percent

In fan-out microservice architectures, slow-but-completing requests accumulate across services and drive p99 latency far higher than per-service metrics suggest. This article presents an adaptive hedging mechanism that uses DDSketch for real-time quantile estimation, windowed rotation to handle distribution drift, and a token-bucket budget to prevent load amplification.
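A rough sketch of two of those pieces, under stated assumptions: the token bucket below caps how many hedges may be sent, and the rotating window keeps the latency threshold tracking recent traffic. This is not the article's implementation. In particular, a plain sorted list stands in for DDSketch, and the names `HedgeBudget` and `RotatingQuantile` are hypothetical.

```python
import time


class HedgeBudget:
    """Token bucket: each hedge spends one token; tokens refill at a fixed rate."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the bucket can be tested
        self.last = clock()

    def try_spend(self):
        """Return True if a hedge is allowed right now, False otherwise."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RotatingQuantile:
    """Two fixed-size windows of latency samples.

    When the current window fills, it replaces the previous one, so old
    samples age out and the estimate follows distribution drift.
    A sorted list is used here for clarity (assumption: stand-in for a
    sketch structure such as DDSketch).
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.current = []
        self.previous = []

    def record(self, latency):
        self.current.append(latency)
        if len(self.current) >= self.window_size:
            # Rotate: drop the oldest window, start a fresh one.
            self.previous, self.current = self.current, []

    def quantile(self, q):
        """Return the q-quantile of retained samples, or None if empty."""
        samples = sorted(self.previous + self.current)
        if not samples:
            return None
        index = min(len(samples) - 1, int(q * len(samples)))
        return samples[index]
```

A caller might send a hedge only when the request has been outstanding longer than `quantile(0.95)` and `try_spend()` returns True, so that a widespread slowdown cannot multiply load on backends that are already struggling.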
