Thread Dump Analysis Pattern – Traffic Jam

Description

Thread-A may acquire lock-1 and never release it. Thread-B may acquire lock-2 and then wait for lock-1. Thread-C may in turn wait to acquire lock-2. Such transitive blocking between threads can make the entire application unresponsive. See the real-world example below.
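The chain described above can be reproduced in a few lines. The following is only a minimal sketch: the class name `TrafficJamDemo`, the method `startJam`, and the use of a very long sleep to stand in for "never releases the lock" are assumptions made for illustration, not code from the application discussed in this post.

```java
import java.util.concurrent.CountDownLatch;

class TrafficJamDemo {
    static final Object lock1 = new Object();
    static final Object lock2 = new Object();

    // Starts three daemon threads forming the chain Thread-C -> Thread-B -> Thread-A
    // and returns them once Thread-B and Thread-C are both BLOCKED.
    static Thread[] startJam() throws InterruptedException {
        CountDownLatch aHoldsLock1 = new CountDownLatch(1);
        CountDownLatch bHoldsLock2 = new CountDownLatch(1);

        Thread a = new Thread(() -> {
            synchronized (lock1) {
                aHoldsLock1.countDown();
                try {
                    // Stands in for a hang: lock-1 stays held until the thread is interrupted.
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) {
                    // Fall through and release lock-1.
                }
            }
        }, "Thread-A");

        Thread b = new Thread(() -> {
            synchronized (lock2) {
                bHoldsLock2.countDown();
                synchronized (lock1) {
                    // Reached only after Thread-A releases lock-1.
                }
            }
        }, "Thread-B");

        Thread c = new Thread(() -> {
            synchronized (lock2) {
                // Reached only after Thread-B releases lock-2.
            }
        }, "Thread-C");

        for (Thread t : new Thread[] {a, b, c}) {
            t.setDaemon(true); // do not keep the JVM alive if the jam is never cleared
        }
        a.start();
        aHoldsLock1.await(); // Thread-A now owns lock-1
        b.start();
        bHoldsLock2.await(); // Thread-B now owns lock-2 and goes on to request lock-1
        c.start();

        // Wait until both followers are stuck behind the front thread.
        while (b.getState() != Thread.State.BLOCKED || c.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        return new Thread[] {a, b, c};
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = startJam();
        for (Thread t : threads) {
            System.out.println(t.getName() + " -> " + t.getState());
        }
        threads[0].interrupt(); // clearing the front thread lets B and then C proceed
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

Taking a thread dump while this program is stuck should show Thread-C waiting on the lock held by Thread-B, and Thread-B waiting on the lock held by Thread-A, which is the same shape as the example below.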

Example

Below is a real-world example taken from a major travel application. Here the ‘Finalizer’ thread was waiting for a lock held by the ‘ajp-bio-192.168.100.41-7078-exec-40’ thread. ‘ajp-bio-192.168.100.41-7078-exec-40’ and several other threads were in turn waiting for a lock held by the ‘ajp-bio-192.168.100.41-7078-exec-12’ thread.

Thus ‘ajp-bio-192.168.100.41-7078-exec-12’ transitively blocked 42 threads in total. This ripple effect caused the entire application to become unresponsive. It turned out that ‘ajp-bio-192.168.100.41-7078-exec-12’ was blocked indefinitely because of a bug in an APM agent (Ruxit). Upgrading the agent to the latest version resolved the problem. This is ironic: APM agents are meant to prevent or isolate these sorts of issues, but in this case the agent itself was causing the issue. It’s like law enforcement breaking the law.


Fig: Graph showing transitive blocks among threads
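A graph like this can also be approximated from inside a running JVM. The sketch below uses the standard `ThreadMXBean` API to record, for each thread waiting on a lock that another thread currently owns, the ID of that owner, and then follows the owner links to the thread at the front of each chain. The class `RootBlockerFinder` and its methods are hypothetical names chosen for illustration; this is not the tool that produced the figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

class RootBlockerFinder {
    // Follows owner links from 'start' to the thread at the front of the chain.
    // The hop limit keeps the loop finite if the links form a cycle (a deadlock).
    static long rootOf(long start, Map<Long, Long> ownerOf) {
        long current = start;
        int hops = 0;
        while (ownerOf.containsKey(current) && hops++ < ownerOf.size()) {
            current = ownerOf.get(current);
        }
        return current;
    }

    // For each root thread ID, counts the threads blocked on it directly or transitively.
    static Map<Long, Integer> blockedCountByRoot(Map<Long, Long> ownerOf) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long waiter : ownerOf.keySet()) {
            counts.merge(rootOf(waiter, ownerOf), 1, Integer::sum);
        }
        return counts;
    }

    // Snapshot of "waiting thread ID -> lock owner thread ID" for the current JVM.
    static Map<Long, Long> snapshotOwners() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> ownerOf = new HashMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that ended before the snapshot was taken;
            // getLockOwnerId() is -1 when the thread is not waiting on an owned lock.
            if (info != null && info.getLockOwnerId() != -1) {
                ownerOf.put(info.getThreadId(), info.getLockOwnerId());
            }
        }
        return ownerOf;
    }

    public static void main(String[] args) {
        blockedCountByRoot(snapshotOwners()).forEach((id, n) ->
            System.out.println("thread id " + id + " transitively blocks " + n + " thread(s)"));
    }
}
```

In an incident like the one above, the thread reported with the largest count would be the first candidate to inspect, since unblocking it should release every thread queued behind it.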

Please refer to this document for the stack trace of the blocked threads.

Why is it named Traffic Jam?

A traffic jam typically happens when there is an accident up ahead. Because of it, all the cars behind the front car get stranded as well. This is analogous to the transitive blocking behavior described here.
