THREAD DUMP ANALYSIS PATTERN – REPETITIVE STRAIN INJURY (RSI)

Description

Whenever a significant number of threads exhibit the same, repetitive stack trace, that stack trace has to be investigated. It may be indicative of a performance problem.

Whenever there are performance problems or bottlenecks in the application, most of the threads will start to accumulate on the problematic area. Here are a few such scenarios:

  1. Say your SOR or an external service is slowing down; then a significant number of threads will start to wait for its response. In such a circumstance those threads will exhibit similar stack traces.
  2. Say a thread acquired a lock and never released it; then several other threads that are on the same execution path will get into the BLOCKED state, exhibiting the same stack trace (illustrated in the sketch below).
  3. If a loop (for loop, while loop, do..while loop) condition never terminates, then the several threads executing that loop will exhibit the same stack trace.

When any of the above scenarios occurs, the application's performance and availability are compromised.
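
To make scenario 2 concrete, here is a minimal, self-contained sketch (all class and method names are hypothetical, not taken from any application discussed here): one thread grabs a monitor and then stalls inside it, and every other thread calling the same method piles up in the BLOCKED state with an identical stack trace.

public class SlowLockDemo {

    // The shared monitor that all worker threads contend for.
    private static final Object LOCK = new Object();

    // Stand-in for a DAO/service method guarded by a lock. The owner stalls on a
    // "slow external call" (simulated with sleep) while still holding the monitor.
    static void guardedCall() {
        synchronized (LOCK) {
            try {
                Thread.sleep(Long.MAX_VALUE);   // simulates an unresponsive SOR / database
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // A thread dump taken after this loop shows one owner thread and 49 threads
        // BLOCKED on the same monitor, all with the same stack trace.
        for (int i = 0; i < 50; i++) {
            new Thread(SlowLockDemo::guardedCall, "worker-" + i).start();
        }
    }
}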

Example

Below is an excerpt from a thread dump of a major B2B application, where 225 out of 400 threads were exhibiting the same stack trace:

"ajp-bio-192.168.100.128-9022-exec-79" daemon prio=10 tid=0x00007f4d2001c000 nid=0x1d1c waiting for monitor entry [0x00007f4ce91fa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.xxxxxxxxx.xx.xxx.xxxx.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:55)
	- waiting to lock 0x00000006afaa5a60 (a com.xxxxxxxxx.sp.dao.impl.ReferenceNumberGeneratorDaoImpl)
	at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source)
	at com.xxxxxxxxx.xx.xxxxxxxx.xxx.ReferenceNumberGeneratorServiceImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorServiceImpl.java:15)
	:
	:

From the stack trace you can infer that the thread is blocked, waiting for the lock on the object 0x00000006afaa5a60. 225 such threads were waiting to obtain the lock on this same object. That is definitely a bad sign and a clear indication of thread starvation.

Now we need to figure out which thread is holding on to the lock. I grepped the thread dump for the string "locked ". It shows that "ajp-bio-192.168.100.128-9022-exec-84" acquired the lock and then got stuck waiting for a response from the database. Thus 225 other threads were stuck waiting on it.

"ajp-bio-192.168.100.128-9022-exec-84" daemon prio=10 tid=0x00007f4d2000a800 nid=0x1d26 runnable [0x00007f4ce6ce1000]
    java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:150)
	at java.net.SocketInputStream.read(SocketInputStream.java:121)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
	at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
	- locked 0x00000006afa88a68 (a java.util.concurrent.ConcurrentHashMap)
	at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
	at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
	at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
	- locked 0x00000006bca709f8 (a net.sourceforge.jtds.jdbc.TdsCore)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:493)
	at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1032)
	at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:174)
	at com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:65)
	- locked 0x00000006afaa5a60 (a com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorDaoImpl)
	at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source)
	at com.xxxxxxxx.xx.service.xxx.ReferenceNumberGeneratorServiceImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorServiceImpl.java:15)
	at sun.reflect.GeneratedMethodAccessor3031.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
        :
        :

So the bottom line is: always look for repeating stack traces.
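
If you want to spot this pattern from inside the JVM instead of eyeballing a dump, the following sketch (not the tooling used above, just an illustration built on the standard Thread.getAllStackTraces() API) groups the live threads by their full stack trace and prints the most repeated ones. In the incident above, the DAO stack trace would have surfaced at the top with a count of 225.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Groups the live threads of the current JVM by their full stack trace and
// prints the most repeated ones - the in-process equivalent of scanning a
// thread dump for duplicates.
public class RepeatingTraceFinder {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] trace : Thread.getAllStackTraces().values()) {
            String key = Arrays.toString(trace);           // the whole trace is the grouping key
            counts.merge(key, 1, Integer::sum);
        }
        counts.entrySet().stream()
              .sorted((a, b) -> b.getValue() - a.getValue())
              .limit(5)                                     // top 5 repeated traces
              .forEach(e -> System.out.println(e.getValue() + " thread(s):\n" + e.getKey() + "\n"));
    }
}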

Why named as RSI?

The technical definition of Repetitive Strain Injury (RSI) is: 'an injury to the musculoskeletal and nervous systems that may be caused by repetitive tasks, forceful exertions, vibrations, mechanical compression, or sustained or awkward positions'. Similarly, when there are performance problems or bottlenecks, multiple threads will exhibit the same, repetitive stack trace.

THREAD DUMP ANALYSIS PATTERN – ALL ROADS LEAD TO ROME

Description

If several threads in a thread dump end up working in one single method, it might be of concern. Most of the time, if there is a problem (say a poorly responding data source, an un-relinquished lock, infinitely looping threads …), a significant number of threads will end up in one single method. That particular method has to be analyzed in detail.
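
As a rough illustration of this pattern (again using only the standard Thread.getAllStackTraces() API, not any particular tool), the sketch below groups the live threads of the current JVM by the method at the top of their stack; if one frame dominates the counts, that frame is the 'Rome' the threads are converging on.

import java.util.HashMap;
import java.util.Map;

// Counts live threads by the method at the top of their stack trace.
public class TopFrameCounter {
    public static void main(String[] args) {
        Map<String, Integer> byTopFrame = new HashMap<>();
        for (StackTraceElement[] trace : Thread.getAllStackTraces().values()) {
            if (trace.length == 0) continue;               // skip threads with no Java frames
            byTopFrame.merge(trace[0].toString(), 1, Integer::sum);
        }
        byTopFrame.entrySet().stream()
                  .sorted((a, b) -> b.getValue() - a.getValue())
                  .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}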

Example

The application was connecting to the Apache Cassandra NoSQL database. It uses the DataStax Java driver to connect to Cassandra, and the DataStax driver has a dependency on the Netty library. To be specific, the following are the libraries that the application uses:

  • cassandra-driver-core-2.0.1.jar
  • netty-3.9.0.Final.jar

This application all of a sudden ran into 'java.lang.OutOfMemoryError: unable to create new native thread'. When a thread dump was taken on the application, around 2,460 threads were in the RUNNABLE state, stuck in the method sun.nio.ch.EPollArrayWrapper.epollWait(Native Method). Below is the stack trace of one of those threads:

"New I/O worker #211" prio=10 tid=0x00007fa06424d000 nid=0x1a58 runnable [0x00007f9f832f6000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
	- locked  (a sun.nio.ch.Util$2)
	- locked  (a java.util.Collections$UnmodifiableSet)
	- locked  (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
	at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)

Oh boy! That is way too many threads, and all of them are Netty library threads. The issue turned out to be that the Cassandra database had run out of space, and this issue cascaded into the application as an OutOfMemoryError. When more space was allocated to the Cassandra database, the problem went away.
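
As a side note, 'unable to create new native thread' rarely appears without warning: the thread count usually climbs for a while first. Here is a minimal sketch that watches for this using the standard ThreadMXBean API (the 10-second sampling interval is an arbitrary choice):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Periodically logs the live, peak and daemon thread counts of the current JVM so
// that runaway thread creation is visible long before the OutOfMemoryError is thrown.
public class ThreadCountWatcher {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        while (true) {
            System.out.printf("live=%d peak=%d daemon=%d%n",
                    mx.getThreadCount(), mx.getPeakThreadCount(), mx.getDaemonThreadCount());
            Thread.sleep(10_000);  // sample every 10 seconds
        }
    }
}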

Thus always look out for the method(s) where most of the threads are working.

Why named as ‘All roads lead to Rome’?

‘All roads lead to Rome’ is a famous proverb indicating that many different paths can lead to the same final destination. Similarly, when there is a problem, there is a high chance that several threads will finally end up in one problematic method.

THREAD DUMP ANALYSIS PATTERN – ATHEROSCLEROSIS

Description

If threads block momentarily, it's not a problem. However, if they stay blocked for a prolonged period, it is a concern; it's indicative of some problem in the application. Typically, blocked threads will make the application unresponsive.

Threads that remain in the same method and in the BLOCKED state across 3 thread dumps captured at 10-second intervals can turn out to be the problematic ones. Studying the stack traces of such blocked threads will indicate the reasons why they are blocked. Reasons may include: deadlocks, circular deadlocks, another thread having acquired a lock and never released it, external SORs having become unresponsive …
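
The same comparison can be automated from inside the JVM. The sketch below (a minimal illustration using the standard ThreadMXBean API, not the author's tooling) takes three samples 10 seconds apart and reports the threads that were BLOCKED in every sample, along with the lock and lock owner reported for them:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashSet;
import java.util.Set;

// Takes three in-process "dumps" 10 seconds apart and reports the threads that
// were BLOCKED in every sample - the programmatic version of comparing dumps.
public class LongBlockedDetector {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Set<Long> blockedEveryTime = null;

        for (int sample = 0; sample < 3; sample++) {
            Set<Long> blockedNow = new HashSet<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    blockedNow.add(info.getThreadId());
                }
            }
            if (blockedEveryTime == null) {
                blockedEveryTime = blockedNow;
            } else {
                blockedEveryTime.retainAll(blockedNow);   // keep only threads blocked in all samples
            }
            if (sample < 2) Thread.sleep(10_000);
        }

        for (long id : blockedEveryTime) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info != null) {
                System.out.println(info.getThreadName() + " still BLOCKED, waiting on "
                        + info.getLockName() + " held by " + info.getLockOwnerName());
            }
        }
    }
}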

Example

Following is an excerpt from a thread dump captured from a major SOA application that became unresponsive. The thread ajp-bio-192.168.100.128-9022-exec-173 remained in the BLOCKED state across 3 consecutive thread dumps captured at 10-second intervals. Here are the important parts of this thread's stack trace:

ajp-bio-192.168.100.128-9022-exec-173 
Stack Trace is: 
java.lang.Thread.State: BLOCKED (on object monitor)
at **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:55)
- waiting to lock 0x00000006afaa5a60 (a **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl)
at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
:
:
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Here you can notice that ajp-bio-192.168.100.128-9022-exec-173 is stuck in the method **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:55). This thread got stuck in this method because another thread, ajp-bio-192.168.100.128-9022-exec-84, obtained the lock 0x00000006afaa5a60 and never returned. Below is the stack trace of the ajp-bio-192.168.100.128-9022-exec-84 thread:

ajp-bio-192.168.100.128-9022-exec-84 
Stack Trace is: 
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
- locked 0x00000006afa88a68 (a java.util.concurrent.ConcurrentHashMap)
at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
- locked 0x00000006bca709f8 (a net.sourceforge.jtds.jdbc.TdsCore)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:493)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1032)
at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:174)
at **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:65)
- locked 0x00000006afaa5a60 (a **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl)
at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
:
:
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

It turned out that a JDBC read timeout was never set; because of that, ajp-bio-192.168.100.128-9022-exec-84 never returned from its JDBC call. Thus it blocked all the other threads that were invoking the **.***.sp.dao.impl.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber() method.
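
For reference, here is a sketch of how such a hang could be bounded. Statement.setQueryTimeout() is standard JDBC; the socketTimeout URL property is specific to the jTDS driver used here, so treat the exact property name (and the connection details and SQL, which are made up) as assumptions to verify against the driver's documentation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: two ways to bound a JDBC read so a thread cannot hang forever inside
// socketRead0() while holding an application-level lock.
public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        // 1. Driver-level socket timeout (property name assumed for the jTDS driver;
        //    host, database and credentials are hypothetical).
        String url = "jdbc:jtds:sqlserver://dbhost:1433/appdb;socketTimeout=60";

        try (Connection con = DriverManager.getConnection(url, "appUser", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "select next_value from reference_number_sequence")) { // hypothetical SQL

            // 2. Statement-level timeout (standard JDBC): the driver cancels the query
            //    and throws SQLException if it runs longer than 30 seconds.
            ps.setQueryTimeout(30);

            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }
}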

Why named as Atherosclerosis?

Atherosclerosis is a heart disease. Medically it is described as follows: the inside walls of human arteries are normally smooth and flexible, allowing blood to flow through them easily. Fatty deposits, or plaques, may build up inside the arterial wall. These plaques narrow the artery and can reduce or even completely stop the flow of blood, leading to death.

Similarly, if the blocking of threads prolongs and happens across multiple threads, it will make the application unresponsive, and eventually the application has to be killed.
