In the earlier blog you saw how to generate thread dump. This blog walks you through on how to analyze thread dumps.
It’s a recommended practice to capture two thread dumps from the application in a gap of 10 seconds between each capture. As Thread dump is a snap shot of all threads at the current moment, it’s always advantageous to know what those thread are doing few seconds after the initial capture.
http://fastthread.io/ – as I am the author of this tool, we might be showing nepotism towards this tool.
It’s important to understand various states of a thread and meaning before looking in to thread dump.
A thread can be in one of the following states:
A thread that has not yet started is in this state.
A thread executing in the Java virtual machine is in this state.
A thread that is blocked waiting for a monitor lock is in this state.
A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
A thread that has exited is in this state.
Thread States – State Diagram
Following is a great diagram posted in http://www.uml-diagrams.org. It very well illustrates what API calls would cause the threads to move in to what states.
How to read Thread Dump
As one diagram equals thousand words, below goes thousand words. Diagram is an excerpt from the thread dump. Red color highlights indicates what each section in the thread dump mean, Click on the diagram to enlarge and see.
Following are the key things that I would be looking in a thread dump:
1. Thread count
Total number of thread count that are reported in the Thread dump. If there are too many threads, then expected count then it’s of concern. Too many threads cab result in ‘java.lang.OutOfMemoryError:unable to create new native thread’.
2. Thread State – RUNNABLE
If application’s CPU consumption is quite high then I would grep for the string: “java.lang.Thread.State: RUNNABLE”. As it would show all the threads that are executing (i.e. running) when thread dump was captured. Note: too many threads in RUNNABLE state will cause surge in CPU. Then each thread’s stack trace which are in RUNNABLE state will have to be studied to see how it can be optimized.
3. Repeating Stack Traces
I will look for repeating stack traces among the threads. If same stack trace appears across multiple threads that it’s of concern. http://spotify.github.io/threaddump-analyzer/ is a good tool, which groups together threads with same stack trace in a thread dump.
Below is an excerpt from the sample thread dump, where 225 threads were having similar stack trace:
"ajp-bio-192.168.100.128-9022-exec-79" daemon prio=10 tid=0x00007f4d2001c000 nid=0x1d1c waiting for monitor entry [0x00007f4ce91fa000] java.lang.Thread.State: BLOCKED (on object monitor) at com.xxxxxxxxx.xx.xxx.xxxx.ReferenceNumberGeneratorDaoImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorDaoImpl.java:55) - waiting to lock (a com.xxxxxxxxx.sp.dao.impl.ReferenceNumberGeneratorDaoImpl) at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source) at com.xxxxxxxxx.xx.xxxxxxxx.xxx.ReferenceNumberGeneratorServiceImpl.getNextItineryReferenceNumber(ReferenceNumberGeneratorServiceImpl.java:15) : :
From the stack trace you can infer that thread was blocked and waiting for the lock on the object 0x00000006afaa5a60. 225 such threads waiting to obtain lock on this same object. It’s definitely a bad sign. It’s a clear indication of thread starvation.
Now we need to figure out who is holding on to the lock. I will grep for the string: “locked “. It shows “ajp-bio-192.168.100.128-9022-exec-84” acquired the lock and got stuck waiting for response from database. Thus 225 other threads were stuck waiting on it.
“ajp-bio-192.168.100.128-9022-exec-84” daemon prio=10 tid=0x00007f4d2000a800 nid=0x1d26 runnable [0x00007f4ce6ce1000]
java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.DataInputStream.readFully(DataInputStream.java:195) at java.io.DataInputStream.readFully(DataInputStream.java:169) at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850) at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731) - locked (a java.util.concurrent.ConcurrentHashMap) at net.sourceforge.jtds.jdbc.RexxonseStream.getPacket(RexxonseStream.java:477) at net.sourceforge.jtds.jdbc.RexxonseStream.read(RexxonseStream.java:114) at net.sourceforge.jtds.jdbc.RexxonseStream.peek(RexxonseStream.java:99) at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127) at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086) - locked (a net.sourceforge.jtds.jdbc.TdsCore) at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:493) at net.sourceforge.jtds.jdbc.JtdxxreparedStatement.executeQuery(JtdxxreparedStatement.java:1032) at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:174) at com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorxxxxxx.getNextItineryReferenceNumber(ReferenceNumberGeneratorxxxxxx.java:65) - locked (a com.xxxxxxxx.xx.xxx.xxx.ReferenceNumberGeneratorxxxxxx) at sun.reflect.GeneratedMethodAccessor3112.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorxxx.invoke(DelegatingMethodAccessorxxx.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at com.sun.proxy.$Proxy36.getNextItineryReferenceNumber(Unknown Source) at com.xxxxxxxx.xx.service.xxx.ReferenceNumberGeneratorServicexxx.getNextItineryReferenceNumber(ReferenceNumberGeneratorServicexxx.java:15) at sun.reflect.GeneratedMethodAccessor3031.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorxxx.invoke(DelegatingMethodAccessorxxx.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) : :
So bottom line is always look for repeating stack traces
Modern thread dumps have become smarter. If application experiences any dead locks, you are going to see the message “Found deadlock” reported in the thread dump. Also the threads which are causing deadlock will also get reported. Here is a sample:
Found one Java-level deadlock: ============================= "Thread-1": waiting to lock monitor 0x00000000071a3f20 (object 0x00000007ac3ac2d0, a java.lang.Object), which is held by "Thread-0" "Thread-0": waiting to lock monitor 0x00000000071a2b70 (object 0x00000007ac3ac2e0, a java.lang.Object), which is held by "Thread-1" Java stack information for the threads listed above: =================================================== "Thread-1": at com.tier1app.DeadLockSimulator$SecondThread.run(DeadLockSimulator.java:29) - waiting to lock (a java.lang.Object) - locked (a java.lang.Object) "Thread-0": at com.tier1app.DeadLockSimulator$FirstThread.run(DeadLockSimulator.java:16) - waiting to lock (a java.lang.Object) - locked (a java.lang.Object) Found 1 deadlock.
5. Infinite Looping
It’s highly recommended to capture two thread dumps in an interval of 10 seconds. Please refer to ‘Best Practice’ section in this document.
I will compare each thread in RUNNABLE state in the first thread dump with each thread in RUNNABLE state in the second thread dump. If any thread is in RUNNABLE state in the both the thread dump then it’s a YELLOW flag (i.e. warning). Then I will start to compare the stack trace between. if they happen to be working on the same method, then it’s confirmed problem. It indicates that particular thread is looping infinitely on the same method. Even if one single thread loops infinitely then it would cause spike in the CPU utilization.
6. Unsafe HashMap access
When multiple threads tries to access HashMap’s get() and put() APIs concurrently it would cause threads go in to infinite looping. This problem doesn’t happen always, but rarely it does happens. Here is a detailed blog which describes about this problem.
If in the thread dump, you see HashMap’s get() or put() API, then it’s indicative that HashMap is causing CPU spike. To fix the problem replace the HashMap with ConcurrentHashMap.
Below is an excerpt from the thread dump which indicates the infinite looping that is happening in HashMap:
"Thread-0";quot; prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.put(HashMap.java:374) at com.tier1app.HashMapLooper$AddForeverThread.AddForever(NonTerminatingLooper.java:32) at com.tier1app.HashMapLooper$AddForeverThread.method2(NonTerminatingLooper.java:27) at com.tier1app.HashMapLooper$AddForeverThread.method1(NonTerminatingLooper.java:22) at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)