JBoss Shunning, Unloading class sun.reflect

I had this interesting problem with a major shipping/logistics company. Their application was running on Java 6, in a JBoss cluster on the Red Hat Linux platform.


PROBLEM STATEMENT

Frequently, JBoss instances were dropping off from the JBoss cluster. In JBoss terminology this is "shunning" (good word :-). After several seconds of shunning, the JBoss instance automatically rejoins the cluster without any manual intervention. (Note: JBoss didn't crash or anything here, it just stopped responding.) Any user sessions that were established on that JBoss instance wouldn't progress further, so users had to sign out and log in again (so that the session could be established with another JBoss instance in the cluster). Since the problem started to happen quite frequently, users had to sign out and sign in again multiple times. This was annoying behaviour and started to get high visibility within the organization.

Below is the shunning error message that was reported in the log file:

"01:01:20,524 WARN [XXX] I (xx.xxx.x..57:7600) am not a member of view [xx.xxx.x..62:7600|20] [xx.xxx.x..62:7600, xx.xxx.x..63:7600, xx.xxx.x..60:7600, xx.xxx.x..56:7600, xx.xxx.x..55:7600], shunning myself and leaving the group (prev_members are [xx.xxx.x..55:7600, xx.xxx.x..56:7600, xx.xxx.x..60:7600, xx.xxx.x..62:7600, xx.xxx.x..63:7600, xx.xxx.x..57:7600], current view is [xx.xxx.x..57:7600|19] [xx.xxx.x..57:7600, xx.xxx.x..62:7600, xx.xxx.x..63:7600, xx.xxx.x..60:7600, xx.xxx.x..56:7600, xx.xxx.x..55:7600])

ROOT CAUSE

When I started to review the Garbage Collection log file, I observed that the JBoss instances were suffering long pauses whenever a Full GC (Garbage Collection) ran. (Note: Explaining the details of Full GC is outside the scope of this article.) The key point here is: when a Full GC runs, the entire JVM freezes. The JVM can't process any new transactions, and any transactions that are in flight starve.

Below are excerpts from the Garbage Collection log file, which show the Full GC durations (see the "real=" times):

2015-02-03T12:54:37.658-0800: 30910.469: [Full GC [PSYoungGen: 8437K->0K(1032320K)] [PSOldGen: 2089266K->1256164K(2097152K)] 2097703K->1256164K(3129472K) [PSPermGen: 435344K->435344K(435392K)], 77.0011120 secs] [Times: user=8.67 sys=2.33, real=76.98 secs]
2015-02-03T23:49:16.473-0800: 70189.284: [Full GC [PSYoungGen: 32485K->0K(1005696K)] [PSOldGen: 2094611K->1050741K(2097152K)] 2127096K->1050741K(3102848K) [PSPermGen: 475378K->470743K(524288K)], 130.8265940 secs] [Times: user=7.50 sys=2.59, real=130.80 secs]
2015-02-03T13:17:36.140-0800: 30896.682: [Full GC [PSYoungGen: 11169K->0K(1033728K)] [PSOldGen: 2092931K->1189135K(2097152K)] 2104100K->1189135K(3130880K) [PSPermGen: 434802K->434802K(434816K)], 83.8425200 secs] [Times: user=8.06 sys=1.87, real=83.83 secs]
2015-02-04T01:00:01.573-0800: 72878.901: [Full GC [PSYoungGen: 12800K->0K(1017344K)] [PSOldGen: 2096205K->1036140K(2097152K)] 2109005K->1036140K(3114496K) [PSPermGen: 515271K->468966K(524288K)], 78.9353300 secs] [Times: user=6.51 sys=0.88, real=78.92 secs]

As shown above, sometimes (not always) the Full GC duration was anywhere between 80 and 130 seconds. Since the JVM becomes dormant during this period, the JBoss cluster evicts (aka shuns) that JBoss instance, because it isn't responding to heartbeat checks. Thus all the customer transactions being handled by this JBoss instance are jeopardized.
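The pause times in these entries can be pulled out mechanically rather than by eyeballing the log. As a sketch (gc-sample.log is just a stand-in name, seeded here with two of the entries above), the following prints any Full GC whose wall-clock ("real=") time exceeds 30 seconds:

```shell
# Write two sample Full GC entries (taken from the excerpts above) to a file.
cat > gc-sample.log <<'EOF'
2015-02-03T12:54:37.658-0800: 30910.469: [Full GC [PSYoungGen: 8437K->0K(1032320K)] [PSOldGen: 2089266K->1256164K(2097152K)] 2097703K->1256164K(3129472K) [PSPermGen: 435344K->435344K(435392K)], 77.0011120 secs] [Times: user=8.67 sys=2.33, real=76.98 secs]
2015-02-03T23:49:16.473-0800: 70189.284: [Full GC [PSYoungGen: 32485K->0K(1005696K)] [PSOldGen: 2094611K->1050741K(2097152K)] 2127096K->1050741K(3102848K) [PSPermGen: 475378K->470743K(524288K)], 130.8265940 secs] [Times: user=7.50 sys=2.59, real=130.80 secs]
EOF

# Flag every Full GC whose wall-clock pause ("real=") is longer than 30 seconds.
awk '/Full GC/ {
  for (i = 1; i <= NF; i++)
    if (index($i, "real=") == 1) {
      t = substr($i, 6) + 0             # numeric seconds after "real="
      if (t > 30) print $1, t " sec Full GC pause"
    }
}' gc-sample.log
```

Any pause longer than the cluster's failure-detection timeout is a shunning candidate, which is why the wall-clock ("real=") time is the number to watch here.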

Whenever a JBoss instance was evicted from the cluster, an alert notification was sent out to the operations team. Whenever the Full GC duration exceeded several seconds, those were the exact times at which alert notifications went out. This was clear confirmation that the long pauses caused by GC were the root cause of this problem.

SOLUTION

By studying the garbage collection log file, it was evident that the Permanent Generation space in the JVM memory was getting fully utilized. Besides that, the following statements were printed in the application log file. They indicate that the GC process is trying to unload class files from the Permanent Generation, which is not a good sign either.

[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1829]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1667]
[Unloading class sun.reflect.GeneratedMethodAccessor24987]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1365]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1607]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1506]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1346]
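To get a feel for how many of these reflection-generated classes the JVM is churning through, the unloading messages can be counted per accessor type. A minimal sketch, assuming the messages land in a log file (server-sample.log below is a hypothetical stand-in, seeded with a few of the lines above):

```shell
# Seed a stand-in log file with a few "Unloading class" lines like the ones above.
cat > server-sample.log <<'EOF'
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1829]
[Unloading class sun.reflect.GeneratedMethodAccessor24987]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1365]
EOF

# Count unloading messages grouped by accessor type (digits stripped).
grep -o 'Unloading class sun\.reflect\.[A-Za-z]*' server-sample.log | sort | uniq -c | sort -rn
```

High counts here mean the Permanent Generation is being filled (and recycled) by reflection-generated accessor classes, which matches the PSPermGen numbers in the GC log above.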

So I recommended bumping up their Permanent Generation space (-XX:MaxPermSize). After this fix, the application started to function normally.
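On Java 6 this is done via JVM options, typically through JAVA_OPTS (e.g. in JBoss's run.conf). The sizes below are illustrative only, not the values used at this customer; the GC logs above show PermGen ceilings of roughly 425–512 MB being hit, so this sketch raises the ceiling to 1 GB:

```shell
# Illustrative sizes only -- the right value depends on the application's class footprint.
# -XX:PermSize pre-sizes PermGen; -XX:MaxPermSize raises its ceiling (Java <= 7 only).
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=512m -XX:MaxPermSize=1g"
```

(Note: -XX:MaxPermSize applies only up to Java 7; PermGen was removed in Java 8 in favor of Metaspace.)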
