Thursday, June 5, 2014

Analyzing Java Thread Dumps

Thread Dumps is an excellent mechanism to trouble shoot typical production issues. Analyzing Java Thread dumps will provide us the clear understanding of production issues. In this blog entry we will discuss about what is a thread dump, how to take the thread dump,when to take thread dump and how to analyze the thread dumps.
  
What is a thread?
A thread is a Light-Weight process. Java Threads are Light-Weight Processes of the JVM process. A WebLogic JVM process contains multiple threads

What is a thread dump?
A thread dump is a snapshot of all the Java threads that are currently running in a Java Virtual Machine (JVM) at a given point of time. A thread dump might contain single thread or multiple threads. Thread Dump will provide the stack traces of all the JVM threads and much more information about a particular thread what it is doing at that point of time.

When will you take thread dumps?
  • Java application is running much slower than expected (poor response time)
  • High CPU Utilization
  • When Server/Application is hanging (server become unresponsive or not responding).
If you want to understand what is actively executed in the system during the load, you can take the thread dump and go through it.

How will you take the thread dumps?
There are several ways to take thread dumps

(1) Using jrcmd (jrockit 1.6)


 jrcmd <pid> print_threads (or)
 jrcmd <pid> print_threads nativestack=true jvmmonitors=true


The above two commands will print out the thread dump. It is possible to redirect the above thread dump to a file as below:

jrcmd <pid> print_threads nativestack=true jvmmonitors=true > /tmp/thread_dump.txt

Here <pid> is the java process id
nativestack=true will print C-level stacktraces as well as Java traces.
jvmmonitors=true will also print the JRockit JVM's internal native locks

Example:
/u01/app/oracle/product/fmw/jrockit_160_29_D1.2.0-10/bin/jrcmd  2789 print_threads nativestack=true jvmmonitors=true > /tmp/thread_dump_2789.txt

(2) Using jstack (Sun JDK) 

jstack <pid> (get the process id using jps -l)

You can redirect the output to a file as below
jstack <pid> > threaddump.log

When Java process is hung or not responding, you can take the thread dump as below:
jstack  -F -l <pid> (or)
jstack  -F -l <pid> > threaddump.log

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -l  long listing. Prints additional information about locks

(3) Using WebLogic Admin Console
Login to Admin Console
Click on Servers under domain tree > Click on the Server (for which you want to take thread dump) >  Monitoring > Threads  >Click on "Dump Thread Stack" Button. This will generate Thread dump in the Admin Console



(4) Using kill command in Unix/Linux
kill -3 <pid> (get the  process id using ps -ef | grep "java")

(5) Using weblogic.Admin utility (deprecated, still you can use)
You can take the thread dump for  WebLogic Admin java process using below comamnd:
 java weblogic.Admin -url t3://<Admin Server Host Name>:<Admin ServerPort> -username <weblogic admin username> --password <weblogic admin passord> THREAD_DUMP

For WebLogic managed servers you can take thread dump as below:
 java weblogic.Admin -url t3://<Managed Server Host Name>:<Managed ServerPort> -username <weblogic admin username> --password <weblogic admin passord> THREAD_DUMP

Note: The THREAD_DUMPS will be generated in the Servers STDOUT file.

Eg: 
To take thread dump of AdminServer running on AdminHost with the port 8001
java weblogic.Admin -url t3://AdminHost:8001 -username weblogic -password weblogic THREAD_DUMP

To take thread dump of managed server running MngHost on localhost with the port 8003
java weblogic.Admin -url t3://MngHost:8001 -username weblogic -password weblogic THREAD_DUMP

(6) Using WLST
In WLST threadDump command displays a thread dump for the specified server.

Syntax:
threadDump([writeToFile], [fileName], [serverName])
   
writeToFile: This argument defaults to true, indicating that output is saved to a file.
fileName: Name of the file to which the output is written. It is pptional.
serverName:This argument defaults to the server to which WLST is connected.   It is optional. If you don't specify the serverName it displays the thread dump for connected server

Example:
You can follow the below steps to take the thread dump.
$ pwd
/u01/app/oracle/product/fmw/wlserver_10.3/common/bin
$ ./wlst.sh
wls:/offline> connect('weblogic','weblogic','t3://localhost:7001')
Connecting to t3://localhost:7001 with userid weblogic ...
Successfully connected to Admin Server 'AdminServer' that belongs to domain 'test_domain'.

Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.

wls:/test_domain/serverConfig> threadDump()
Thread dump for the running server: AdminServer

===== FULL THREAD DUMP ===============
Sat May 31 11:23:50 2014
Oracle JRockit(R) R28.2.4-14-151097-1.6.0_33-20120618-1634-linux-x86_64

"Main Thread" id=1 idx=0x4 tid=2487 prio=5 alive, waiting, native_blocked
    -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xf18a00b8[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:981)
    ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xf18a00b8[fat lock]
    at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:490)
    at weblogic/Server.main(Server.java:71)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
........
........
........

===== END OF THREAD DUMP ===============

The Thread Dump for server AdminServer
has been successfully written to Thread_Dump_AdminServer.txt
wls:/test_domain/serverConfig>

 $ [mastshai@OEL bin]$ ls Thread_Dump_AdminServer.txt
Thread_Dump_AdminServer.txt

You can also write a python script to generate thread dump as below:
$ cat ThreadDump.py
connect('weblogic','weblogic','t3://localhost:7001')
cd('Servers')
cd('AdminServer')
threadDump(writeToFile='true',fileName='ThreadDump.txt')
disconnect()
exit()

Now run setWLSEnv.sh script to setup the classpath & path
/u01/app/oracle/product/fmw/wlserver_10.3/server/bin
$ . ./setWLSEnv.sh
Finally run the below command to generate the thread dump using WLST

$ java weblogic.WLST  ThreadDump.py 

What are different thread states?
For analyzing thread dumps, you need to know the status of threads. The statuses of threads are
NEW: The thread is created but not available to process the requests.

RUNNABLE (ACTIVE): The thread is either currently processing the request or ready to run when it gets its CPU. It may be in WAITING status due to the OS's resource distribution. JRockit thread dumps refer to this state as ACTIVE.

BLOCKED or Locked (MW): Waiting for Monitor Entry - The thread is waiting to get the lock( a different thread may be holding the lock)

WAITING or Waiting on Monitor or CW: The thread is waiting by using a wait, join or park method.  For example, In WebLogc server the idle execute threads are in this condition and they wait till a socket reader thread notify them of some new work to be done.


What are the different types of WebLogic thread states?
Weblogic administrators need to understand different thread states and monitoring WebLogic threads.
  • Active Execute Threads
  • Idle Execute Threads
  • Standby Threads
  • Hogging Threads
  • Stuck Threads
The thread monitoring section can be accessed for Admin and each managed server Managed server under the Monitoring > Threads tab.

 


From the above you can see, the thread monitoring tab provides a complete view of each WebLogic thread along with its state. Now we can try to understand WebLogic thread states.


Active Execute Threads: The threads which are currently processing the requests or ready to process the requests(idle threads). When thread demand goes up, WebLogic will start promoting threads from Standby to Active state which will enable them to process future client requests.


Execute Thread Idle Count: This is the number of Active idle threads currently “available” to process a client request.

Standby Thread Count: This is the number of threads waiting to be marked “eligible” to process client requests. These threads are created and visible from the JVM Thread Dump but not available yet to process a client request.

Execute Thread Total Count: This is the total number of threads “created” from the Weblogic self-tuning pool and visible from the JVM Thread Dump.You can calcualte these threads as below.
 Execute Thread Total Count = Active Execute Threads + Standby Threads

Hogging Thread Count: This is the number of threads taking much more time to process the request than the average current execution time.

Stuck Threads: The threads which are taking more time to process the request than the configured stuck thread time.  When facing slowdown conditions,  the WebLogic threads transition from the Hogging state to Stuck stage, depending how long these threads remain stuck executing their current request.

In WebLogic Admin console you can configure Stuck thread related parameters for Admin and each  Managed server under the Configuration > Tuning tab.


Stuck Thread Max Time:The number of seconds that a thread must be continually working before this server considers the thread stuck. The default value is 600 seconds(10mins)

Stuck Thread Timer Interval:The number of seconds after which WebLogic Server periodically scans threads to see if they have been continually working for the configured maximum length of time. The default value is 60 seconds