0
点赞
收藏
分享

微信扫一扫

Java程序栈信息文件中的秘密


 

最近一段时间比较辛苦,产品上线之后问题不断,最近更出现了多个阻塞问题,全靠分析进程栈文件的内容来辅助定位问题。平心而论,做Java开发比较省心,平时开发过程中可能不像在Linux下做C/C++开发的兄弟一样,经常需要分析程序栈和Core文件,而且Java开发过程中可用的调试工具非常多,功能强大,而且调试时也非常方便,但是,分析程序栈仍然是不可或缺的、重要的方法,尤其在产品上线之后,用户出于各种原因可能不乐意软件开发人员部署各种各样的工具,这时,分析程序栈就成了唯一可用并且好用的方法。

首先,Java应用程序的栈如何获取?JDK提供了非常多的工具来完成这个需求,比如GUI界面的jconsole、jvisualvm,命令行下的jstack。GUI的工具比较简单,这里主要讲jstack,比较基础,特别适合在服务器上使用。如下是jstack工具的使用帮助。

 

 

1. jstack -l  
2. Usage:  
3.     jstack [-l] <pid>  
4.         (to connect to running process)  
5.     jstack -F [-m] [-l] <pid>  
6.         (to connect to a hung process)  
7.     jstack [-m] [-l] <executable> <core>  
8.         (to connect to a core file)  
9.     jstack [-m] [-l] [server_id@]<remote server IP or hostname>  
10.         (to connect to a remote debug server)  
11.   
12. Options:  
13.     -F  to force a thread dump. Use when jstack <pid> does not respond (process  
14. is hung)  
15.     -m  to print both java and native frames (mixed mode)  
16.     -l  long listing. Prints additional information about locks  
17.     -h or -help to print this help message


-l,一般用于附加到运行状态的Java应用,除导出栈信息外,还会记录各个线程当前持有或者等待的锁信息。

 

-m,附加到运行状态的Java应用,可以同时导出java语言和本地栈的信息。

-F,对于-l和-m选项无法导出栈信息时,就只好使用-F选项,强制导出应用的栈信息;照我的理解,这是最后的选择。

jstack强大之处在于除了可以导出在本地运行的Java应用的栈信息,还可以导出运行于远程机器上的Java应用的栈信息,不过这个特性没有怎么用过,需要什么配置还不清楚,这里就不展开了。
下面对一段实例解析,比如对日常开发中常用到的Eclipse导出栈信息,如下。

 



 

1. 2013-01-13 11:02:31  
2. Full thread dump Java HotSpot(TM) Client VM (23.1-b03 mixed mode, sharing):  
3.   
4. "[ThreadPool Manager] - Idle Thread" daemon prio=6 tid=0x069a3400 nid=0x84 in Object.wait() [0x0795f000]  
5.    java.lang.Thread.State: WAITING (on object monitor)  
6.     at java.lang.Object.wait(Native Method)  
7.     - waiting on <0x239102b8> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)  
8.     at java.lang.Object.wait(Object.java:503)  
9.     at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run(Executor.java:106)  
10.     - locked <0x239102b8> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)  
11.   
12.    Locked ownable synchronizers:  
13.     - None  
14.   
15. "Worker-2" prio=6 tid=0x032e5400 nid=0xfc in Object.wait() [0x0655f000]  
16.    java.lang.Thread.State: TIMED_WAITING (on object monitor)  
17.     at java.lang.Object.wait(Native Method)  
18.     - waiting on <0x27fd3118> (a org.eclipse.core.internal.jobs.WorkerPool)  
19.     at org.eclipse.core.internal.jobs.WorkerPool.sleep(WorkerPool.java:185)  
20.     - locked <0x27fd3118> (a org.eclipse.core.internal.jobs.WorkerPool)  
21.     at org.eclipse.core.internal.jobs.WorkerPool.startJob(WorkerPool.java:217)  
22.     at org.eclipse.core.internal.jobs.Worker.run(Worker.java:51)  
23.   
24.    Locked ownable synchronizers:  
25.     - None  
26.   
27. "Worker-1" prio=6 tid=0x052dd400 nid=0xd88 in Object.wait() [0x0645f000]  
28.    java.lang.Thread.State: TIMED_WAITING (on object monitor)  
29.     at java.lang.Object.wait(Native Method)  
30.     - waiting on <0x27fd3118> (a org.eclipse.core.internal.jobs.WorkerPool)  
31.     at org.eclipse.core.internal.jobs.WorkerPool.sleep(WorkerPool.java:185)  
32.     - locked <0x27fd3118> (a org.eclipse.core.internal.jobs.WorkerPool)  
33.     at org.eclipse.core.internal.jobs.WorkerPool.startJob(WorkerPool.java:217)  
34.     at org.eclipse.core.internal.jobs.Worker.run(Worker.java:51)  
35.   
36.    Locked ownable synchronizers:  
37.     - None  
38.   
39. "Bundle File Closer" daemon prio=6 tid=0x052d0800 nid=0xed8 in Object.wait() [0x04aef000]  
40.    java.lang.Thread.State: WAITING (on object monitor)  
41.     at java.lang.Object.wait(Native Method)  
42.     - waiting on <0x294bc768> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
43.     at java.lang.Object.wait(Object.java:503)  
44.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:397)  
45.     - locked <0x294bc768> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
46.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:333)  
47.   
48.    Locked ownable synchronizers:  
49.     - None  
50.   
51. "Java indexing" daemon prio=4 tid=0x04ddd400 nid=0x1b0 in Object.wait() [0x05e5f000]  
52.    java.lang.Thread.State: WAITING (on object monitor)  
53.     at java.lang.Object.wait(Native Method)  
54.     - waiting on <0x289f02e8> (a org.eclipse.jdt.internal.core.search.indexing.IndexManager)  
55.     at java.lang.Object.wait(Object.java:503)  
56.     at org.eclipse.jdt.internal.core.search.processing.JobManager.run(JobManager.java:381)  
57.     - locked <0x289f02e8> (a org.eclipse.jdt.internal.core.search.indexing.IndexManager)  
58.     at java.lang.Thread.run(Unknown Source)  
59.   
60.    Locked ownable synchronizers:  
61.     - None  
62.   
63. "[Timer] - Main Queue Handler" daemon prio=6 tid=0x0398d800 nid=0x824 in Object.wait() [0x0416f000]  
64.    java.lang.Thread.State: TIMED_WAITING (on object monitor)  
65.     at java.lang.Object.wait(Native Method)  
66.     - waiting on <0x28105b68> (a java.lang.Object)  
67.     at org.eclipse.equinox.internal.util.impl.tpt.timer.TimerImpl.run(TimerImpl.java:141)  
68.     - locked <0x28105b68> (a java.lang.Object)  
69.     at java.lang.Thread.run(Unknown Source)  
70.   
71.    Locked ownable synchronizers:  
72.     - None  
73.   
74. "Framework Event Dispatcher" daemon prio=6 tid=0x02c4c400 nid=0x56c in Object.wait() [0x03f6f000]  
75.    java.lang.Thread.State: WAITING (on object monitor)  
76.     at java.lang.Object.wait(Native Method)  
77.     - waiting on <0x27fd3140> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
78.     at java.lang.Object.wait(Object.java:503)  
79.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:397)  
80.     - locked <0x27fd3140> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
81.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:333)  
82.   
83.    Locked ownable synchronizers:  
84.     - None  
85.   
86. "Start Level Event Dispatcher" daemon prio=6 tid=0x03178400 nid=0xd50 in Object.wait() [0x03e6f000]  
87.    java.lang.Thread.State: WAITING (on object monitor)  
88.     at java.lang.Object.wait(Native Method)  
89.     - waiting on <0x27fd31b0> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
90.     at java.lang.Object.wait(Object.java:503)  
91.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:397)  
92.     - locked <0x27fd31b0> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread)  
93.     at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:333)  
94.   
95.    Locked ownable synchronizers:  
96.     - None  
97.   
98. "State Data Manager" daemon prio=6 tid=0x02c4f400 nid=0x908 waiting on condition [0x03d6f000]  
99.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
100.     at java.lang.Thread.sleep(Native Method)  
101.     at org.eclipse.osgi.internal.baseadaptor.StateManager.run(StateManager.java:306)  
102.     at java.lang.Thread.run(Unknown Source)  
103.   
104.    Locked ownable synchronizers:  
105.     - None  
106.   
107. "Service Thread" daemon prio=6 tid=0x02c45c00 nid=0x1ac runnable [0x00000000]  
108.    java.lang.Thread.State: RUNNABLE  
109.   
110.    Locked ownable synchronizers:  
111.     - None  
112.   
113. "C1 CompilerThread0" daemon prio=10 tid=0x02c43000 nid=0xd68 waiting on condition [0x00000000]  
114.    java.lang.Thread.State: RUNNABLE  
115.   
116.    Locked ownable synchronizers:  
117.     - None  
118.   
119. "Attach Listener" daemon prio=10 tid=0x02c41800 nid=0xf64 waiting on condition [0x00000000]  
120.    java.lang.Thread.State: RUNNABLE  
121.   
122.    Locked ownable synchronizers:  
123.     - None  
124.   
125. "Signal Dispatcher" daemon prio=10 tid=0x02c40000 nid=0x8ec runnable [0x00000000]  
126.    java.lang.Thread.State: RUNNABLE  
127.   
128.    Locked ownable synchronizers:  
129.     - None  
130.   
131. "Finalizer" daemon prio=8 tid=0x02c25000 nid=0xeb0 in Object.wait() [0x030ef000]  
132.    java.lang.Thread.State: WAITING (on object monitor)  
133.     at java.lang.Object.wait(Native Method)  
134.     - waiting on <0x27eaf9c8> (a java.lang.ref.ReferenceQueue$Lock)  
135.     at java.lang.ref.ReferenceQueue.remove(Unknown Source)  
136.     - locked <0x27eaf9c8> (a java.lang.ref.ReferenceQueue$Lock)  
137.     at java.lang.ref.ReferenceQueue.remove(Unknown Source)  
138.     at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)  
139.   
140.    Locked ownable synchronizers:  
141.     - None  
142.   
143. "Reference Handler" daemon prio=10 tid=0x02c20000 nid=0xe90 in Object.wait() [0x02fef000]  
144.    java.lang.Thread.State: WAITING (on object monitor)  
145.     at java.lang.Object.wait(Native Method)  
146.     - waiting on <0x27eaf680> (a java.lang.ref.Reference$Lock)  
147.     at java.lang.Object.wait(Object.java:503)  
148.     at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)  
149.     - locked <0x27eaf680> (a java.lang.ref.Reference$Lock)  
150.   
151.    Locked ownable synchronizers:  
152.     - None  
153.   
154. "main" prio=6 tid=0x00ad8400 nid=0xe4c runnable [0x0012f000]  
155.    java.lang.Thread.State: RUNNABLE  
156.     at org.eclipse.swt.internal.win32.OS.WaitMessage(Native Method)  
157.     at org.eclipse.swt.widgets.Display.sleep(Display.java:4311)  
158.     at org.eclipse.ui.application.WorkbenchAdvisor.eventLoopIdle(WorkbenchAdvisor.java:364)  
159.     at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2406)  
160.     at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369)  
161.     at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)  
162.     at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)  
163.     at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)  
164.     at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493)  
165.     at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)  
166.     at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:113)  
167.     at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194)  
168.     at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)  
169.     at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)  
170.     at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)  
171.     at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)  
172.     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
173.     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)  
174.     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)  
175.     at java.lang.reflect.Method.invoke(Unknown Source)  
176.     at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)  
177.     at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)  
178.     at org.eclipse.equinox.launcher.Main.run(Main.java:1311)  
179.   
180.    Locked ownable synchronizers:  
181.     - None  
182.   
183. "VM Thread" prio=10 tid=0x02c1e800 nid=0x788 runnable   
184.   
185. "VM Periodic Task Thread" prio=10 tid=0x02c53800 nid=0xfbc waiting on condition   
186.   
187. JNI global references: 272


Eclipse是用Java开发的,所以导出的栈信息从格式上应当是具有普适性的。从上述样例中,首先可以看出这段栈信息是在2013-01-13 11:02:31导出的,时间精确到了秒;另外还给出了Eclipse运行时使用的JRE的版本和类型。JRE分为桌面版和服务器版,Client标识当前应用使用了桌面版JRE,针对桌面应用进行了特别的优化,其它方面的区别可以Google一下,差别比较大,这里不赘述。java应用启动时,可以通过命令行选项控制对字节码的执行方式为纯解释、解释加编译(混合,mixed mode)、纯编译,样例中使用了混合模式,JRE在运行时会对热点代码进行重点优化,翻译成机器代码运行,至于何为热点,可以继续Google,JVM有一套复杂的算法来甄别。

 

 



1. "[ThreadPool Manager] - Idle Thread" daemon prio=6 tid=0x069a3400 nid=0x84 in Object.wait() [0x0795f000]  
2.    java.lang.Thread.State: WAITING (on object monitor)  
3.     at java.lang.Object.wait(Native Method)  
4.     - waiting on <0x239102b8> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)  
5.     at java.lang.Object.wait(Object.java:503)  
6.     at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run(Executor.java:106)  
7.     - locked <0x239102b8> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)  
8.   
9.    Locked ownable synchronizers:  
10.     - None


这是从样例中截取出来的一个线程的栈信息,从中可以得知如下的信息:

 

1、这个线程的名称为[ThreadPool Manager] - Idle Thread,实际开发过程中使用可以使用Thread对象的setName方法修改;这里可以看出设置线程名字属性的必要性,导出栈信息的时候就有用了,可以通过线程名字方便的定位到问题代码块;

2、优先级为6,不过优先级意义不大,对分析应用帮助不大,Java的教程中也不推荐修改线程的优先级,所以这里跳过;

3、tid和nid,16进制表示,这两个字段中有一个可以对应到操作系统调度时使用的线程ID,这在Linux下特别有用,因为top -H命令可以输出各个线程的CPU占用率,借助栈信息,可以方便的定位到CPU占用高的线程,这个很有用,在其它平台暂时还看不出什么意义;

4、当前栈顶,为java.lang.Object.wait方法;

5、线程状态,java.lang.Thread.State: WAITING (on object monitor);java.lang.Thread.State是什么?有兴趣的话可以翻看JDK的源码,源码很简单,注释很详细,给出了Java线程可能存在的状态;

6、当前线程是否处于同步块内,   Locked ownable synchronizers:,这里值为None,说明当前线程没有处于同步块内。

限于我自己对于Java栈的理解,这里对于单个线程栈的解读比较粗,不过说来应当是够用了。

上面写了Java程序栈的获取方法和格式,那么有一个问题,为什么要获取栈信息、解析这个栈信息有什么用呢?这是一个好问题,毕竟无利不起早,程序员是世界上最聪明的人类群体,没有时间花费在没有产出的事情。虽然做Java开发相比较C/C++要容易一些,但在分析死锁、CPU高的问题时,分析的手法和C/C++程序员差不多,基本都要通过栈调用来分析问题。

通过栈信息分析CPU高的方法上面已经讲过,这里简单说明一下,借助Linux下的top -H命令,输出全部线程的CPU、内存信息,找到占用比较高的线程ID,换算成16进制,然后在栈信息文件中搜索,检查是否有匹配行,如果找到并且位置在nid=或者tid=的后面,说明事发线程找到了。

通过栈信息分析死锁问题的方法相对要麻烦一点,需要一点准备知识,对Java语言的线程可能处于的状态有一定了解,Java语言教程里的讲解比较抽象,这里先来点干货,直接上点JDK中的源码。引用JDK的源码希望不会招致JDK开发人员的不满,这里仅仅是为了叙述方便。

 


 

1. public enum State {  
2. /**
3.      * Thread state for a thread which has not yet started.
4.      */  
5.     NEW,  
6.   
7. /**
8.      * Thread state for a runnable thread.  A thread in the runnable
9.      * state is executing in the Java virtual machine but it may
10.      * be waiting for other resources from the operating system
11.      * such as processor.
12.      */  
13.     RUNNABLE,  
14.   
15. /**
16.      * Thread state for a thread blocked waiting for a monitor lock.
17.      * A thread in the blocked state is waiting for a monitor lock
18.      * to enter a synchronized block/method or
19.      * reenter a synchronized block/method after calling
20.      * {@link Object#wait() Object.wait}.
21.      */  
22.     BLOCKED,  
23.   
24. /**
25.      * Thread state for a waiting thread.
26.      * A thread is in the waiting state due to calling one of the
27.      * following methods:
28.      * <ul>
29.      *   <li>{@link Object#wait() Object.wait} with no timeout</li>
30.      *   <li>{@link #join() Thread.join} with no timeout</li>
31.      *   <li>{@link LockSupport#park() LockSupport.park}</li>
32.      * </ul>
33.      *
34.      * <p>A thread in the waiting state is waiting for another thread to
35.      * perform a particular action.
36.      *
37.      * For example, a thread that has called <tt>Object.wait()</tt>
38.      * on an object is waiting for another thread to call
39.      * <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
40.      * that object. A thread that has called <tt>Thread.join()</tt>
41.      * is waiting for a specified thread to terminate.
42.      */  
43.     WAITING,  
44.   
45. /**
46.      * Thread state for a waiting thread with a specified waiting time.
47.      * A thread is in the timed waiting state due to calling one of
48.      * the following methods with a specified positive waiting time:
49.      * <ul>
50.      *   <li>{@link #sleep Thread.sleep}</li>
51.      *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
52.      *   <li>{@link #join(long) Thread.join} with timeout</li>
53.      *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
54.      *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
55.      * </ul>
56.      */  
57.     TIMED_WAITING,  
58.   
59. /**
60.      * Thread state for a terminated thread.
61.      * The thread has completed execution.
62.      */  
63.     TERMINATED;  
64. }


对于有一定多线程开发经验的人来说,上面的注释理解起来应当不是问题。

 

1、什么状态下是NEW,线程对象被new出来,但是还没有调用start方法,这时就被称为NEW;

2、RUNNABLE,线程对象已经被调用了start方法,但是,这个线程的run方法可能运行也可能没有运行,依赖于操作系统的行为;

3、BLOCKED,简单的说,线程进入了synchronized关键字标识的同步块,但和4状态有所区别;

4、WAITING和TIMED_WAITING差不多,一般是调用了对象的wait方法,需要其它线程在特定场景使用notify/notifyAll方法;

5、TERMINATED,这个状态比较直白,run方法退出运行,即进入这个状态;

 

栈信息本身并不神秘,格式和内容都比较单一,但借助程序的调用栈信息可以观察到很多有用的信息。借助栈信息来分析有时候被认为是一种秘技,实际上这是一种误解。古人云,读书百遍,其义自见,程序调用栈信息看多了自然就会明白。下面结合java.lang.Thread.State类的注释,举几个代码及其调用栈的例子,相信对于聪明的同学,入门就不困难了。但原理什么的,就得靠自己了,至少得读一下java.lang.Thread.State类的注释吧。

下面的例子都是使用Oracle公司出品的Java7编译和测试的,具体的版本信息如下。

 



 


1. java -version  
2. java version "1.7.0_11"  
3. Java(TM) SE Runtime Environment (build 1.7.0_11-b21)  
4. Java HotSpot(TM) Client VM (build 23.6-b04, mixed mode, sharing)



例子一

 


 

1. public class DeadLock {  
2. public static void main(final String[] args) throws Exception {  
3. final Object obj = new Object();  
4. new Thread() {  
5. @Override  
6. public void run() {  
7.                 callWait(obj);  
8.             };  
9.         }.start();  
10.         Thread.sleep(3000L);  
11. synchronized (obj) {  
12. try {  
13.                 obj.wait();  
14.             }  
15. catch (final InterruptedException e) {  
16.                 e.printStackTrace();  
17.             }  
18.         }  
19.     }  
20.   
21. static void callWait(final Object obj) {  
22. synchronized (obj) {  
23. try {  
24.                 obj.wait();  
25.             }  
26. catch (final InterruptedException e) {  
27.                 e.printStackTrace();  
28.             }  
29.         }  
30.     }  
31. }



这段代码并不复杂,做的事情比较明确,一共存在有两个线程,都会调用callWait方法,在callWait方法中又会对一个对象加锁,调用对应的wait方法。这个例子模拟的错误就是wait方法和notify/notifyAll方法没有成对的调用,导致程序出现死锁现象。问题说到这里,解决的方法自然非常明确,但是重点在根据程序的调用栈来发现问题,因而需要实地执行程序,同时使用jstack工具获取栈信息,了解从栈信息中如何发现问题的根本原因。

 

 


1. "main" prio=6 tid=0x00868800 nid=0x1370 in Object.wait() [0x0098f000]  
2.    java.lang.Thread.State: WAITING (on object monitor)  
3.     at java.lang.Object.wait(Native Method)  
4.     - waiting on <0x22bc2c08> (a java.lang.Object)  
5.     at java.lang.Object.wait(Object.java:503)  
6.     at DeadLock.main(DeadLock.java:30)  
7.     - locked <0x22bc2c08> (a java.lang.Object)  
8.   
9.    Locked ownable synchronizers:  
10.     - None  
11.   
12. "Thread-0" prio=6 tid=0x02b22000 nid=0x1044 in Object.wait() [0x0316f000]  
13.    java.lang.Thread.State: WAITING (on object monitor)  
14.     at java.lang.Object.wait(Native Method)  
15.     - waiting on <0x22bc2c08> (a java.lang.Object)  
16.     at java.lang.Object.wait(Object.java:503)  
17.     at DeadLock$1.run(DeadLock.java:14)  
18.     - locked <0x22bc2c08> (a java.lang.Object)  
19.   
20.    Locked ownable synchronizers:  
21.     - None



上面只是栈文件中的一部分,两个线程的栈信息块之间可能也没有离这么近,为了缩短篇幅,做了一定的裁剪,去掉了问题不相关的部分。可以发现0x22bc2c08一共出现了四次,两段栈信息中各出现再次,顺序一样,都是调用了wait方法后进入了WAITING状态,同时都没有持有对象的锁。有趣的现象哈。

例子二

 



    1. public class DeadLock {  
    2. public static void main(final String[] args) throws Exception {  
    3. "main thread");  
    4. final Thread r = new Thread() {  
    5. @Override  
    6. public void run() {  
    7. try {  
    8.                     Thread.sleep(200000L);  
    9.                 }  
    10. catch (final InterruptedException e) {  
    11.                     e.printStackTrace();  
    12.                 }  
    13.             };  
    14.         };  
    15. "another thread");  
    16.         r.start();  
    17. "start to sleeping");  
    18.         Thread.sleep(100000L);  
    19. "start to joining");  
    20.         r.join();  
    21.     }  
    22. }



    这段样例代码很简单,就是两个线程各自调用了sleep方法,但时间不相同,然后在主函数结尾处,还等待新线程结束。对于这段样例,存在两次抓取栈信息的机会,第一次是屏幕输出“start to sleeping”时,可以截取出如下的栈信息。

     

     



     

    1. "another thread" prio=6 tid=0x02b22400 nid=0x2058 waiting on condition [0x0316f000]  
    2.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
    3.     at java.lang.Thread.sleep(Native Method)  
    4.     at DeadLock$1.run(DeadLock.java:9)  
    5.   
    6.    Locked ownable synchronizers:  
    7.     - None  
    8. "main thread" prio=6 tid=0x00868800 nid=0x2720 waiting on condition [0x0098f000]  
    9.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
    10.     at java.lang.Thread.sleep(Native Method)  
    11.     at DeadLock.main(DeadLock.java:19)  
    12.   
    13.    Locked ownable synchronizers:  
    14.     - None


    很明显,两个线程当前都处于sleep状态。第二次抓取栈信息是屏幕输出“start to joining”,这时主线程的等待已执行完成,主线程开始等待新线程执行完成,可以截取到如下的栈信息。

     

    1. "another thread" prio=6 tid=0x02b22400 nid=0x2058 waiting on condition [0x0316f000]  
    2.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
    3.     at java.lang.Thread.sleep(Native Method)  
    4.     at DeadLock$1.run(DeadLock.java:9)  
    5.   
    6.    Locked ownable synchronizers:  
    7.     - None  
    8. "main thread" prio=6 tid=0x00868800 nid=0x2720 in Object.wait() [0x0098f000]  
    9.    java.lang.Thread.State: WAITING (on object monitor)  
    10.     at java.lang.Object.wait(Native Method)  
    11.     - waiting on <0x22bc3d70> (a DeadLock$1)  
    12.     at java.lang.Thread.join(Thread.java:1258)  
    13.     - locked <0x22bc3d70> (a DeadLock$1)  
    14.     at java.lang.Thread.join(Thread.java:1332)  
    15.     at DeadLock.main(DeadLock.java:21)  
    16.   
    17.    Locked ownable synchronizers:  
    18.     - None



    和预期一致,新线程处于sleep状态,而主线程已进入了wait状态。

    例子三

     



     

    1. public class DeadLock {  
    2. public static void main(final String[] args) throws Exception {  
    3. final Object lock1 = new Object();  
    4. final Object lock2 = new Object();  
    5. "main-thread");  
    6. final Thread r = new Thread("another-thread") {  
    7. @Override  
    8. public void run() {  
    9. try {  
    10. synchronized (lock2) {  
    11. 20000);  
    12. synchronized (lock1) {  
    13. 20000);  
    14.                         }  
    15.                     }  
    16.                 }  
    17. catch (final Exception e) {  
    18.                     e.printStackTrace();  
    19.                 }  
    20.             }  
    21.         };  
    22.         r.start();  
    23. 5000);  
    24. synchronized (lock1) {  
    25. 20000);  
    26. synchronized (lock2) {  
    27. 20000);  
    28.             }  
    29.         }  
    30.     }  
    31. }


    这段代码可能有点复杂,但原理不麻烦,就是描述了经典的加锁顺序问题。栈信息比较有趣,裁剪之后如下。

     



    1. "another-thread" prio=6 tid=0x02b22400 nid=0x2250 waiting for monitor entry [0x0316f000]  
    2.    java.lang.Thread.State: BLOCKED (on object monitor)  
    3.     at DeadLock$1.run(DeadLock.java:14)  
    4.     - waiting to lock <0x22bc3f90> (a java.lang.Object)  
    5.     - locked <0x22bc3f98> (a java.lang.Object)  
    6.   
    7.    Locked ownable synchronizers:  
    8.     - None  
    9. "main-thread" prio=6 tid=0x00868800 nid=0x2634 waiting for monitor entry [0x0098f000]  
    10.    java.lang.Thread.State: BLOCKED (on object monitor)  
    11.     at DeadLock.main(DeadLock.java:28)  
    12.     - waiting to lock <0x22bc3f98> (a java.lang.Object)  
    13.     - locked <0x22bc3f90> (a java.lang.Object)  
    14.   
    15.    Locked ownable synchronizers:  
    16.     - None  
    17. Found one Java-level deadlock:  
    18. =============================  
    19. "another-thread":  
    20.   waiting to lock monitor 0x02ad433c (object 0x22bc3f90, a java.lang.Object),  
    21.   which is held by "main-thread"  
    22. "main-thread":  
    23.   waiting to lock monitor 0x02ad363c (object 0x22bc3f98, a java.lang.Object),  
    24.   which is held by "another-thread"  
    25.   
    26. Java stack information for the threads listed above:  
    27. ===================================================  
    28. "another-thread":  
    29.     at DeadLock$1.run(DeadLock.java:14)  
    30.     - waiting to lock <0x22bc3f90> (a java.lang.Object)  
    31.     - locked <0x22bc3f98> (a java.lang.Object)  
    32. "main-thread":  
    33.     at DeadLock.main(DeadLock.java:28)  
    34.     - waiting to lock <0x22bc3f98> (a java.lang.Object)  
    35.     - locked <0x22bc3f90> (a java.lang.Object)  
    36.   
    37. Found 1 deadlock.


    主线程中提示了当前持有的锁和期望获取的锁,而新线程也存在类似的提示,只是获取锁的顺序不同。从栈信息可以看出,两个线程都处于BLOCKED状态。可能是源代码中的问题过于明显,导出的栈信息中给出了死锁的提示信息,实际项目的业务代码比样例要复杂,JVM可能做不到这么智能。这里的提示信息多多少少有点让我失望,在主线程栈信息的最后一部分没有输出当前持有锁的列表,很奇怪,不知道是不是和JDK的实现有关。

    样例四

     



     

    1. import java.security.SecureRandom;  
    2. import java.util.concurrent.SynchronousQueue;  
    3. import java.util.concurrent.ThreadFactory;  
    4. import java.util.concurrent.ThreadPoolExecutor;  
    5. import java.util.concurrent.TimeUnit;  
    6. import java.util.concurrent.atomic.AtomicInteger;  
    7. import java.util.concurrent.locks.ReentrantLock;  
    8.   
    9. public class DeadLock {  
    10. private final ReentrantLock lock = new ReentrantLock();  
    11. private static final SecureRandom random = new SecureRandom();  
    12.   
    13. public DeadLock() {  
    14.     }  
    15.   
    16. public void runWork() {  
    17.   
    18. final ThreadPoolExecutor threadpool = new ThreadPoolExecutor(3, 3, 60L, TimeUnit.DAYS,  
    19. new SynchronousQueue<Runnable>(), new ThreadFactory() {  
    20. private final AtomicInteger counter = new AtomicInteger(1);  
    21. @Override  
    22. public Thread newThread(final Runnable r) {  
    23. return new Thread(r, "thread-sn-" + counter.getAndIncrement());  
    24.             }  
    25.         });  
    26. new ThreadPoolExecutor.CallerRunsPolicy());  
    27. "main-thread");  
    28. for (int i = 0; i < 15; ++i) {  
    29. new Runnable() {  
    30. @Override  
    31. public void run() {  
    32. try {  
    33.   
    34. int timeout = 0;  
    35. while ((timeout = random.nextInt()) <= 0) {  
    36.                         }  
    37. 111;  
    38.                         Thread.sleep(timeout * 100L);  
    39.   
    40.                         lock.lock();  
    41.                         callLongTime();  
    42.   
    43.                     }  
    44. catch (final Exception e) {  
    45.                         e.printStackTrace();  
    46.                     }  
    47. finally {  
    48.                         lock.unlock();  
    49.                     }  
    50.                 }  
    51.             });  
    52.         }  
    53.         threadpool.shutdown();  
    54.     }  
    55.   
    56. public static void main(final String[] args) throws Exception {  
    57. new DeadLock().runWork();  
    58.     }  
    59.   
    60. static long callLongTime() {  
    61. "thread name " + Thread.currentThread().getName());  
    62. long sum = 0;  
    63. for (long i = 0; i < 10000000000L; ++i) {  
    64.             sum = sum ^ i + i;  
    65.         }  
    66. return sum;  
    67.     }  
    68. }



    最后一个样例,写的比较复杂,但事情比较简单,线程之间的同步关键字换成了Java5提供的concurrent库中的重入锁。

     

     



     

    1. "thread-sn-3" prio=6 tid=0x02de8400 nid=0x2688 runnable [0x0320f000]  
    2.    java.lang.Thread.State: RUNNABLE  
    3.     at DeadLock.callLongTime(DeadLock.java:63)  
    4.     at DeadLock$2.run(DeadLock.java:41)  
    5.     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
    6.     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)  
    7.     at java.util.concurrent.FutureTask.run(FutureTask.java:166)  
    8.     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)  
    9.     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)  
    10.     at java.lang.Thread.run(Thread.java:722)  
    11.   
    12.    Locked ownable synchronizers:  
    13.     - <0x22be1488> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)  
    14.     - <0x22be66d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)  
    15.   
    16. "thread-sn-2" prio=6 tid=0x02de6c00 nid=0x218c waiting on condition [0x031bf000]  
    17.    java.lang.Thread.State: WAITING (parking)  
    18.     at sun.misc.Unsafe.park(Native Method)  
    19.     - parking to wait for  <0x22be1488> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)  
    20.     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)  
    21.     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)  
    22.     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)  
    23.     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)  
    24.     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)  
    25.     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)  
    26.     at DeadLock$2.run(DeadLock.java:40)  
    27.     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)  
    28.     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)  
    29.     at java.util.concurrent.FutureTask.run(FutureTask.java:166)  
    30.     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)  
    31.     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)  
    32.     at java.lang.Thread.run(Thread.java:722)  
    33.   
    34.    Locked ownable synchronizers:  
    35.     - <0x22be64e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)



    栈信息比较有趣,JVM在每个线程线的最后部分输出了当前线程持有的锁对象,类型和地址。从上述的信息中可以找到,当前处于运行状态的线程持有了一个地址为0x22be1488的锁,而这个锁正是其它线程等待获取的锁对象。

     

    关于Java应用导出的栈信息,由于包含了线程ID,因而还有一个妙用,即可以用来分析各个线程CPU占用情况,这样可以用来分析CPU占用过高的问题,从而指导CPU相关的优化类工作。

    先看导出的栈信息的一部分。

     



     

    1. "main" prio=6 tid=0x00868c00 nid=0x19a4 waiting on condition [0x0098f000]  
    2.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
    3.     at java.lang.Thread.sleep(Native Method)  
    4.     at DeadLock.main(DeadLock.java:3)  
    5.   
    6.    Locked ownable synchronizers:  
    7.     - None


    nid字段即标识出了操作系统的线程ID,使用16进程表示。现在有了操作系统的线程ID,那么有什么方法可以观察到各个线程占用的CPU吗?

     

    对于Linux平台,可以使用top命令和选项H,样例如下。

     

    1. # top -H -p <pid>



    从输出中可以找到各个线程的ID,使用10进制表示。为了找到各个线程对应的栈信息,需要手工把用top命令观察到的线程ID转换为16进制表示,然后在进程的栈信息文件中查找。

     

    对于Windows平台,需要下载一个免费的工具,鼎鼎大名的ProcessExplorer。通过这个工具,可以方便的查看目标Java进程的线程ID列表及CPU占用率,但为了查明占用CPU高的线程具体在做什么,还是需要将线程ID手工转换为16进制表示,然后到栈信息文件中查找。

     

    如果亲自动手分析Java应用导出的栈信息的话,可以惊讶的发现,原来栈信息中包括了JVM完成自己特定任务的线程栈信息。下面看个例子。

     


     


    1. public class Test {  
    2. public static void main(final String[] args) throws Exception {  
    3.         Thread.sleep(200000L);  
    4.     }  
    5. }



    这是一段非常简单的代码,只有一个线程,并且只做了一件事,那就是睡大觉。为了能够说明问题,在Eclipse中运行这段代码时,增加如下的JVM启动参数,配置如下参数的原因下面会提到。

     

     



     



    1. -XX:+UseConcMarkSweepGC



    程序启动后,使用jstack命令导出线程栈的列表,如下。

     

     

    1. 2013-02-12 01:27:24  
    2. Full thread dump Java HotSpot(TM) Client VM (23.6-b04 mixed mode):  
    3.   
    4. "Service Thread" daemon prio=6 tid=0x17975c00 nid=0x202c runnable [0x00000000]  
    5.    java.lang.Thread.State: RUNNABLE  
    6.   
    7.    Locked ownable synchronizers:  
    8.     - None  
    9.   
    10. "C1 CompilerThread0" daemon prio=10 tid=0x17973400 nid=0x25dc waiting on condition [0x00000000]  
    11.    java.lang.Thread.State: RUNNABLE  
    12.   
    13.    Locked ownable synchronizers:  
    14.     - None  
    15.   
    16. "Attach Listener" daemon prio=10 tid=0x1796e400 nid=0x1cf0 waiting on condition [0x00000000]  
    17.    java.lang.Thread.State: RUNNABLE  
    18.   
    19.    Locked ownable synchronizers:  
    20.     - None  
    21.   
    22. "Signal Dispatcher" daemon prio=10 tid=0x1796cc00 nid=0x1d3c runnable [0x00000000]  
    23.    java.lang.Thread.State: RUNNABLE  
    24.   
    25.    Locked ownable synchronizers:  
    26.     - None  
    27.   
    28. "Surrogate Locker Thread (Concurrent GC)" daemon prio=6 tid=0x1796b400 nid=0x21ec waiting on condition [0x00000000]  
    29.    java.lang.Thread.State: RUNNABLE  
    30.   
    31.    Locked ownable synchronizers:  
    32.     - None  
    33.   
    34. "Finalizer" daemon prio=8 tid=0x17921800 nid=0x2618 in Object.wait() [0x177df000]  
    35.    java.lang.Thread.State: WAITING (on object monitor)  
    36.     at java.lang.Object.wait(Native Method)  
    37.     - waiting on <0x02be56a0> (a java.lang.ref.ReferenceQueue$Lock)  
    38.     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)  
    39.     - locked <0x02be56a0> (a java.lang.ref.ReferenceQueue$Lock)  
    40.     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)  
    41.     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)  
    42.   
    43.    Locked ownable synchronizers:  
    44.     - None  
    45.   
    46. "Reference Handler" daemon prio=10 tid=0x02b3e400 nid=0x2304 in Object.wait() [0x1778f000]  
    47.    java.lang.Thread.State: WAITING (on object monitor)  
    48.     at java.lang.Object.wait(Native Method)  
    49.     - waiting on <0x02be5278> (a java.lang.ref.Reference$Lock)  
    50.     at java.lang.Object.wait(Object.java:503)  
    51.     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)  
    52.     - locked <0x02be5278> (a java.lang.ref.Reference$Lock)  
    53.   
    54.    Locked ownable synchronizers:  
    55.     - None  
    56.   
    57. "main" prio=6 tid=0x00868c00 nid=0x1dc waiting on condition [0x0098f000]  
    58.    java.lang.Thread.State: TIMED_WAITING (sleeping)  
    59.     at java.lang.Thread.sleep(Native Method)  
    60.     at DeadLock.main(DeadLock.java:3)  
    61.   
    62.    Locked ownable synchronizers:  
    63.     - None  
    64.   
    65. "VM Thread" prio=10 tid=0x02b38800 nid=0x195c runnable   
    66.   
    67. "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x0086c800 nid=0x2780 runnable   
    68.   
    69. "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x02a40c00 nid=0x25a0 runnable   
    70.   
    71. "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x02ab6000 nid=0x1828 runnable   
    72. "VM Periodic Task Thread" prio=10 tid=0x17989c00 nid=0x20b0 waiting on condition   
    73.   
    74. JNI global references: 115



    是不是非常壮观,这么简单的代码在JVM内部运行时,居然需要生成这么多的线程。另外在文件的尾部,还可以看到CMS垃圾回收器的线程,是不是很有意思?这也是上面增加相关参数的原因。

     

    这里举这个例子只是想说明通过分析Java应用导出的栈信息文件是一种很有效的窥视JVM内部实现的方法,至于能窥视到多少信息,就要看各人的实力了。本人才疏学浅,暂时还看不出什么特别的东东,目前只能看出JVM自动生成了哪些线程,至于各个线程的作用,除了垃圾回收器相关的,其它线程的功能暂时还不清楚,留待以后找到资料再说。

    曾经在工作中借助top之类的命令来定位Java应用占用CPU过高的问题时,偶然发现过执行垃圾回收任务的线程占用很高CPU的情况,不过问题的根本原因和JVM倒没有什么关系。

    举报

    相关推荐

    0 条评论