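A Java web service was deployed in production on Tomcat, behind nginx doing round-robin load balancing. One problem kept recurring: one of the Tomcat instances would hang for no apparent reason, and requests sent to it directly by IP never got a response.
1. Check disk, memory, and other basics:
The basic server metrics showed nothing abnormal.
2. Check the connection count: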
$ netstat -natp | awk '{print $6}' | sort | uniq -c | sort -n
      1 established)
      1 Foreign
     11 LISTEN
     19 TIME_WAIT
    106 CLOSE_WAIT
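    142 ESTABLISHED
There are quite a few CLOSE_WAIT connections, though not many compared with a heavily loaded server. Other healthy services showed almost zero CLOSE_WAIT, so a lot of effort went into investigating this, including tuning all sorts of Linux kernel parameters...
3. Check GC: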
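The state summary above also counts the two netstat header lines (the "established)" and "Foreign" rows). A minimal sketch of a cleaner count follows; the function name tcp_state_counts is hypothetical and not part of the original investigation. It reads netstat output on stdin and only counts lines whose first field starts with "tcp".

```shell
#!/bin/sh
# Count TCP connections per state from `netstat -nat`-style input on stdin.
# Only lines whose first field starts with "tcp" are counted, so the two
# netstat header lines never show up as bogus states.
# Note: tcp_state_counts is an illustrative helper name, not an existing tool.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) printf "%d %s\n", count[s], s }' | sort -n
}
```

On the server it could be used as: netstat -nat | tcp_state_counts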
$ jstat -gcutil 3494 1000 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00   0.00   2.87   1.28 100.00  31420  140.865 31399 4345.929 4486.793
  0.00   0.00   2.94   1.28 100.00  31420  140.865 31399 4345.929 4486.793
  0.00   0.00   2.94   1.28 100.00  31420  140.865 31399 4345.929 4486.793
That pinpoints the problem: PermGen (column P) is 100% full, and the JVM has already run 31399 full GCs, totalling about 4346 seconds, without freeing any of it. The log confirms it:
[ERROR] [org.springframework.boot.web.support.ErrorPageFilter:176] Forwarding to error page from request [/algoAbTest/algoAbTestIndex] due to exception [Cannot deserialize; nested exception is org.springframework.core.serializer.support.SerializationFailedException: Failed to deserialize payload. Is the byte array a result of corresponding serialization for DefaultDeserializer?; nested exception is java.lang.OutOfMemoryError: PermGen space]
With the problem identified, the next step is to find its cause.
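The jstat check from step 3 can be scripted so that a full PermGen is noticed before the service hangs. This is only a sketch: permgen_alerts is a hypothetical helper name, the 95 percent default threshold is an arbitrary illustrative value, and the column named "P" exists only in the Java 7 and earlier output format shown above.

```shell
#!/bin/sh
# Read `jstat -gcutil` output on stdin and print one line for every sample
# whose PermGen utilisation (column "P") is at or above a threshold.
# Columns are located by header name rather than by fixed position.
# Note: permgen_alerts and the default of 95 are illustrative assumptions.
permgen_alerts() {
    threshold="${1:-95}"
    awk -v limit="$threshold" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "P")   p = i
                if ($i == "FGC") f = i
            }
            next
        }
        p && ($p + 0 >= limit + 0) {
            printf "PermGen %s%% used, %s full GCs so far\n", $p, $f
        }
    '
}
```

For the process above it could be run as: jstat -gcutil 3494 1000 10 | permgen_alerts 95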
1) Check the JVM state:
$ jmap -heap 3494
Attaching to process ID 3494, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.79-b02
using thread-local object allocation.
Parallel GC with 8 thread(s)
Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 8417968128 (8028.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
   capacity = 2804940800 (2675.0MB)
   used     = 0 (0.0MB)
   free     = 2804940800 (2675.0MB)
   0.0% used
From Space:
   capacity = 524288 (0.5MB)
   used     = 0 (0.0MB)
   free     = 524288 (0.5MB)
   0.0% used
To Space:
   capacity = 524288 (0.5MB)
   used     = 0 (0.0MB)
   free     = 524288 (0.5MB)
   0.0% used
PS Old Generation
   capacity = 5611978752 (5352.0MB)
   used     = 71781144 (68.4558334350586MB)
   free     = 5540197608 (5283.544166564941MB)
   1.2790701314472832% used
PS Perm Generation
   capacity = 85983232 (82.0MB)
   used     = 85983232 (82.0MB)
   free     = 0 (0.0MB)
   100.0% used
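32283 interned Strings occupying 3838824 bytes.
As the output shows, PermGen has used all 82 MB of its capacity, which is also the MaxPermSize limit. No PermGen size had been configured, so the JVM default applied. Raising the limit in Tomcat's JVM options fixes it.
2) Solution:
Add the following to catalina.sh: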
CATALINA_OPTS="$CATALINA_OPTS -Xms20480m -Xmx20480m -Xss2m -XX:PermSize=512M -XX:MaxNewSize=512m -XX:MaxPermSize=512m -XX:NewRatio=2"
CATALINA_OPTS="$CATALINA_OPTS -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xloggc:/data/logs/tomcat/gc.log"
JMX_REMOTE="-Dcom.sun.management.jmxremote.port=1999 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
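CATALINA_OPTS="$CATALINA_OPTS $JMX_REMOTE"
3) Further analysis:
The CLOSE_WAIT connections seen earlier were a symptom, not the cause. The service could no longer respond to requests, so after the remote side closed each connection, Tomcat never closed its own end and the sockets stayed in CLOSE_WAIT.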
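A socket sits in CLOSE_WAIT when the remote side has closed the connection but the local application has not yet closed its own end. One way to check that the stuck sockets really belong to the hung service is to group them by local port. The sketch below assumes netstat output on stdin; close_wait_by_port is a hypothetical helper name, and the Tomcat connector port is not given in this write-up, so it has to be read off the result.

```shell
#!/bin/sh
# Group CLOSE_WAIT sockets from `netstat -nat` input (stdin) by local port,
# most affected port first, to see whether they pile up on one listener.
# Note: close_wait_by_port is an illustrative helper name.
close_wait_by_port() {
    awk '$1 ~ /^tcp/ && $6 == "CLOSE_WAIT" {
             n = split($4, part, ":")   # port is the last colon-separated piece
             count[part[n]]++
         }
         END { for (p in count) printf "%d %s\n", count[p], p }' | sort -rn
}
```

On the server it could be used as: netstat -nat | close_wait_by_port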
                










