Key parameters
Number of SYN retransmissions before giving up
net.ipv4.tcp_syn_retries
Number of SYN-ACK retransmissions before giving up
net.ipv4.tcp_synack_retries
Maximum length of the SYN queue (connection requests not yet acknowledged by the client)
net.ipv4.tcp_max_syn_backlog
Maximum number of sockets in TIME-WAIT state
net.ipv4.tcp_max_tw_buckets = 5000
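To see what the two retry counts mean in wall-clock time, here is a minimal sketch of the exponential backoff described in the kernel's ip-sysctl documentation: the first retransmission fires after an initial RTO of 1 second and each later wait doubles. It assumes that 1-second initial RTO, a 120-second RTO ceiling, and no RTT samples; the function name syn_retry_schedule is hypothetical. With the default tcp_syn_retries of 6 it reproduces the documented 63 s until the last retransmission and 127 s until the connection attempt times out; with tcp_synack_retries of 5 it gives 31 s and 63 s.

```python
TCP_RTO_MAX = 120.0  # assumption: Linux caps the retransmission timeout at 120 s


def syn_retry_schedule(retries, initial_rto=1.0):
    """Return (seconds until the last retransmission, seconds until giving up).

    Simplified model: wait one RTO, retransmit, double the RTO; after the
    last retransmission, wait one more (doubled) RTO before aborting.
    """
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto                   # timer expires -> retransmit SYN / SYN-ACK
        rto = min(rto * 2, TCP_RTO_MAX)  # exponential backoff, capped
    return elapsed, elapsed + rto


if __name__ == "__main__":
    for name, n in (("tcp_syn_retries", 6), ("tcp_synack_retries", 5)):
        last, give_up = syn_retry_schedule(n)
        print(f"{name}={n}: last retransmission at {last:.0f}s, give up at {give_up:.0f}s")
```

Under the same assumptions, lowering tcp_syn_retries to 2 makes a client give up after 7 seconds instead of 127.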
Kernel TCP statistics counters
cat /proc/net/netstat
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWqueueTooBig
TcpExt: 0 0 124847 3480 62783 0 0 0 0 0 20120807 4 84637995 4 0 2850 20821722 3641 1313595 0 4 14751859 335315 4100469 0 408176413 141949 426019176 427062428 4 48698 71 539 1541 0 927 9288 1027 191240 84778 131 5 117383 1059282 53889 30479 120012 441021 1440330 622271 0 5167 0 77826 1316199 1232 25855966 9888 11379 9207 0 7413 0 0 0 187 17232767 972718 25555 0 0 66774 218530 857085 7 0 0 0 0 6332 0 0 0 83440243 730646 0 1232 1486 910 0 0 0 0 0 0 489638 0 2506 162414 162414 462511 38372 917625404 18604 371275 2129 71519 1034 326 58946 0 5 313 0
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 2 0 0 0 4 0 471156924648 627504136835 0 0 1312 0 0 1942773861 188000 73917 12048 0
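Each section in this file is a header line followed by a value line with the same prefix (TcpExt:, IpExt:), which is hard to read by eye. A minimal Python sketch that turns it into a nested dictionary; it relies only on that header/value pairing, so it should also accept /proc/net/snmp, whose Tcp: section holds RetransSegs, OutSegs and InSegs. The helper name parse_netstat is hypothetical.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # Rows alternate: a header row, then a value row with the same "Section:" prefix.
    for header, values in zip(rows[0::2], rows[1::2]):
        section = header[0].rstrip(":")
        if values[0].rstrip(":") != section or len(header) != len(values):
            raise ValueError(f"unexpected layout in section {section!r}")
        stats[section] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats
```

On a live host, parse_netstat(open("/proc/net/netstat").read())["TcpExt"]["ListenOverflows"] returns a single counter. The values are cumulative since boot, so they only become meaningful when two samples are compared.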
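TCP retransmitted segments per second
Use the following command to watch the number of TCP segments retransmitted per second in real time: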
watch -n 1 'nstat -z -t 1 | grep -e TcpExtTCPSynRetrans -e TcpRetransSegs -e TcpOutSegs -e TcpInSegs'
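Here, TcpExtTCPSynRetrans is the number of retransmitted SYN and SYN-ACK segments,
TcpRetransSegs is the total number of retransmitted segments,
TcpOutSegs is the total number of TCP segments sent,
and TcpInSegs is the total number of TCP segments received; the in/out counters are commonly used to compute TCP throughput.
Note: TcpExtTCPSynRetrans is new in CentOS 7 (it depends on the Linux kernel version); older kernels such as 2.6.32 do not provide this counter.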
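nstat names each counter by joining its section and field, so TcpRetransSegs is RetransSegs from the Tcp: section of /proc/net/snmp and TcpExtTCPSynRetrans is TCPSynRetrans from the TcpExt: section of /proc/net/netstat. Because these are cumulative counters, a per-second rate is the difference between two samples divided by the interval. The sketch below also reports retransmitted segments as a share of segments sent, a common rule of thumb rather than a kernel-defined metric; the function name tcp_rates and the flat nstat-style input dictionaries are assumptions.

```python
COUNTERS = ("TcpInSegs", "TcpOutSegs", "TcpRetransSegs", "TcpExtTCPSynRetrans")


def tcp_rates(before, after, interval):
    """Per-second rates between two counter snapshots taken `interval` seconds apart.

    `before`/`after` map nstat-style names to cumulative values. A counter absent
    from both snapshots (e.g. TcpExtTCPSynRetrans on a 2.6.32 kernel) reports 0.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    rates = {name: (after.get(name, 0) - before.get(name, 0)) / interval for name in COUNTERS}
    out = rates["TcpOutSegs"]
    # Retransmitted segments relative to segments sent in the same window.
    rates["RetransRatio"] = rates["TcpRetransSegs"] / out if out else 0.0
    return rates


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    before = {"TcpInSegs": 1000, "TcpOutSegs": 2000, "TcpRetransSegs": 10}
    after = {"TcpInSegs": 1600, "TcpOutSegs": 3000, "TcpRetransSegs": 30}
    print(tcp_rates(before, after, interval=2))
```

For the illustrative samples two seconds apart this gives 500 segments sent per second, 10 retransmissions per second and a ratio of 0.02.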
- ss command
Find connections that have retransmitted segments
ss -anti | grep -B 1 retrans | head
ESTAB 0 0 X.X.X.X:8080 x.x.x.x:7936
cubic wscale:9,9 rto:912 backoff:2 rtt:27.867/7.356 ato:40 mss:1420 rcvmss:696 advmss:1460 cwnd:2 ssthresh:5 bytes_acked:880624 bytes_received:219972 segs_out:2309 segs_in:2478 send 815.3Kbps lastsnd:3660 lastrcv:5808 lastack:2897 pacing_rate 1.6Mbps retrans:0/55 reordering:17 rcv_rtt:9756.09 rcv_space:106208
--
ESTAB 0 0 X.X.X.X:8080 x.x.x.x:38529
cubic wscale:8,9 rto:273 rtt:72.884/6.379 ato:40 mss:1400 rcvmss:536 advmss:1460 cwnd:4 ssthresh:3 bytes_acked:30304 bytes_received:10536 segs_out:107 segs_in:146 send 614.7Kbps lastsnd:736 lastrcv:746 lastack:664 pacing_rate 1.2Mbps retrans:0/9 reordering:18 rcv_space:29200
--
ESTAB 0 0 X.X.X.X:8080 x.x.x.x:37889
cubic wscale:8,9 rto:233 rtt:32.674/3.392 ato:40 mss:1440 rcvmss:536 advmss:1460 cwnd:10 ssthresh:7 bytes_acked:1409336 bytes_received:443476 segs_out:4038 segs_in:5615 send 3.5Mbps lastsnd:1218 lastrcv:1228 lastack:1189 pacing_rate 7.1Mbps retrans:0/27 reordering:17 rcv_rtt:71482 rcv_space:29200
--
ESTAB 0 0 X.X.X.X:8080 x.x.x.x:39959
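With -i, each connection's state line is followed by an info line, and retrans:X/Y holds the kernel's tcpi_retrans (retransmitted segments not yet acknowledged) and tcpi_total_retrans (all retransmissions on that connection). A Python sketch that pairs each state line with its retrans field so connections can be ranked by total retransmissions; the regular expressions assume the column order shown above (State, Recv-Q, Send-Q, local, peer), and parse_ss_retrans and ss_output are hypothetical helpers.

```python
import re
import subprocess

# State line such as "ESTAB 0 0 <local> <peer>" (assumed column order).
STATE_RE = re.compile(r"^([A-Z][A-Z0-9-]*)\s+\d+\s+\d+\s+(\S+)\s+(\S+)")
# "\b" stops this from matching inside other fields such as "bytes_retrans:".
RETRANS_RE = re.compile(r"\bretrans:(\d+)/(\d+)")


def ss_output(*flags):
    """Run ss with the given flags and return its stdout (requires iproute2)."""
    return subprocess.run(["ss", *flags], capture_output=True, text=True, check=True).stdout


def parse_ss_retrans(text):
    """Return [(local, peer, unacked_retrans, total_retrans), ...] from `ss -anti` output."""
    result = []
    current = None
    for line in text.splitlines():
        m = STATE_RE.match(line.strip())
        if m:
            current = (m.group(2), m.group(3))
            continue
        r = RETRANS_RE.search(line)
        if r and current:
            result.append((*current, int(r.group(1)), int(r.group(2))))
            current = None
    return result
```

On a live host, sorted(parse_ss_retrans(ss_output("-anti")), key=lambda c: c[3], reverse=True)[:10] lists the ten connections with the most retransmissions.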
Count TCP connections in each state
ss -tan|awk 'NR>1{++S[$1]}END{for (a in S) print a,S[a]}'
SYN-RECV 3
LISTEN 57
ESTAB 8924
FIN-WAIT-1 19
FIN-WAIT-2 195
SYN-SENT 1
TIME-WAIT 367
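The awk one-liner tallies the first column of every line after the header. The same count in Python, for when the numbers feed into another script; count_states and ss_tan are hypothetical names, and the parser assumes the state is the first whitespace-separated field, as in the output above.

```python
import subprocess
from collections import Counter


def ss_tan():
    """Return the stdout of `ss -tan` (requires iproute2)."""
    return subprocess.run(["ss", "-tan"], capture_output=True, text=True, check=True).stdout


def count_states(ss_output):
    """Count connections per TCP state; skips the header line like awk's NR>1."""
    lines = ss_output.splitlines()[1:]
    return Counter(line.split()[0] for line in lines if line.strip())
```

On a live host, count_states(ss_tan()).most_common() prints the same table as the awk command. A SYN-RECV count that keeps growing, or a TIME-WAIT count approaching net.ipv4.tcp_max_tw_buckets, points back to the key parameters listed at the start.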
tsar, Alibaba's open-source monitoring tool
Download: https://github.com/alibaba/tsar
tsar --tcp -C | sed 's/:/_/g;s/=/ /g' | xargs -n 2
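Summary
A sudden burst of business traffic creates a momentary peak with a large number of new TCP connection requests, which leads to packet loss and retransmissions, and the traffic cannot be distributed evenly across the backends. The same happens when the service is not fronted by a CDN and clients go straight to the origin server.
The main cause is that the number of TCP connections the host is asked to set up exceeds its limits, such as the size of its TCP queues. Enlarging the TCP queues or scaling out horizontally with more servers can resolve it.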
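Whether the queues really overflow can be checked with the TcpExt counters from /proc/net/netstat shown earlier. The sketch compares two samples of four queue-related counters: ListenOverflows (a listening socket's accept queue was full), ListenDrops (packets dropped on a listening socket, including those overflows), and TCPReqQFullDrop / TCPReqQFullDoCookies (the SYN request queue was full, so the SYN was dropped or answered with a SYN cookie). The choice of these four counters and the name queue_drops are assumptions for illustration; any of them rising during a traffic spike supports the explanation above.

```python
# Assumed selection of queue-related TcpExt counters from /proc/net/netstat.
QUEUE_COUNTERS = {
    "ListenOverflows": "accept queue of a listening socket was full",
    "ListenDrops": "packet dropped on a listening socket (includes overflows)",
    "TCPReqQFullDrop": "SYN dropped because the SYN request queue was full",
    "TCPReqQFullDoCookies": "SYN request queue full, answered with a SYN cookie",
}


def queue_drops(before, after):
    """Return {counter: increase} for the queue counters that grew between two TcpExt samples."""
    grown = {}
    for name in QUEUE_COUNTERS:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            grown[name] = delta
    return grown


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    before = {"ListenOverflows": 100, "ListenDrops": 100, "TCPReqQFullDrop": 0}
    after = {"ListenOverflows": 130, "ListenDrops": 135, "TCPReqQFullDrop": 0}
    for name, delta in queue_drops(before, after).items():
        print(f"{name} +{delta}: {QUEUE_COUNTERS[name]}")
```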
References:
https://perthcharles.github.io/2015/11/10/wiki-netstat-proc/
https://lwn.net/Articles/508865/
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt