0
点赞
收藏
分享

微信扫一扫

Go 1.8 performance improvements, one month in

12a597c01003 2022-10-11 阅读 132


Sunday September the 18th marks a month since the Go 1.8 cycle opened officially. I’m passionate about the performance of Go programs, and of the compiler itself. This post is a brief look at the state of play, roughly 1/2 way into the development cycle for Go 1.81.

Note: these results are of course preliminary and represent only a point in time, not the performance of the final Go 1.8 release.

Compile times

Nothing much to report here. Using the methodology from my previous Go 1.7 benchmarks, there is a 3.22%–5.11% improvement in full compile time compared to Go 1.7.

Performance improvements

Intel amd64

Better code generation and small improvements to the runtime and standard library show some small improvements for amd642, but really nothing to write home about yet.

name                       old time/op    new time/op  deltaBinaryTree17-4              3.07s ± 2%     3.06s ± 2%    ~      (p=0.661 n=10+9) Fannkuch11-4                3.23s ± 1%     3.22s ± 0%  -0.43%   (p=0.008 n=9+10) FmtFprintfEmpty-4          64.4ns ± 0%    61.8ns ± 4%  -4.17%   (p=0.005 n=9+10) FmtFprintfString-4          162ns ± 0%     162ns ± 0%    ~      (p=0.065 n=10+9) FmtFprintfInt-4             142ns ± 0%     142ns ± 0%    ~      (p=0.137 n=8+10) FmtFprintfIntInt-4          220ns ± 0%     217ns ± 0%  -1.18%   (p=0.000 n=9+10) FmtFprintfPrefixedInt-4     224ns ± 0%     224ns ± 1%    ~       (p=0.206 n=9+9) FmtFprintfFloat-4           313ns ± 0%     312ns ± 0%  -0.26%   (p=0.001 n=10+9) FmtManyArgs-4               906ns ± 0%     894ns ± 0%  -1.32%    (p=0.000 n=7+6) GobDecode-4                8.88ms ± 1%    8.81ms ± 0%  -0.81%  (p=0.003 n=10+10) GobEncode-4                7.93ms ± 1%    7.88ms ± 0%  -0.66%   (p=0.008 n=9+10) Gzip-4                      272ms ± 1%     277ms ± 0%  +1.95%   (p=0.000 n=10+9) Gunzip-4                   47.4ms ± 0%    47.4ms ± 0%    ~      (p=0.720 n=9+10) HTTPClientServer-4          201µs ± 4%     202µs ± 2%    ~     (p=0.631 n=10+10) JSONEncode-4               19.3ms ± 0%    19.3ms ± 0%    ~     (p=0.063 n=10+10) JSONDecode-4               61.0ms ± 0%    61.2ms ± 0%  +0.33%   (p=0.000 n=10+8) Mandelbrot200-4            5.20ms ± 0%    5.20ms ± 0%    ~      (p=0.475 n=10+7) GoParse-4                  3.95ms ± 1%    3.97ms ± 1%  +0.65%    (p=0.003 n=9+9) RegexpMatchEasy0_32-4      88.4ns ± 0%    88.7ns ± 0%  +0.34%   (p=0.001 n=10+9) RegexpMatchEasy0_1K-4      1.14µs ± 0%    1.14µs ± 0%    ~       (p=0.369 n=9+6) RegexpMatchEasy1_32-4      82.6ns ± 0%    82.0ns ± 0%  -0.70%   (p=0.000 n=9+10) RegexpMatchEasy1_1K-4       469ns ± 0%     463ns ± 0%  -1.23%    (p=0.000 n=6+9) RegexpMatchMedium_32-4      138ns ± 1%     136ns ± 0%  -1.38%   (p=0.000 n=10+9) RegexpMatchMedium_1K-4     43.6µs ± 1%    42.0µs ± 0%  -3.74%    (p=0.000 n=9+9) RegexpMatchHard_32-4       2.25µs ± 1%    2.23µs ± 0%  -0.57%    (p=0.000 n=8+8) RegexpMatchHard_1K-4       68.8µs ± 0%    68.6µs ± 0%  -0.37%    (p=0.000 n=8+8) Revcomp-4                   477ms ± 1%     472ms ± 0%  -1.03%    (p=0.000 n=8+8) Template-4                 76.1ms ± 0%    76.4ms ± 0%  +0.35%    (p=0.000 n=9+9) TimeParse-4                 367ns ± 0%     366ns ± 0%  -0.16%   (p=0.003 n=10+8) TimeFormat-4                386ns ± 0%     384ns ± 0%  -0.58%    (p=0.000 n=9+9)name                     old speed      new speed      deltaGobDecode-4              86.4MB/s ± 1%  87.1MB/s ± 0%  +0.81%  (p=0.003 n=10+10) GobEncode-4              96.7MB/s ± 1%  97.4MB/s ± 0%  +0.66%   (p=0.007 n=9+10) Gzip-4                   71.4MB/s ± 1%  70.0MB/s ± 0%  -1.91%   (p=0.000 n=10+9) Gunzip-4                  409MB/s ± 0%   410MB/s ± 0%    ~      (p=0.703 n=9+10) JSONEncode-4              101MB/s ± 0%   100MB/s ± 0%    ~     (p=0.084 n=10+10) JSONDecode-4             31.8MB/s ± 0%  31.7MB/s ± 0%  -0.33%   (p=0.000 n=10+8) GoParse-4                14.7MB/s ± 1%  14.6MB/s ± 1%  -0.67%    (p=0.002 n=9+9) RegexpMatchEasy0_32-4     362MB/s ± 0%   361MB/s ± 0%  -0.36%   (p=0.000 n=10+9) RegexpMatchEasy0_1K-4     898MB/s ± 0%   898MB/s ± 0%    ~       (p=0.762 n=9+8) RegexpMatchEasy1_32-4     387MB/s ± 0%   390MB/s ± 0%  +0.70%   (p=0.000 n=9+10) RegexpMatchEasy1_1K-4    2.18GB/s ± 0%  2.21GB/s ± 0%  +1.20%    (p=0.000 n=9+9) RegexpMatchMedium_32-4   7.23MB/s ± 1%  7.32MB/s ± 0%  +1.19%   (p=0.000 n=10+9) RegexpMatchMedium_1K-4   23.5MB/s ± 1%  24.4MB/s ± 0%  +3.88%    (p=0.000 n=9+9) RegexpMatchHard_32-4     14.2MB/s ± 1%  14.3MB/s ± 0%  +0.58%    (p=0.000 n=8+8) RegexpMatchHard_1K-4     14.9MB/s ± 0%  14.9MB/s ± 0%  +0.34%    (p=0.000 n=8+7) Revcomp-4                 533MB/s ± 1%   539MB/s ± 0%  +1.04%    (p=0.000 n=8+8) Template-4               25.5MB/s ± 0%  25.4MB/s ± 0%  -0.36%    (p=0.000 n=9+9)

ARM

The major improvement that landed recently in the development branch is the conversion of the remaining architecture backends to use the compiler’s SSA form. This has brought a substantial improvement in generated code for non Intel architectures, like ARM3.

name                       old time/op    new time/op    deltaBinaryTree17-4              33.8s ± 1%      27.7s ± 0%  -18.06%  (p=0.000 n=10+10) Fannkuch11-4                42.0s ± 0%      19.3s ± 0%  -54.10%  (p=0.000 n=10+10) FmtFprintfEmpty-4           670ns ± 1%      581ns ± 1%  -13.30%  (p=0.000 n=10+10) FmtFprintfString-4         2.04µs ± 1%     1.65µs ± 0%  -19.09%  (p=0.000 n=10+10) FmtFprintfInt-4            1.71µs ± 0%     1.21µs ± 0%  -29.39%   (p=0.000 n=10+9) FmtFprintfIntInt-4         2.69µs ± 1%     1.94µs ± 0%  -27.77%  (p=0.000 n=10+10) FmtFprintfPrefixedInt-4    2.70µs ± 0%     1.85µs ± 0%  -31.41%   (p=0.000 n=10+9) FmtFprintfFloat-4          5.15µs ± 0%     3.65µs ± 0%  -29.01%   (p=0.000 n=9+10) FmtManyArgs-4              11.3µs ± 0%      8.5µs ± 0%  -24.79%   (p=0.000 n=10+9) GobDecode-4                 112ms ± 0%       77ms ± 1%  -31.04%    (p=0.000 n=9+9) GobEncode-4                88.5ms ± 1%     77.2ms ± 1%  -12.78%  (p=0.000 n=10+10) Gzip-4                      4.79s ± 0%      3.34s ± 0%  -30.18%    (p=0.000 n=9+9) Gunzip-4                    702ms ± 0%      463ms ± 0%  -34.05%  (p=0.000 n=10+10) HTTPClientServer-4          645µs ± 3%      571µs ± 3%  -11.45%  (p=0.000 n=10+10) JSONEncode-4                227ms ± 0%      186ms ± 0%  -18.16%  (p=0.000 n=10+10) JSONDecode-4                845ms ± 0%      618ms ± 0%  -26.81%  (p=0.000 n=10+10) Mandelbrot200-4            59.3ms ± 0%     40.0ms ± 0%  -32.47%  (p=0.000 n=10+10) GoParse-4                  45.0ms ± 0%     37.0ms ± 0%  -17.68%    (p=0.000 n=9+9) RegexpMatchEasy0_32-4       974ns ± 0%      878ns ± 0%   -9.81%   (p=0.000 n=10+9) RegexpMatchEasy0_1K-4      4.60µs ± 0%     4.48µs ± 0%   -2.57%  (p=0.000 n=10+10) RegexpMatchEasy1_32-4      1.02µs ± 0%     0.94µs ± 0%   -8.08%   (p=0.000 n=8+10) RegexpMatchEasy1_1K-4      6.92µs ± 0%     6.08µs ± 0%  -12.10%  (p=0.000 n=10+10) RegexpMatchMedium_32-4     1.61µs ± 0%     1.27µs ± 0%  -20.98%    (p=0.000 n=9+6) RegexpMatchMedium_1K-4      447µs ± 0%      317µs ± 0%  -29.05%   (p=0.000 n=10+9) RegexpMatchHard_32-4       24.9µs ± 0%     18.4µs ± 0%  -25.89%  (p=0.000 n=10+10) RegexpMatchHard_1K-4        740µs ± 0%      552µs ± 0%  -25.36%  (p=0.000 n=10+10) Revcomp-4                  81.0ms ± 1%     65.2ms ± 0%  -19.53%    (p=0.000 n=9+9) Template-4                  1.17s ± 0%      0.81s ± 0%  -31.28%    (p=0.000 n=9+9) TimeParse-4                5.52µs ± 0%     3.79µs ± 0%  -31.42%   (p=0.000 n=10+9) TimeFormat-4               10.6µs ± 0%      8.5µs ± 0%  -19.14%  (p=0.000 n=10+10)name                     old speed      new speed        deltaGobDecode-4              6.86MB/s ± 0%   9.95MB/s ± 1%  +45.00%    (p=0.000 n=9+9) GobEncode-4              8.67MB/s ± 1%   9.94MB/s ± 1%  +14.69%  (p=0.000 n=10+10) Gzip-4                   4.05MB/s ± 0%   5.81MB/s ± 0%  +43.32%   (p=0.000 n=10+9) Gunzip-4                 27.6MB/s ± 0%   41.9MB/s ± 0%  +51.63%  (p=0.000 n=10+10) JSONEncode-4             8.53MB/s ± 0%  10.43MB/s ± 0%  +22.20%  (p=0.000 n=10+10) JSONDecode-4             2.30MB/s ± 0%   3.14MB/s ± 0%  +36.39%   (p=0.000 n=9+10) GoParse-4                1.29MB/s ± 0%   1.56MB/s ± 0%  +20.93%   (p=0.000 n=9+10) RegexpMatchEasy0_32-4    32.8MB/s ± 0%   36.4MB/s ± 0%  +10.87%  (p=0.000 n=10+10) RegexpMatchEasy0_1K-4     222MB/s ± 0%    228MB/s ± 0%   +2.64%  (p=0.000 n=10+10) RegexpMatchEasy1_32-4    31.3MB/s ± 0%   34.0MB/s ± 0%   +8.75%   (p=0.000 n=9+10) RegexpMatchEasy1_1K-4     148MB/s ± 0%    168MB/s ± 0%  +13.76%  (p=0.000 n=10+10) RegexpMatchMedium_32-4    620kB/s ± 0%    790kB/s ± 0%  +27.42%   (p=0.000 n=10+8) RegexpMatchMedium_1K-4   2.29MB/s ± 0%   3.23MB/s ± 0%  +41.05%  (p=0.000 n=10+10) RegexpMatchHard_32-4     1.29MB/s ± 0%   1.74MB/s ± 0%  +34.88%   (p=0.000 n=9+10) RegexpMatchHard_1K-4     1.38MB/s ± 0%   1.85MB/s ± 0%  +34.06%  (p=0.000 n=10+10) Revcomp-4                31.4MB/s ± 1%   39.0MB/s ± 0%  +24.26%    (p=0.000 n=9+9) Template-4               1.65MB/s ± 0%   2.41MB/s ± 0%  +45.71%   (p=0.000 n=10+9)

Notes:

  1. Despite the Go 1.8 development cycle opening 18 days late, in order to keep to the 6 month cadence, the feature freeze for this cycle will still occur on the 1st of November.
  2. Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 3.13.0-95-generic #142-Ubuntu
  3. Freescale i.MX6, 3.14.77-1-ARCH



举报

相关推荐

0 条评论