Performance Profiling in Golang with PProf

Performance optimization in Go projects mainly draws on the following kinds of profiling:

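CPU profile: reports the program's CPU usage by sampling what the application is doing on the CPU (including registers) at a fixed frequency
Memory Profile (Heap Profile): reports the program's memory usage
Block Profiling: reports where goroutines are not running because they are blocked waiting on synchronization (including timer channels); useful for tracking down deadlocks and similar bottlenecks
Goroutine Profiling: reports goroutine usage: which goroutines exist and what their call relationships are
Mutex Profiling: mutex analysis; reports lock contention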
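
As a quick way to see which of these profiles the runtime exposes, the sketch below lists the names registered with runtime/pprof. The helper name profileNames is made up for this illustration. The CPU profile does not appear in the list because it is started and stopped through its own functions.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of all profiles currently registered
// with runtime/pprof. The CPU profile is handled separately and is not listed.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

func main() {
	for _, name := range profileNames() {
		fmt.Println(name)
	}
}
```

The printed names are the same ones used as path components under /debug/pprof/ and as arguments to pprof.Lookup.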

PProf

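runtime/pprof: collects runtime data from a (non-server) program for analysis

net/http/pprof: collects runtime data from an HTTP server for analysis

pprof is a tool for visualizing and analyzing profiling data.

Once CPU profiling is turned on, the current call stacks are sampled at a fixed interval (about every 10 ms by default) to find out how much CPU each function uses; memory usage is attributed to functions through sampled allocations. Analyzing these samples then yields a performance report.

Note that pprof should only be pulled into the code when you are doing performance testing.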

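Supported usage modes

Report generation
Interactive terminal use
Web interface

pprof ships with Go, and enabling it is simple.

Note: in production the pprof server is usually started in its own goroutine: go http.ListenAndServe("0.0.0.0:6060", nil)

demo.go, with the following contents: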

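package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

func main() {
    // Serve pprof from a background goroutine, as a real service would.
    go func() {
        log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
    }()

    // Block forever so the process stays alive; a real program does its work here.
    select {}
}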

Run this file and your HTTP service gains a /debug/pprof endpoint that can be used to observe the application.
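
The blank import registers the handlers on http.DefaultServeMux, which is convenient but also exposes them on whatever else that mux serves. One possible alternative, sketched here under the assumption that you want pprof on a separate mux, is to register the exported handlers from net/http/pprof yourself. The names newPprofMux and fetchStatus are hypothetical helpers, and the in-process test server exists only so the example terminates; a real service would pass the mux to http.ListenAndServe on an internal address.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux builds a ServeMux that exposes only the pprof handlers,
// so they are kept apart from the application's own routes.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, block, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// fetchStatus serves h from an in-process test server and returns the
// HTTP status code for a GET of path.
func fetchStatus(h http.Handler, path string) (int, error) {
	srv := httptest.NewServer(h)
	defer srv.Close()

	resp, err := http.Get(srv.URL + path)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// In a real service you would instead run something like:
	//   go http.ListenAndServe("127.0.0.1:6060", newPprofMux())
	code, err := fetchStatus(newPprofMux(), "/debug/pprof/goroutine?debug=1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("GET /debug/pprof/goroutine?debug=1 ->", code)
}
```

Importing net/http/pprof by name still registers its handlers on the default mux as a side effect, so this only helps if the default mux is not the one being served publicly.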

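Analysis

1. Through the web interface

To see the overview, visit http://127.0.0.1:6060/debug/pprof/

The paths under it mean the following:

/debug/pprof/profile: visiting this link runs CPU profiling for 30s (the default) and then produces a file for download.

/debug/pprof/block: records goroutine blocking events. Block profiling is off by default; after calling runtime.SetBlockProfileRate(1), every blocking event is sampled.

/debug/pprof/goroutine: records information about live goroutines. Sampled once, at the moment it is fetched.

/debug/pprof/heap: records heap allocations. By default one sample is taken for every 512 KB allocated.

/debug/pprof/mutex: shows the holders of contended mutexes. Off by default; enable it with runtime.SetMutexProfileFraction.

/debug/pprof/threadcreate: records the creation of OS threads. Sampled once, at the moment it is fetched.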

2. Through the interactive terminal

(1) go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60

$ go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60

Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

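After running this command you have to wait 60 seconds (adjustable through the seconds parameter) while pprof performs CPU profiling. When it finishes, pprof enters its interactive command mode by default, where you can view or export the results. Type help at the (pprof) prompt for a description of the commands.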

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write

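flat: time spent in the function itself
flat%: flat as a percentage of the total sampled CPU time
sum%: running total of flat% for this row and all rows above it
cum: time spent in the function plus the functions it calls
cum%: cum as a percentage of the total sampled CPU time

The last column is the function name. In most cases these five columns are enough to understand how an application is behaving and where to optimize 🤔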

(2) go tool pprof http://localhost:6060/debug/pprof/heap

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 837.48MB, 100% of 837.48MB total
      flat  flat%   sum%        cum   cum%
  837.48MB   100%   100%   837.48MB   100%  main.main.func1

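-inuse_space: analyzes the memory the application currently holds (the live heap)
-alloc_objects: analyzes the objects the application has allocated, including temporary ones that have since been freed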

(3) go tool pprof http://localhost:6060/debug/pprof/block

(4) go tool pprof http://localhost:6060/debug/pprof/mutex

3. PProf visual interface

Starting the PProf visual interface

Method 1:

$ go tool pprof -http=:8080 cpu.prof

Method 2:

$ go tool pprof cpu.prof
(pprof) web

If you see "Could not execute dot; may need to install graphviz.", it means you need to install Graphviz.

Viewing the PProf visual interface

http://localhost:8080/ui/

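Installing and configuring Graphviz

Installing and configuring Graphviz on Windows

1. Download: go to http://www.graphviz.org/ and find the Windows version
2. Install it
3. Configure the environment variable: Computer → Properties → Advanced system settings → Advanced → Environment Variables → System variables → Path, and add the bin folder under the installation directory to Path
4. Verify: in a Windows command prompt, type dot -version and press Enter; if Graphviz version information is printed, the installation and configuration succeeded.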

4. PProf flame graph

As of Go 1.11, flamegraph visualizations are available in go tool pprof directly!

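Its biggest advantage is that it is dynamic. Calls run from top to bottom (A -> B -> C -> D), each block represents a function, and the wider a block is, the more CPU time it took. You can also click a block to drill down into it!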


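Caveat: profiling data is dynamic. To get meaningful data, make sure the application is under substantial load (for example a service running in production, or load simulated with another tool). If the application is idle, the results may be meaningless.

A simple set of helper functions for invoking the profilers: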

package main

import (
  "os"
  "runtime"
  "runtime/pprof"
)

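// cpuProfFile stays open while the CPU profile is being written;
// StopCpuProf closes it.
var cpuProfFile *os.File

func StartCpuProf() error {
  f, err := os.Create("cpu.prof")
  if err != nil {
    return err
  }
  // Start CPU profiling. The file must stay open until StopCpuProf runs,
  // otherwise the samples cannot be written.
  if err := pprof.StartCPUProfile(f); err != nil {
    f.Close()
    return err
  }
  cpuProfFile = f
  return nil
}

func StopCpuProf() {
  // Stop CPU profiling and flush the remaining samples, then close the file.
  pprof.StopCPUProfile()
  if cpuProfFile != nil {
    cpuProfFile.Close()
    cpuProfFile = nil
  }
}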

//--------Mem
func ProfGc() {
  runtime.GC() // get up-to-date statistics
}

func SaveMemProf() error {
  f, err := os.Create("mem.prof")
  if err != nil {
    return err
  }
  defer f.Close() // defer only after the error check, once f is known to be valid
  return pprof.WriteHeapProfile(f)
}

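// goroutine block
// Block profiling is off by default; call runtime.SetBlockProfileRate first,
// otherwise the saved profile will contain no events.
func SaveBlockProfile() error {
  f, err := os.Create("block.prof")
  if err != nil {
    return err
  }
  defer f.Close()
  return pprof.Lookup("block").WriteTo(f, 0)
}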