Performance Profiling with Rprof - CFANZ Programming Community

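R provides a built-in function, Rprof(), for profiling the performance of code. Once profiling starts, a sampler runs alongside the code that follows until profiling is stopped. By default, roughly every 20 milliseconds the sampler records which function R is currently running (in fact, the whole call stack). As a result, if a function is slow, it is likely to show up in most of the samples.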

This sampling approach does not give exact timings, but in most cases it is accurate enough.
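To see what the sampler actually records, the sketch below profiles some arbitrary busy work (repeatedly sorting random numbers, chosen only for illustration) and reads the raw output file with readLines(). It also uses Rprof()'s interval argument to sample every 10 ms instead of the default 20 ms. The exact samples will differ from run to run.

```r
tmp <- tempfile(fileext = ".out")
Rprof(tmp, interval = 0.01)            # sample every 10 ms instead of 20 ms
for (i in 1:100) sort(runif(1e5))      # arbitrary busy work to be sampled
Rprof(NULL)                            # stop profiling

samples <- readLines(tmp)
samples[1]           # header line: the sampling interval in microseconds
head(samples[-1])    # later lines: sampled call stacks, innermost call first
length(samples) - 1  # number of sample lines written
```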

In the following example, we use Rprof() to profile calls to my_cumsum1() and try to find the part of the code that makes it slow.

Using Rprof() is simple: call Rprof() to start profiling, run the code you want to analyze, call Rprof(NULL) to stop profiling, and finally call summaryRprof() to view the results:

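# assumes my_cumsum1() is already defined (see code/my_cumsum1.R below)
x <- rnorm(1000)
tmp <- tempfile(fileext = ".out")  # temporary file for the profiling data
Rprof(tmp)                         # start profiling
for (i in 1:1000) {
  my_cumsum1(x)
}
Rprof(NULL)                        # stop profiling
summaryRprof(tmp)                  # summarize the samples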
## $by.self
##              self.time self.pct total.time total.pct
## "c"               2.42    82.88       2.42     82.88
## "my_cumsum1"      0.46    15.75       2.92    100.00
## "+"               0.04     1.37       0.04      1.37
##
## $by.total
##              total.time total.pct self.time self.pct
## "my_cumsum1"       2.92    100.00      0.46    15.75
## "c"                2.42     82.88      2.42    82.88
## "+"                0.04      1.37      0.04     1.37
##
## $sample.interval
## [1] 0.02
##
## $sampling.time
## [1] 2.92

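Note that we used tempfile() to create a temporary file for the profiling data. If you do not pass a file to Rprof(), it writes to a file named Rprof.out in the current working directory. summaryRprof() likewise reads Rprof.out by default.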
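Because Rprof() keeps sampling until Rprof(NULL) is called, an error in the profiled code can leave the profiler running if the Rprof(NULL) line is never reached. One way to guard against this is a small wrapper that creates the temporary file, stops profiling with on.exit(), and returns the summary. profile_expr() below is a hypothetical helper written for this illustration, not part of base R, and the busy work in the usage line is arbitrary.

```r
# Hypothetical helper (not part of base R): profile `expr` and return the
# summaryRprof() result. on.exit() stops the profiler and deletes the
# temporary file even if evaluating `expr` throws an error.
profile_expr <- function(expr, interval = 0.02) {
  tmp <- tempfile(fileext = ".out")
  Rprof(tmp, interval = interval)
  on.exit({
    Rprof(NULL)   # harmless if profiling has already been stopped
    unlink(tmp)
  }, add = TRUE)
  force(expr)     # evaluating the lazy argument runs the code being profiled
  Rprof(NULL)     # stop sampling before summarizing
  summaryRprof(tmp)
}

# Usage: profile some arbitrary busy work
res <- profile_expr(for (i in 1:100) sort(runif(1e5)))
res$by.self
```

The return value is computed before the on.exit() handler runs, so the file is only deleted after summaryRprof() has read it.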

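summaryRprof() condenses the profiling data into a readable summary: $by.self is sorted by self.time and $by.total by total.time. More precisely, self.time is the time spent running code in the function itself, not counting the functions it calls, whereas total.time is the function's total running time, including time spent in the functions it calls.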
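A small sketch may make the difference concrete. The functions outer_fn() and inner_fn() are hypothetical, made up for this illustration: outer_fn() does nothing except call inner_fn(), which does the actual work. Every sample taken while inner_fn() runs also has outer_fn() on the call stack, so outer_fn()'s total.time should be at least that of inner_fn(), while its self.time stays close to zero.

```r
# Hypothetical example functions, not from the original text
inner_fn <- function(n) {
  for (k in seq_len(n)) sort(runif(1e5))   # the actual work happens here
  invisible(NULL)
}
outer_fn <- function(n) {
  inner_fn(n)                               # outer_fn() only delegates
}

tmp <- tempfile(fileext = ".out")
Rprof(tmp)
outer_fn(100)
Rprof(NULL)
res <- summaryRprof(tmp)

# Row names in the summary are quoted function names
res$by.total[c("\"outer_fn\"", "\"inner_fn\""), c("total.time", "self.time")]
```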

To locate bottlenecks, self.time is the more useful measure, because it shows how much time each function takes on its own.

The results above show that c() takes up the vast majority of the run time; in other words, y <- c(y, sum_x) is the main reason the function is slow.
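Why is c() so costly here? Each y <- c(y, sum_x) builds a new vector one element longer than y and copies the old contents into it, so one call on n inputs copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements, which is 499,500 for n = 1000. Rprof() can also record memory use, which is one way to check this. The sketch below repeats my_cumsum1() from code/my_cumsum1.R so that it runs on its own, and uses Rprof(..., memory.profiling = TRUE) together with summaryRprof(..., memory = "both"), which adds a mem.total column. The memory figures depend on the R version and platform, and memory profiling may not be available in every R build.

```r
# my_cumsum1() as in code/my_cumsum1.R, repeated so this block runs on its own
my_cumsum1 <- function(x) {
  y <- numeric()
  sum_x <- 0
  for (xi in x) {
    sum_x <- sum_x + xi
    y <- c(y, sum_x)   # allocates a new vector one element longer each time
  }
  y
}

n <- 1000
n * (n - 1) / 2        # elements copied by one call: 499500

x <- rnorm(n)
tmp <- tempfile(fileext = ".out")
Rprof(tmp, memory.profiling = TRUE)   # also record memory use with each sample
for (i in 1:500) my_cumsum1(x)
Rprof(NULL)

# memory = "both" reports the timings plus memory use (mem.total, in MB)
res <- summaryRprof(tmp, memory = "both")
res$by.total
```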

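We can run the same analysis on my_cumsum2(). This time, most of the time is attributed to my_cumsum2() itself, which is expected, since running it is the only thing the code does. In other words, no particular function called inside my_cumsum2() accounts for most of the run time: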
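The definition of my_cumsum2() is not shown in this excerpt. The version below is an assumption, a typical preallocating implementation written only for illustration, and the original may differ. It allocates the result once with numeric(length(x)) and fills it in place, which avoids the repeated copying done by c(). An index expression such as i - 1 would be one possible source of the "-" entry in the profile, but that is only an inference.

```r
# Hypothetical my_cumsum2(): a preallocating cumulative sum (assumed, not from the source)
my_cumsum2 <- function(x) {
  y <- numeric(length(x))          # allocate the result once
  if (length(x) > 0) {
    y[1] <- x[1]
    for (i in seq_along(x)[-1]) {  # indices 2..n; empty when length(x) == 1
      y[i] <- y[i - 1] + x[i]      # fill in place instead of growing y
    }
  }
  y
}

x <- rnorm(1000)
all.equal(my_cumsum2(x), cumsum(x))  # compare against base R's cumsum()
```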

tmp <- tempfile(fileext = ".out")
Rprof(tmp)            # start profiling
for (i in 1:1000) {
  my_cumsum2(x)
}
Rprof(NULL)           # stop profiling
summaryRprof(tmp)
## $by.self
##              self.time self.pct total.time total.pct
## "my_cumsum2"      1.42    97.26       1.46    100.00
## "-"               0.04     2.74       0.04      2.74
##
## $by.total
##              total.time total.pct self.time self.pct
## "my_cumsum2"       1.46    100.00      1.42    97.26
## "-"                0.04      2.74      0.04     2.74
##
## $sample.interval
## [1] 0.02
##
## $sampling.time
## [1] 1.46

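In practice, the code we want to profile is usually more complex and may involve many different functions, so per-function timings alone may not be enough. Fortunately, Rprof() supports line profiling: if we specify line.profiling = TRUE and load the code with source(..., keep.source = TRUE), the summary (with lines = "show" in summaryRprof()) reports the time spent on each line of code.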

Create a script file, code/my_cumsum1.R, containing the following code:

my_cumsum1 <- function(x) {
  y <- numeric()
  sum_x <- 0
  for (xi in x) {
    sum_x <- sum_x + xi
    y <- c(y, sum_x)
  }
  y
}

x <- rnorm(1000)

for (i in 1:1000) {
  my_cumsum1(x)
}

Then profile the script with Rprof() and source(). In the output, an entry such as my_cumsum1.R#6 refers to line 6 of the script:

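tmp <- tempfile(fileext = ".out")
Rprof(tmp, line.profiling = TRUE)                # also record source line numbers
source("code/my_cumsum1.R", keep.source = TRUE)  # keep the line information
Rprof(NULL)
summaryRprof(tmp, lines = "show")                # summarize at the level of source lines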
## $by.self
##                self.time self.pct total.time total.pct
## my_cumsum1.R#6      2.38    88.15       2.38     88.15
## my_cumsum1.R#5      0.26     9.63       0.26      9.63
## my_cumsum1.R#4      0.06     2.22       0.06      2.22
##
## $by.total
##                 total.time total.pct self.time self.pct
## my_cumsum1.R#14       2.70    100.00      0.00     0.00
## my_cumsum1.R#6        2.38     88.15      2.38    88.15
## my_cumsum1.R#5        0.26      9.63      0.26     9.63
## my_cumsum1.R#4        0.06      2.22      0.06     2.22
##
## $by.line
##                 self.time self.pct total.time total.pct
## my_cumsum1.R#4       0.06     2.22       0.06      2.22
## my_cumsum1.R#5       0.26     9.63       0.26      9.63
## my_cumsum1.R#6       2.38    88.15       2.38     88.15
## my_cumsum1.R#14      0.00     0.00       2.70    100.00
##
## $sample.interval
## [1] 0.02
##
## $sampling.time
## [1] 2.7
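The example above assumes that code/my_cumsum1.R already exists. To try line profiling without creating that directory, one option is to write the same script to R's temporary directory with writeLines() and then pick the busiest line out of $by.line. The loop count is reduced to 500 here only to shorten the run, and blank lines are kept so that the call to my_cumsum1() sits on line 14, as in the output above.

```r
# Write the script to a temporary directory so no code/ folder is needed
script <- file.path(tempdir(), "my_cumsum1.R")
writeLines(c(
  "my_cumsum1 <- function(x) {",
  "  y <- numeric()",
  "  sum_x <- 0",
  "  for (xi in x) {",
  "    sum_x <- sum_x + xi",
  "    y <- c(y, sum_x)",
  "  }",
  "  y",
  "}",
  "",
  "x <- rnorm(1000)",
  "",
  "for (i in 1:500) {",
  "  my_cumsum1(x)",
  "}"
), script)

tmp <- tempfile(fileext = ".out")
Rprof(tmp, line.profiling = TRUE)
source(script, keep.source = TRUE)   # keep.source = TRUE keeps the line information
Rprof(NULL)
res <- summaryRprof(tmp, lines = "show")

# Row names of by.line have the form "file#line", e.g. "my_cumsum1.R#6"
by_line <- res$by.line
rownames(by_line)[which.max(by_line$self.time)]   # line with the most self time
```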