Contents

Python interpreters
Performance profiling methods
References

Python interpreters

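There are many Python interpreters, including CPython, IronPython, Jython, and PyPy (IPython, which is often listed with them, is an interactive shell that runs on top of CPython rather than a separate implementation). Each of these interpreters consists of a compiler and a virtual machine.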

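The virtual machine lets Python programmers ignore the underlying implementation (for example, how memory is allocated for an array, how that memory is organized, and in what order it is sent to the CPU). The benefit is that you can quickly design higher-level business logic and algorithms; the drawback is a performance cost. The Python instruction layer does contain optimizations of its own, however, so making better use of that layer (that is, choosing the right sequence of instructions) improves the performance of the Python you write.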
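To make the point about choosing the right instructions concrete, here is a minimal sketch that computes the same sum two ways: with an explicit Python-level loop and with the built-in sum(). The function names and the workload size are made up for illustration. On CPython the built-in version is usually faster, but the actual numbers depend on the machine and the interpreter version.

```python
import timeit


def manual_sum(n):
    """Add the integers 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(n))


if __name__ == "__main__":
    n = 100_000
    for fn in (manual_sum, builtin_sum):
        # Best of 3 runs, each run making 10 calls.
        best = min(timeit.repeat(lambda: fn(n), number=10, repeat=3))
        print("{}: {:.4f} s for 10 calls".format(fn.__name__, best))
```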

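Second, Python's GIL (global interpreter lock) limits performance in parallel workloads. The GIL is a CPython concept: CPython manages memory by reference counting, which is not thread-safe on its own, so the interpreter uses a mutex that prevents multiple threads from executing Python bytecode at the same time. As a result, a Python program running on a multi-core CPU may effectively use only one core while the others sit idle. As the figure below shows, each of the three threads can continue only after the previous one has released the lock. To work around this, use the standard library's multiprocessing module, numexpr, or a distributed computing model.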

[Figure: GIL]
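The effect described above can be observed with a small experiment. The sketch below runs a CPU-bound loop three times in sequence and then in three threads. The function names and the loop size are illustrative assumptions. On a standard CPython build with the GIL, the two timings tend to be close, because only one thread executes bytecode at a time; other interpreters or builds may behave differently.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop; returns the number of iterations performed."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


def run_sequential(n, workers):
    """Run the loop `workers` times, one after another."""
    return [count_down(n) for _ in range(workers)]


def run_threaded(n, workers):
    """Run the loop in `workers` threads and collect each result."""
    results = [None] * workers

    def work(slot):
        results[slot] = count_down(n)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    n, workers = 2_000_000, 3
    for runner in (run_sequential, run_threaded):
        start = time.perf_counter()
        runner(n, workers)
        print("{}: {:.3f} s".format(runner.__name__, time.perf_counter() - start))
```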

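On the other hand, Python is dynamically typed and is not a compiled language. Because the code can change while it runs, the compiler cannot optimize it ahead of time. Just-In-Time (JIT) compilation can mitigate this and deliver a speedup, as in PyPy or Pyston (which originated at Dropbox). A related approach is Cython, which lets you annotate Python code with C types and compiles it ahead of time.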

Performance profiling methods

1. Timing with time

The time module

import time

start_time = time.time()
for i in range(1,1000000):
    x = i*i
end_time = time.time()

print("Spend:{:.4f}".format(end_time-start_time) )

Result:

Spend:0.0618
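A variant of the measurement above, sketched with time.perf_counter(), which the standard library provides for measuring short intervals with the highest available resolution. The helper name time_squares is made up for this example, and the printed value will differ from run to run.

```python
import time


def time_squares(n):
    """Return (elapsed_seconds, last_value) for squaring 1..n-1."""
    start = time.perf_counter()
    x = 0
    for i in range(1, n):
        x = i * i
    elapsed = time.perf_counter() - start
    return elapsed, x


if __name__ == "__main__":
    elapsed, _ = time_squares(1_000_000)
    print("Spend:{:.4f}".format(elapsed))
```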

Decorator

You can define a decorator that measures the time automatically:

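import time
from functools import wraps
def timefn(fn):
    @wraps(fn)  # keep the name and docstring of the wrapped function
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        print("@timefn: {} took {:.4f} seconds".format(fn.__name__, time.time() - t1))
        return result
    return measure_time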

Using the decorator:

@timefn 
def calculate_z_serial_purepython(maxiter, zs, cs):
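Since the function above is shown without its body, here is a self-contained sketch of the same pattern applied to a stand-in workload. sum_of_squares is a hypothetical function chosen only for illustration, and the timing decorator is one possible way to write it, not necessarily the exact one the original used.

```python
import time
from functools import wraps


def timefn(fn):
    @wraps(fn)  # keep fn.__name__ and fn.__doc__ on the wrapper
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print("@timefn: {} took {:.4f} seconds".format(fn.__name__, t2 - t1))
        return result
    return measure_time


@timefn
def sum_of_squares(n):  # hypothetical stand-in workload
    """Sum i*i for i in 0..n-1."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    sum_of_squares(1_000_000)
```

Because of @wraps, the decorated function still reports its own name, which keeps profiler output and tracebacks readable.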

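The timeit module
timeit temporarily disables garbage collection while timing. It can be invoked from the command line with -m timeit.
Use -n to set the number of loops and -r to set the number of repetitions. If -n is omitted, timeit chooses a loop count automatically; -r defaults to 5 (3 before Python 3.7).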

python -m timeit -n 5 -r 5 -s "import julia1" "julia1.calc_pure_python(desired_width=1000, max_iterations=300)"
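The same kind of measurement can be made from inside Python with timeit.Timer. The sketch below mirrors the -n 5 -r 5 settings of the command above; calc is a hypothetical stand-in, since the julia1 module is not shown here.

```python
import timeit


def calc(n):  # hypothetical stand-in for the function being measured
    return sum(i * i for i in range(n))


def time_calc(number=5, repeat=5):
    """Run `number` loops per measurement, `repeat` times; return all totals."""
    timer = timeit.Timer(lambda: calc(10_000))
    return timer.repeat(repeat=repeat, number=number)


if __name__ == "__main__":
    number = 5
    totals = time_calc(number=number, repeat=5)
    # Each entry is the total time for `number` loops; report the best one.
    print("best of {}: {:.6f} s per loop".format(len(totals), min(totals) / number))
```

Taking the minimum rather than the mean is the usual choice here, since higher values are mostly caused by other activity on the machine.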

UNIX time
When running a Python script, prefix the command with /usr/bin/time -p to use the system's time utility. This works only on UNIX-like systems.

Python script: test.py

	import time
	start_time = time.time()
	for i in range(1,1000000):
	    x = i*i
	end_time = time.time()
	print("Finish test.")

Run: /usr/bin/time --verbose python test.py

The --verbose switch (available in GNU time) prints more detailed output.
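The distinction that the system time utility draws between elapsed (real) time and CPU time can be approximated inside Python as well. This is a rough sketch using time.perf_counter() for wall-clock time and time.process_time() for the CPU time of the current process; the helper name measure is an assumption of this example, and the numbers do not include interpreter start-up the way /usr/bin/time does.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


if __name__ == "__main__":
    # Sleeping takes wall-clock time but almost no CPU time.
    wall, cpu = measure(lambda: time.sleep(0.2))
    print("real {:.2f}  user+sys {:.2f}".format(wall, cpu))
```

A large gap between the two values suggests the program spends its time waiting (I/O, sleep, locks) rather than computing.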

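2. Built-in profiling tools in the standard library

There are three in total:

hotshot (Python 2 only; removed in Python 3)
cProfile module
profile module

The latter two share the same interface but are implemented differently. profile is pure Python, while cProfile is written in C and hooks into the CPython virtual machine to measure the time spent in every function (this adds overhead, but yields much richer information than simple timing).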

Example:

import cProfile
import re
cProfile.run('re.compile("foo|bar")')

Output:

197 function calls (192 primitive calls) in 0.002 seconds
Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 <string>:1(<module>)
     1    0.000    0.000    0.001    0.001 re.py:212(compile)
     1    0.000    0.000    0.001    0.001 re.py:268(_compile)
     1    0.000    0.000    0.000    0.000 sre_compile.py:172(_compile_charset)
     1    0.000    0.000    0.000    0.000 sre_compile.py:201(_optimize_charset)
     4    0.000    0.000    0.000    0.000 sre_compile.py:25(_identityfunction)
   3/1    0.000    0.000    0.000    0.000 sre_compile.py:33(_compile)
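When only one call needs to be profiled rather than a whole statement string, a cProfile.Profile object can be used directly. The sketch below is one way to do that: it profiles a single call with runcall() and returns the report as text instead of printing it. The helper name profile_call is made up for this example.

```python
import cProfile
import io
import pstats
import re


def profile_call(fn, *args):
    """Profile one call of fn and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buffer = io.StringIO()
    # Sort by cumulative time and keep only the first 5 entries.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(re.compile, "foo|bar")
    print(report)
```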

Analysis tools
The following tools make it easier to analyze the results produced by cProfile.

runsnakerun, a visualization tool:
pstats, an analysis tool:

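import pstats
# Create a Stats object
p = pstats.Stats("result.out")

# strip_dirs(): remove irrelevant path information
# sort_stats(): sort the entries, using keys such as "name" or "cumulative"
# print_stats(): print the report; the number of lines can be limited

# Gives the same result as running cProfile.run("test()") directly
p.strip_dirs().sort_stats(-1).print_stats()

# Sort by function name and print only the first 3 entries; the argument can also be a fraction, meaning the top share of entries
p.strip_dirs().sort_stats("name").print_stats(3)

# Sort by cumulative time, then by function name, and print the top 50%
p.strip_dirs().sort_stats("cumulative", "name").print_stats(0.5)

# To find out which functions called sum_num
p.print_callers(0.5, "sum_num")

# To see which functions are called inside test()
p.print_callees("test")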
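The snippet above assumes that a result.out file already exists. A minimal end-to-end sketch follows: it writes the profiling data to a file and then queries it with pstats. The functions test and sum_num are hypothetical stand-ins named after the comments above, and the file is written to a temporary directory to avoid touching the working directory.

```python
import cProfile
import io
import os
import pstats
import tempfile


def sum_num(n):  # hypothetical helper, named after the comments above
    return sum(range(n))


def test():  # hypothetical entry point
    results = []
    for _ in range(50):
        results.append(sum_num(10_000))
    return results


def profile_to_file(path):
    """Write raw profiling data for test() to `path`."""
    profiler = cProfile.Profile()
    profiler.runcall(test)
    profiler.dump_stats(path)


def callers_report(path, pattern):
    """Return the print_callers() text for functions matching `pattern`."""
    buffer = io.StringIO()
    pstats.Stats(path, stream=buffer).strip_dirs().print_callers(pattern)
    return buffer.getvalue()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "result.out")
    profile_to_file(path)
    print(callers_report(path, "sum_num"))
```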

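3. Line-by-line profiling

line_profiler performs line-by-line profiling.
After installing the line_profiler package with pip or conda, use it through the @profile decorator.

Mark the selected functions with the decorator (@profile), then run your code with the kernprof script; the CPU time spent on each line of the selected functions, along with other information, is recorded.

Run kernprof from the command line to profile the CPU cost of the decorated function line by line:
kernprof -l -v test.py

The -l flag requests line-by-line rather than function-level profiling, and -v prints the output. Without -v you get a .lprof output file that you can analyze later with the line_profiler module.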

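4. Diagnosing memory

memory_profiler diagnoses memory usage. It works much like the previous package: add @profile above the function you want to diagnose, then run:
python -m memory_profiler test.py
You can then use mprof to turn the generated statistics file into a chart.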
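memory_profiler is a third-party package. Where it is not installed, a rough first impression of memory use can be had with tracemalloc from the standard library. Note that this is a different technique: it tracks allocations made through Python's memory allocator and reports a peak for a whole call, not line-by-line process memory. The names build_list and peak_memory are assumptions of this sketch.

```python
import tracemalloc


def build_list(n):  # hypothetical memory-heavy workload
    return [i * i for i in range(n)]


def peak_memory(fn, *args):
    """Return (result, peak_bytes) traced by tracemalloc while fn runs."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    _, peak = peak_memory(build_list, 1_000_000)
    print("peak: {:.1f} MiB".format(peak / 2**20))
```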

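Other suggestions

1. If adding and removing @profile every time you change the code is tedious, consider a no-op decorator, which avoids errors such as NameError when the code runs without a profiler.
For example: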

# memory_profiler
if 'profile' not in dir():
    def profile(func):
        def inner(*args, **kwargs):
            return func(*args, **kwargs)
        return inner
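Another way to write the fallback, sketched here as an alternative rather than as the original author's method, is to look the name up and catch NameError. The lookup succeeds whenever the running tool has made profile visible as a global or built-in name, which is how kernprof -l provides it; otherwise the no-op is defined. sum_of_squares is a hypothetical function used only to show the decorator in place.

```python
try:
    profile  # provided by the profiler at run time, if one is active
except NameError:
    def profile(func):
        """No-op stand-in: return the function unchanged."""
        return func


@profile
def sum_of_squares(n):  # hypothetical function to be profiled
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(1000))
```

With this in place the same file runs unchanged under plain python and under kernprof -l -v.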

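2. Keep the test machine stable: for example, disable TurboBoost in the BIOS, stop the operating system from overriding SpeedStep, and run on mains power rather than a laptop battery.
3. Run the tests several times and back up the data.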

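References

[1] High Performance Python (Python高性能编程)
[2] Real Python
[3] 一份让Python疯狂加速的工具合集！ (A collection of tools for speeding up Python)
[4] python性能分析之cProfile模块 (Python profiling with the cProfile module)
[5] The Python Profilers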