Docker and Linux Cgroups in Practice

Java架构领域 2021-09-28
Introduction

Docker limits container resources through the kernel's cgroup mechanism. In our own deployments, we cap the CPU and memory of less important applications so that the important ones always have enough resources to run. Below is a record of hands-on experiments with CPU and memory limits on Docker containers. Environment: CentOS 7.2 with Docker 1.12.6.

An Introduction to Linux Cgroups

The primary job of Linux cgroups is to set upper bounds on the resources a group of processes may use: CPU, memory, disk I/O, network, and so on. On Linux, the interface cgroups expose to user space is a filesystem, organized as files and directories under /sys/fs/cgroup. For more background, see https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt, https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt, and https://coolshell.cn/articles/17049.html.
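
Because the interface is a filesystem, a quick way to check which controllers a host has mounted is to list the cgroup mounts (read-only; the exact mount points and flags vary by distribution):

mount -t cgroup
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
...(one line per mounted controller)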

A Simple Cgroup Exercise

As a simple example, let's cap the CPU usage of a single process:

  1. Enter /sys/fs/cgroup/cpu and create a directory named hello
cd /sys/fs/cgroup/cpu
mkdir hello
ls hello/
cgroup.clone_children  cgroup.procs  cpuacct.usage         cpu.cfs_period_us  cpu.rt_period_us   cpu.shares  notify_on_release
cgroup.event_control   cpuacct.stat  cpuacct.usage_percpu  cpu.cfs_quota_us   cpu.rt_runtime_us  cpu.stat    tasks
  2. Check the values of the cpu.cfs_period_us and cpu.cfs_quota_us files under hello
cat /sys/fs/cgroup/cpu/hello/cpu.cfs_period_us 
100000    the default, 100 ms
cat /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us 
-1    the default, meaning no limit

cpu.cfs_period_us: the length of the scheduling period over which the cgroup's CPU bandwidth is re-allocated, in microseconds
cpu.cfs_quota_us: the total CPU time, in microseconds, that all tasks in the cgroup may run during one period (as defined by cpu.cfs_period_us)
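
The effective cap works out to quota/period, measured in CPUs. Two illustrative settings for the hello cgroup above (example values, not from the original run):

echo 50000 > /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us     # 50000/100000 = 0.5 CPU, i.e. 50% of one core
echo 200000 > /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us    # 200000/100000 = 2 full cores on a multi-core host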
  3. Run an infinite while loop in the shell, then watch it with top
while true;do :;done & 
[1] 30328    the PID is 30328

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
30328 root      20   0  115516    628    132 R 99.9  0.1   1:10.92 bash

The CPU usage is close to 100%.
  4. Use the hello cgroup to cap this process's CPU usage at 50%
echo 50000 > /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us
echo 30328 >> /sys/fs/cgroup/cpu/hello/tasks

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND 
30328 root      20   0  115516    628    132 R 49.8  0.1   7:44.47 bash

Observed with top, the process's CPU usage now hovers around 50%.
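
To undo the experiment (a sketch, using the PID and cgroup from above): move the task back to the root cpu cgroup, then remove the now-empty directory; rmdir only succeeds once the cgroup holds no tasks.

echo 30328 > /sys/fs/cgroup/cpu/tasks    # reattach the process to the root cgroup
rmdir /sys/fs/cgroup/cpu/hello           # delete the empty cgroup
kill %1                                  # stop the busy loop (same shell session)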


Cgroup Settings for Docker Containers

Again taking CPU as the example, let's explore how Docker applies cgroup limits to a container.

  1. Before creating any container, look at /sys/fs/cgroup/cpu
ls -l /sys/fs/cgroup/cpu/
total 0
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w-  1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r--  1 root root 0 Oct 11 10:30 cgroup.sane_behavior
-r--r--r--  1 root root 0 Oct 11 10:30 cpuacct.stat
-rw-r--r--  1 root root 0 Oct 11 10:30 cpuacct.usage
-r--r--r--  1 root root 0 Oct 11 10:30 cpuacct.usage_percpu
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.cfs_period_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.cfs_quota_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.rt_period_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.rt_runtime_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.shares
-r--r--r--  1 root root 0 Oct 11 10:30 cpu.stat
-rw-r--r--  1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r--  1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan  3 14:16 system.slice
-rw-r--r--  1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x  2 root root 0 Oct 11 11:11 user.slice

There is no docker subdirectory yet.
  2. Create a container
docker run -tid --name stress --entrypoint bash polinux/stress:1.0.4
  3. Look at /sys/fs/cgroup/cpu again and observe the changes
ll /sys/fs/cgroup/cpu/
total 0
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w-  1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r--  1 root root 0 Oct 11 10:30 cgroup.sane_behavior
-r--r--r--  1 root root 0 Oct 11 10:30 cpuacct.stat
-rw-r--r--  1 root root 0 Oct 11 10:30 cpuacct.usage
-r--r--r--  1 root root 0 Oct 11 10:30 cpuacct.usage_percpu
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.cfs_period_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.cfs_quota_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.rt_period_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.rt_runtime_us
-rw-r--r--  1 root root 0 Oct 11 10:30 cpu.shares
-r--r--r--  1 root root 0 Oct 11 10:30 cpu.stat
drwxr-xr-x  3 root root 0 Jan  3 16:42 docker    a new docker directory
-rw-r--r--  1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r--  1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan  3 14:16 system.slice
-rw-r--r--  1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x  2 root root 0 Oct 11 11:11 user.slice

ll /sys/fs/cgroup/cpu/docker/
total 0
-rw-r--r-- 1 root root 0 Jan  3 15:23 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 15:23 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 15:23 cgroup.procs
-r--r--r-- 1 root root 0 Jan  3 15:23 cpuacct.stat
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpuacct.usage
-r--r--r-- 1 root root 0 Jan  3 15:23 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Jan  3 15:23 cpu.shares
-r--r--r-- 1 root root 0 Jan  3 15:23 cpu.stat
drwxr-xr-x 2 root root 0 Jan  3 16:42 e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19    the container-id directory
-rw-r--r-- 1 root root 0 Jan  3 15:23 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 15:23 tasks

ll /sys/fs/cgroup/cpu/docker/e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19/
total 0
-rw-r--r-- 1 root root 0 Jan  3 16:42 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 16:42 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 16:42 cgroup.procs
-r--r--r-- 1 root root 0 Jan  3 16:42 cpuacct.stat
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpuacct.usage
-r--r--r-- 1 root root 0 Jan  3 16:42 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Jan  3 16:42 cpu.shares
-r--r--r-- 1 root root 0 Jan  3 16:42 cpu.stat
-rw-r--r-- 1 root root 0 Jan  3 16:42 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 16:42 tasks
  4. Run a stress workload inside the container via docker exec, then inspect the tasks file
docker exec -it stress stress --cpu 1 --vm-bytes 200M 
stress: info: [5] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd

docker top stress
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                43699               43688               0                   16:42               pts/2               00:00:00            bash
root                44192               44179               0                   16:46               pts/4               00:00:00            stress --cpu 1 --vm-bytes 200M
root                44199               44192               99                  16:46               pts/4               00:00:53            stress --cpu 1 --vm-bytes 200M

cat /sys/fs/cgroup/cpu/docker/e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19/tasks 
43699
44192
44199

All of the container's process IDs have been written into the tasks file.
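
The same membership can be cross-checked from the process side: every line of /proc/<pid>/cgroup names a hierarchy and the cgroup path the process belongs to. For the stress worker above, the cpu line should show the container-id path (hierarchy numbers vary by host; output abbreviated):

cat /proc/44199/cgroup
...
4:cpuacct,cpu:/docker/e0ca8bf3d85d8617d950315f9b64223a294d56becd608a1c826f44d98b3dae19
...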

From this we can see what happens when a container is created:

  • Docker first creates a directory named docker under /sys/fs/cgroup/cpu
  • It then creates a subdirectory named after the container id inside the docker directory
  • The container's CPU limits are enforced by writing to the files in that container-id subdirectory (see the sketch below)
  • Every process inside the container is governed by the container's resource settings
  • Other resources such as memory and network follow the same structure as CPU
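
Putting this together, a small sketch for reading any container's CPU settings by name, using docker inspect to resolve the full container id that appears in the cgroup path:

CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_period_us
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us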

Docker Container CPU Limits in Practice

Here we exercise only the two most commonly used flags, --cpu-period and --cpu-quota, and use the stress tool for verification.

  1. Run a container with no CPU limit
docker run -itd --name stress polinux/stress:1.0.4 stress --cpu 1 --vm-bytes 200M 
1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f

cat /sys/fs/cgroup/cpu/docker/1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f/cpu.cfs_period_us 
100000
cat /sys/fs/cgroup/cpu/docker/1633f77703ac680c6c9ff77ce5072b6c4d239a546151f945c87f57bb7011e17f/cpu.cfs_quota_us 
-1
No CPU limit has been applied to the stress container.

top output:
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND 
47864 root      20   0     736     36      0 R 99.3  0.0   3:00.90 stress
CPU usage reaches roughly 100%.
  2. Limit the container's CPU with --cpu-period=100000 and --cpu-quota=60000
docker run -itd --name stress --cpu-period 100000 --cpu-quota 60000 polinux/stress:1.0.4 stress --cpu 1 --vm-bytes 200M 
2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905

cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/cpu.cfs_period_us 
100000
cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/cpu.cfs_quota_us 
60000

top output:
PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
48758 root      20   0     736     40      0 R 59.8  0.0   1:54.94 stress
CPU usage hovers around 60%.

docker top stress
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                48744               48732               0                   17:18               pts/2               00:00:00            stress --cpu 1 --vm-bytes 200M
root                48758               48744               60                  17:18               pts/2               00:02:48            stress --cpu 1 --vm-bytes 200M
cat /sys/fs/cgroup/cpu/docker/2330d0ea784e1963332bc7fab3d56560a7985f82a520ee17ed65586b2a47b905/tasks 
48744
48758
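
These limits can also be adjusted on a running container without recreating it. A sketch using docker update, which rewrites the same cgroup file under the container-id directory (illustrative value):

docker update --cpu-quota 30000 stress
CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us
30000    30000/100000 = 30% of one CPU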


Docker Container Memory Limits in Practice

Here we experiment with three flags: --memory, --memory-swap, and --memory-swappiness.

Test environment: as in the introduction, CentOS 7.2 with Docker 1.12.6; the host has 5G of swap, which matters below.


Create a container and verify which cgroup file each of --memory, --memory-swap, and --memory-swappiness corresponds to:
docker run -itd --name stress --memory 1G --memory-swap 3G --memory-swappiness 20 --entrypoint bash progrium/stress:1.0.1 
7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c

ll /sys/fs/cgroup/memory/
total 0
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.clone_children
--w--w--w-  1 root root 0 Oct 11 10:30 cgroup.event_control
-rw-r--r--  1 root root 0 Oct 11 10:30 cgroup.procs
-r--r--r--  1 root root 0 Oct 11 10:30 cgroup.sane_behavior
drwxr-xr-x  3 root root 0 Jan  3 19:25 docker    a new docker directory
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.failcnt
--w-------  1 root root 0 Oct 11 10:30 memory.force_empty
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.failcnt
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.limit_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.max_usage_in_bytes
-r--r--r--  1 root root 0 Oct 11 10:30 memory.kmem.slabinfo
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.tcp.failcnt
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.tcp.limit_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.kmem.tcp.max_usage_in_bytes
-r--r--r--  1 root root 0 Oct 11 10:30 memory.kmem.tcp.usage_in_bytes
-r--r--r--  1 root root 0 Oct 11 10:30 memory.kmem.usage_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.limit_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.max_usage_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.memsw.failcnt
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.memsw.limit_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.memsw.max_usage_in_bytes
-r--r--r--  1 root root 0 Oct 11 10:30 memory.memsw.usage_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.move_charge_at_immigrate
-r--r--r--  1 root root 0 Oct 11 10:30 memory.numa_stat
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.oom_control
----------  1 root root 0 Oct 11 10:30 memory.pressure_level
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.soft_limit_in_bytes
-r--r--r--  1 root root 0 Oct 11 10:30 memory.stat
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.swappiness
-r--r--r--  1 root root 0 Oct 11 10:30 memory.usage_in_bytes
-rw-r--r--  1 root root 0 Oct 11 10:30 memory.use_hierarchy
-rw-r--r--  1 root root 0 Oct 11 10:30 notify_on_release
-rw-r--r--  1 root root 0 Oct 11 10:30 release_agent
drwxr-xr-x 52 root root 0 Jan  3 14:16 system.slice
-rw-r--r--  1 root root 0 Oct 11 10:30 tasks
drwxr-xr-x  2 root root 0 Oct 11 11:11 user.slice

ll /sys/fs/cgroup/memory/docker/
total 0
drwxr-xr-x 2 root root 0 Jan  3 19:30 7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c    the container-id subdirectory
-rw-r--r-- 1 root root 0 Jan  3 15:23 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 15:23 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 15:23 cgroup.procs
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.failcnt
--w------- 1 root root 0 Jan  3 15:23 memory.force_empty
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.numa_stat
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.oom_control
---------- 1 root root 0 Jan  3 15:23 memory.pressure_level
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.stat
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.swappiness
-r--r--r-- 1 root root 0 Jan  3 15:23 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 15:23 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jan  3 15:23 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 15:23 tasks

ll /sys/fs/cgroup/memory/docker/7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c/
total 0
-rw-r--r-- 1 root root 0 Jan  3 19:30 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  3 19:30 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  3 19:30 cgroup.procs
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.failcnt
--w------- 1 root root 0 Jan  3 19:30 memory.force_empty
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.numa_stat
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.oom_control
---------- 1 root root 0 Jan  3 19:30 memory.pressure_level
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.stat
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.swappiness
-r--r--r-- 1 root root 0 Jan  3 19:30 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan  3 19:30 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jan  3 19:30 notify_on_release
-rw-r--r-- 1 root root 0 Jan  3 19:30 tasks

cd /sys/fs/cgroup/memory/docker/7fedd3fa789e135497e7e8df40b9144d8220d24807eeab2c32618d1bd16c315c/
cat memory.limit_in_bytes 
1073741824    corresponds to --memory 1G
cat memory.memsw.limit_in_bytes 
3221225472    corresponds to --memory-swap 3G
cat memory.swappiness 
20            corresponds to --memory-swappiness 20
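
Docker is doing nothing more than writing these files on our behalf. The equivalent raw writes would look like the sketch below (run in the same container-id directory; note the order, since the kernel rejects a memory.memsw.limit_in_bytes smaller than the current memory.limit_in_bytes):

echo 1073741824 > memory.limit_in_bytes          # --memory 1G
echo 3221225472 > memory.memsw.limit_in_bytes    # --memory-swap 3G
echo 20 > memory.swappiness                      # --memory-swappiness 20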



Experiment 1 (neither --memory nor --memory-swap set)

  1. When the container exhausts the host's memory and swap, after a while the system OOM-kills it
docker run -itd --name stress progrium/stress:1.0.1 --cpu 1 --vm 2 --vm-bytes 19.9999G

Jun 26 10:27:49 localhost kernel: Out of memory: Kill process 2518 (stress) score 498 or sacrifice child
Jun 26 10:27:49 localhost kernel: Killed process 2518 (stress) total-vm:19930256kB, anon-rss:8106200kB, file-rss:8kB
Jun 26 10:27:50 localhost dockerd: time="2018-06-26T10:27:49.971614112+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 10:27:50 localhost kernel: XFS (dm-3): Unmounting Filesystem
An OOM-killer event occurred.
  2. With --oom-kill-disable=true set on the container, check whether it is still OOM-killed once it exhausts the host's memory and swap
docker run -itd --name stress --oom-kill-disable=true progrium/stress:1.0.1 --cpu 1 --vm 2 --vm-bytes 19.9999G

WARNING: Disabling the OOM killer on containers without setting a '-m/--memory' limit may be dangerous.
Jun 26 10:24:46 localhost kernel: Out of memory: Kill process 2399 (stress) score 503 or sacrifice child
Jun 26 10:24:46 localhost kernel: Killed process 2399 (stress) total-vm:19930256kB, anon-rss:8236728kB, file-rss:8kB
Jun 26 10:24:47 localhost dockerd: time="2018-06-26T10:24:47.314803845+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 10:24:47 localhost kernel: XFS (dm-3): Unmounting Filesystem
The OOM-killer event still occurred, presumably because no --memory limit was set, just as the warning suggests.
  3. Set a memory limit with swap effectively disabled (memory-swap equal to memory), plus --oom-kill-disable=true, and verify whether the container is OOM-killed when its workload exceeds the limit
docker run -itd --name stress --memory 5G --memory-swap 5G --oom-kill-disable=true progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G

The docker stats stress monitoring output (screenshot not preserved) showed that even after a long while no OOM occurred; the MEM % column stayed pinned at 100.00%.

  4. Repeat step 3 without the --oom-kill-disable=true setting and check the result
docker run -itd --name stress --memory 5G --memory-swap 5G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G

Jun 26 10:48:37 localhost kernel: Memory cgroup out of memory: Kill process 6828 (stress) score 1001 or sacrifice child
Jun 26 10:48:37 localhost kernel: Killed process 6828 (stress) total-vm:6298768kB, anon-rss:5241352kB, file-rss:8kB
Jun 26 10:48:37 localhost kernel: XFS (dm-3): Unmounting Filesystem
Jun 26 10:48:37 localhost dockerd: time="2018-06-26T10:48:37.597444881+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
The container hit an OOM-killer event almost immediately.

So --oom-kill-disable only takes effect when a memory limit has been set on the container.
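
The flag maps to the cgroup's memory.oom_control file, which can be read to confirm what a container actually got. Illustrative output for the step-3 container, where the killer is disabled and the workload is parked at the limit:

CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/memory/docker/$CID/memory.oom_control
oom_kill_disable 1
under_oom 1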

Experiment 2 (--memory set, --memory-swap left unset):
According to the Docker documentation, the container may then use up to twice the --memory value in total: --memory-swap defaults to 2x memory, i.e. the memory limit plus an equal amount of swap.

docker run -itd --name stress --memory 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 2G

Jun 26 11:15:18 localhost kernel: Memory cgroup out of memory: Kill process 7333 (stress) score 1001 or sacrifice child
Jun 26 11:15:18 localhost kernel: Killed process 7333 (stress) total-vm:2104464kB, anon-rss:998344kB, file-rss:8kB
Jun 26 11:15:19 localhost dockerd: time="2018-06-26T11:15:19.085601671+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 11:15:19 localhost kernel: XFS (dm-3): Unmounting Filesystem
An OOM still occurred, so the container's total usage has to stay below 2x memory (2G here):
docker run -itd --name stress --memory 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 1.99G

This time, after monitoring for a while, no OOM occurred. In the docker stats stress output (screenshot not preserved), the MEM % value kept fluctuating just under 100.00%.
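
Before moving on, the 2x default can be confirmed from the cgroup files of the still-running 1G container (a quick check; --memory-swap was never passed, yet memsw is set to twice the memory value):

CID=$(docker inspect --format '{{.Id}}' stress)
cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes
1073741824    1G, from --memory
cat /sys/fs/cgroup/memory/docker/$CID/memory.memsw.limit_in_bytes
2147483648    2G, i.e. 2x memory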

Experiment 3 (both --memory and --memory-swap set)
Per the Docker documentation, memory-swap is the sum of memory plus swap, so it must hold that memory-swap >= memory ("Minimum memoryswap limit should be larger than memory limit").

docker run -itd --name stress --memory 1G --memory-swap 2G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 1.5G
Since memory = 1G, the 1.5G workload breaks down as memory (1G) + swap (0.5G).

In the docker stats stress output (screenshot not preserved), MEM % again fluctuated just under 100.00%; note also that the LIMIT column shows only the memory value, not memory-swap.

If a container should not touch the host's swap at all, set memory-swap equal to memory:
docker run -itd --name stress --memory 1G --memory-swap 1G progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 999M
Setting memory-swap to -1 removes the limit on how much swap the container may use:
docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 2G
This run used 1G of memory plus 1G of system swap.

The test machine has 5G of swap, so next we try to consume all of it:

docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 6G

Jun 26 14:09:45 localhost kernel: Memory cgroup out of memory: Kill process 11335 (stress) score 991 or sacrifice child
Jun 26 14:09:45 localhost kernel: Killed process 11335 (stress) total-vm:6298768kB, anon-rss:1037064kB, file-rss:4kB
Jun 26 14:09:46 localhost dockerd: time="2018-06-26T14:09:46.073190546+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 14:09:46 localhost kernel: XFS (dm-3): Unmounting Filesystem
After running for a while, an OOM occurred.

Now try to use more than the host's 5G of swap:

docker run -itd --name stress --memory 1G --memory-swap -1 progrium/stress:1.0.1 --cpu 1 --vm 1 --vm-bytes 8G

Jun 26 14:19:10 localhost kernel: Memory cgroup out of memory: Kill process 11654 (stress) score 992 or sacrifice child
Jun 26 14:19:10 localhost kernel: Killed process 11654 (stress) total-vm:8395920kB, anon-rss:1043232kB, file-rss:8kB
Jun 26 14:19:11 localhost dockerd: time="2018-06-26T14:19:11.196855155+08:00" level=warning msg="failed to close stdin: rpc error: code = 2 desc = containerd: container not found"
Jun 26 14:19:11 localhost kernel: XFS (dm-3): Unmounting Filesystem
Again, an OOM occurred after running for a while.

Finally, use --memory-swappiness to keep the container from using system swap at all:

docker run -itd --name stress --memory 1G --memory-swappiness 0 --entrypoint bash progrium/stress:1.0.1

root@node75:/# stress --vm 1 --vm-bytes 1G 
stress: info: [28] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [28] (416) <-- worker 29 got signal 9
stress: WARN: [28] (418) now reaping child worker processes
stress: FAIL: [28] (452) failed run completed in 1s
Jun 26 17:53:40 localhost kernel: Memory cgroup out of memory: Kill process 27849 (stress) score 998 or sacrifice child
Jun 26 17:53:40 localhost kernel: Killed process 27849 (stress) total-vm:1055888kB, anon-rss:1045144kB, file-rss:0kB
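
With swappiness at 0 the cgroup cannot push anonymous pages out to swap, so allocating a full 1G trips the 1G memory limit immediately instead of spilling over. A cross-check in the cgroup's memory.stat, where the swap counter should stay at zero (sketch; illustrative output):

CID=$(docker inspect --format '{{.Id}}' stress)
grep -w swap /sys/fs/cgroup/memory/docker/$CID/memory.stat
swap 0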


Conclusion

The above records some hands-on experiments with the two most commonly limited container resources, CPU and memory, and shows how Docker relies on Linux cgroups to enforce those limits. Limits on resources other than CPU and memory can be explored and verified the same way, one experiment at a time.

References

https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
https://coolshell.cn/articles/17049.html
https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/resource_management_guide/index
