Linux Commands in Detail: awk [updating…]

1. Parameters in detail

1.1 View the file contents

[root@littlelawson ~]# cat test.txt
1 little
2 google
3 baidu
4 alibaba

1.2 Running awk commands

1. Print the first column of the text

[root@littlelawson ~]# awk '{print $1}' test.txt
1
2
3
4

2. Print the first and second columns [the two fields are concatenated with no separator]

[root@littlelawson ~]# awk '{print $1 $2}' test.txt
1little
2google
3baidu
4alibaba

Extra whitespace between $1 and $2 does not change the result; adjacent expressions are still concatenated:

[root@littlelawson ~]# awk '{print $1      $2}' test.txt
1little
2google
3baidu
4alibaba

3. Print the first and second columns [separated by a space; the comma inserts the output field separator]: awk '{print $1,$2}' filename

[root@littlelawson ~]# awk '{print $1, $2}' test.txt
1 little
2 google
3 baidu
4 alibaba
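To put something other than the default separator between two fields, a string literal can be placed between them in the print statement. The sketch below assumes two lines of sample data mirroring test.txt; the variable name out is only there for illustration.

```shell
# Sample input mirrors the first two lines of test.txt (assumed data)
out=$(printf '1 little\n2 google\n' | awk '{print $1 "-" $2}')
echo "$out"
# prints:
# 1-little
# 2-google
```

Because no comma is used, awk simply concatenates $1, the literal "-", and $2.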

4. Print the whole line currently being read ($0)

[root@littlelawson ~]# awk '{print $0}' test.txt
1 little
2 google
3 baidu
4 alibaba

5. Print the total number of lines in a file [case matters: END and NR must be uppercase]: awk 'END{print NR}' filename

[root@littlelawson ~]# awk 'END{print NR}' test.txt
4

6. Print the first line of the text: awk 'NR==1{print}' filename

[root@littlelawson ~]# awk 'NR==1{print}' test.txt
1 little
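The same NR test extends to a range of lines. A minimal sketch, assuming the four-line content of test.txt is piped in directly; a pattern with no action block prints the matching lines.

```shell
# Print lines 2 through 3 only; the input mirrors test.txt (assumed data)
out=$(printf '1 little\n2 google\n3 baidu\n4 alibaba\n' | awk 'NR>=2 && NR<=3')
echo "$out"
# prints:
# 2 google
# 3 baidu
```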

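7. The shell has two forms of command substitution for assigning a command's output to a variable:

var=`command`
var=$(command)
Here var is the variable name; there must be no spaces around =.
So, to assign the total number of lines of a file to the variable nlines, either of these works:

1) nlines=`awk 'END{print NR}' filename`
or
2) nlines=$(awk 'END{print NR}' filename)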
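A runnable sketch of the assignment, assuming a temporary file created with mktemp stands in for filename; the four lines written to it mirror test.txt.

```shell
# Create a throwaway file so the example is self-contained (assumed data)
tmpfile=$(mktemp)
printf '1 little\n2 google\n3 baidu\n4 alibaba\n' > "$tmpfile"

# Capture the line count printed by awk into a shell variable
nlines=$(awk 'END{print NR}' "$tmpfile")
echo "The file has $nlines lines"

rm -f "$tmpfile"
```

The $(...) form is usually easier to read and to nest than backticks.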

1.3 Built-in functions in detail

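The substr() function returns part of a string. It has two forms; positions are counted from 1:

substr($0,a) returns everything from position a to the end of the string
substr($0,a,b) returns b characters starting at position a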

[root@server4 hadoop]# echo "123" | awk '{print substr($0,1,1)}'
1
[root@server4 hadoop]# echo "123" | awk '{print substr($0,2,1)}'
2
[root@server4 hadoop]# echo "123" | awk '{print substr($0,2,2)}'
23
[root@server4 hadoop]# echo "123" | awk '{print substr($0,2)}'
23
[root@server4 hadoop]# echo "123" | awk '{print substr($0,1)}'
123
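substr() also works on a single field, and it combines naturally with length() to take characters from the end of a string. The inputs below are made-up sample strings, and the variable names year and ext are illustrative only.

```shell
# First 4 characters of field 1
year=$(echo "2018-12-13 19:22" | awk '{print substr($1, 1, 4)}')
echo "$year"   # prints 2018

# Last 3 characters of the line: start at position length($0) - 2
ext=$(echo "hello.txt" | awk '{print substr($0, length($0) - 2)}')
echo "$ext"    # prints txt
```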

2. Practice

2.1 Setting the output field separator `OFS` [`Output Field Separator`] (a space by default)

[root@server4 temp]# echo "hadoop/spark/java/scala/shell" | awk -F'/' '{print $1,$2}'
hadoop spark

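The output above is separated by a space. To change awk's default output separator, set the built-in variable OFS, usually in a BEGIN block so that it takes effect before the first line is read.

An awk script has two special patterns to keep in mind, BEGIN and END:
BEGIN{statements that run before any input is read}
END{statements that run after all lines have been processed}
{statements that run for every input line}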
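A short sketch of setting OFS, reusing the input string from the example above. The choice of "-" as the separator is arbitrary, and the variable names two and all are illustrative.

```shell
# OFS is used wherever a comma appears in print
two=$(echo "hadoop/spark/java/scala/shell" | awk -F'/' 'BEGIN{OFS="-"} {print $1,$2}')
echo "$two"    # prints hadoop-spark

# To re-join the whole line with OFS, a field must be assigned first;
# $1=$1 forces awk to rebuild $0 using the new separator
all=$(echo "hadoop/spark/java/scala/shell" | awk -F'/' -v OFS='-' '{$1=$1; print}')
echo "$all"    # prints hadoop-spark-java-scala-shell
```

Passing -v OFS='-' on the command line has the same effect as assigning OFS in a BEGIN block.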

2.2 Find the lines in a file that are longer than 20 characters

[root@server4 temp]# awk 'length>20' log.txt
10 There are orange,apple,mongo
[root@server4 temp]# cat log.txt 
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
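Called without an argument, length is shorthand for length($0), the length of the whole line. The sketch below spells that out and prints the length next to each matching line; the two input lines are copied from log.txt.

```shell
# Show the length of every line longer than 20 characters
out=$(printf '2 this is a test\n10 There are orange,apple,mongo\n' \
  | awk 'length($0) > 20 {print length($0) ": " $0}')
echo "$out"
# prints:
# 31: 10 There are orange,apple,mongo
```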

2.3 Analyze text with an awk script file

[root@server4 temp]# cat score.txt 
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62
[root@server4 temp]# cat cal.awk 
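#!/bin/awk -f
# Runs once before any input is read: initialize the column sums and print the header
BEGIN{
 math=0
 english=0
 computer=0
 printf "NAME  NO. MATH  ENGLISH COMPUTER  TOTAL\n"
 printf "-------------------------------------------------------------\n"
}

# Runs for every line: accumulate each subject and print the student's total
{
 math+=$3
 english+=$4
 computer+=$5
 printf "%-6s %-6s %4d %8d %8d %8d\n",$1,$2,$3,$4,$5, $3+$4+$5
}

# Runs once after the last line has been processed
END{
 # print appends its own newline, so the explicit \n leaves a blank line after the rule
 print "-------------------------------------------------------------\n"
 printf "total:%10d %8d %8d \n",math,english,computer
 # In END, NR holds the total number of lines read (one per student), so sum/NR is the average
 printf "AVERAGE:%10.2f %8.2f %8.2f\n",math/NR,english/NR,computer/NR
}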
[root@server4 temp]# awk -f cal.awk score.txt 
NAME  NO. MATH  ENGLISH COMPUTER  TOTAL
-------------------------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
-------------------------------------------------------------

total:       319      393      350 
AVERAGE:     63.80    78.60    70.00
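The average in the script relies on NR still holding the number of records read when the END block runs. A minimal one-column sketch of the same idea, assuming two rows taken from score.txt; the NR > 0 guard is an addition that avoids dividing by zero on empty input.

```shell
# Average of column 3 over all input lines
avg=$(printf 'Marry 2143 78\nJack 2321 66\n' \
  | awk '{sum += $3} END {if (NR > 0) printf "%.2f\n", sum / NR}')
echo "$avg"    # prints 72.00
```

If the file may contain blank or header lines, counting matching lines in a separate variable is safer than dividing by NR.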

2.4 Print the lines of a file that contain a given string

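awk 'BEGIN{IGNORECASE=1} /this/' log.txt

This prints the lines of log.txt that match this, ignoring case. IGNORECASE is a gawk extension; other awk implementations treat it as an ordinary variable, so the match stays case-sensitive there.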
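For awk implementations without the gawk-specific variable, a portable way to match case-insensitively is to lowercase the line before comparing. The sketch below uses three lines copied from log.txt as input.

```shell
# tolower() is part of POSIX awk, so this works outside gawk as well
out=$(printf "2 this is a test\n3 Are you like awk\nThis's a test\n" \
  | awk 'tolower($0) ~ /this/')
echo "$out"
# prints:
# 2 this is a test
# This's a test
```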

2.5 Compute the total size of all .txt files in a directory

[root@server4 temp]# ls -l *.txt | awk '{sum+=$5} END{print sum}'
344

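01. ls -l *.txt lists all .txt files in the current directory.
02. In the awk program, {sum+=$5} adds the value of column 5 (the file size in bytes) to sum for every line.
03. END runs after all lines have been processed; here it prints sum.
04. A BEGIN block can also be added in front, as shown below: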

[root@server4 temp]# ls -l *.txt | awk 'BEGIN {print "hello,awk"} {sum+=$2} END {print sum}'
hello,awk
7

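Here the BEGIN block runs first, then the main block runs for every line, and finally the END block runs. Note that this example sums $2, the hard-link count in ls -l output, rather than the size column $5, which is why the result is 7.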

2.6 How awk executes a program

awk 'BEGIN{ commands } pattern{ commands } END{ commands }'
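The three parts of that template can be seen together in one small program. This is an illustrative sketch: the input mirrors test.txt, and the pattern $1 > 2 and the counter n are arbitrary choices.

```shell
out=$(printf '1 little\n2 google\n3 baidu\n4 alibaba\n' | awk '
BEGIN  { print "start" }                          # runs once, before any input
$1 > 2 { n++; print "match:", $2 }                # runs only for lines where column 1 > 2
END    { print "matched", n, "of", NR, "lines" }  # runs once, after all input
')
echo "$out"
# prints:
# start
# match: baidu
# match: alibaba
# matched 2 of 4 lines
```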

[root@server4 temp]# awk 'BEGIN{printf "序号  名字  课程  分数\n"} {print}' marks.txt
序号  名字  课程  分数
1    张三    语文    80
2    李四    数学    90
3    王五    英语    87
[root@server4 temp]# awk 'BEGIN{printf "序号  名字  课程  分数\n"} {print $0}' marks.txt
序号  名字  课程  分数
1    张三    语文    80
2    李四    数学    90
3    王五    英语    87

[root@server4 temp]# awk 'BEGIN{printf "序号  名字  课程  分数\n"} {print $1,$2,$3}' marks.txt
序号  名字  课程  分数
1 张三 语文
2 李四 数学
3 王五 英语
[root@server4 temp]# awk 'BEGIN{printf "序号  名字  课程  分数\n"} {print $1,$2,$3,$4}' marks.txt
序号  名字  课程  分数
1 张三 语文 80
2 李四 数学 90
3 王五 英语 87

2.7 Print all files in the current directory in order of size

[root@server4 shells]# cat print_duplicates.sh 
#!/bin/bash
#Filename:remove-duplicate.sh
#Description:find and remove duplicate files and keep one sample of each file

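ls -lS --time-style=long-iso | awk 'BEGIN {
  getline;    # skip the "total ..." line printed by ls -l
  getline;    # read the first (largest) entry; it is stored below, not printed
  name1=$8;size=$5    # with long-iso timestamps, $8 is the file name and $5 the size
}
{ print $0    # print every remaining line
}
'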

Note: inside awk, use the print statement rather than the shell's echo command.

[root@server4 shells]# ./print_duplicates.sh 
-rwxr-xr-x. 1 root root    912 2018-11-29 22:07 date.sh
-rwx------. 1 root root    339 2018-07-17 13:54 addScalaHome.sh
-rwx--x--x. 1 root root    287 2018-12-13 19:22 print_duplicates.sh
drwxr-xr-x. 2 root root    148 2018-12-10 20:18 temp
-rwxr-xr-x. 1 root root    134 2018-12-08 18:48 test1.sh
-rwx------. 1 root root    115 2018-07-13 17:00 checkZookeeper.sh
-rwx------. 1 root root    105 2018-08-17 17:19 isSorted.sh
-rwx------. 1 root root    101 2018-08-21 09:58 interactive.sh
···
-rw-------. 1 root root      9 2018-08-21 10:01 input.data
-rw-r--r--. 1 root root      0 2018-08-31 22:05 1.txt
-rw-r--r--. 1 root root      0 2018-08-31 22:05 2.txt
-rw-r--r--. 1 root root      0 2018-08-31 22:05 3.txt
-rw-------. 1 root root      0 2018-08-31 22:02 a.txt
-rw-r--r--. 1 root root      0 1999-06-25 23:20 file
-rw-r--r--. 1 root root      0 2018-12-09 21:33 test4.sh

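As shown, every line is printed; this is how awk processes its input line by line. Here $0 stands for the whole current line. [Printing $ followed by any unset variable name, such as $x, also prints the whole line: an uninitialized variable evaluates to 0 in a numeric context, so $x is the same as $0.]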
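The behaviour of $ with an unset variable can be checked directly. In this sketch x is deliberately never assigned, so it evaluates to 0 and $x is $0, while $(x+2) is $2; the input string is made up.

```shell
out=$(echo "a b c" | awk '{print $x; print $(x+2)}')
echo "$out"
# prints:
# a b c
# b
```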

3. Interview question

1. Use awk to extract the IP address from the output of the `ip addr` command.

[root@littlelawson ~]# ip addr | awk 'NR==8{print $2}' | awk -F/ '{print $1}'
192.168.211.3

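Notes:

NR==8 selects line 8 of the output; which line holds the address depends on the machine's network interfaces
-F/ splits the whole line on /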
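Since the line number varies between machines, matching on the inet keyword is a less fragile alternative. The sketch below runs against a hypothetical sample of `ip addr` output stored in the variable sample (interface names and MAC address are made up); on a real system the sample would be replaced by piping `ip addr` itself.

```shell
# Hypothetical ip addr output; real output differs per machine
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    link/ether 00:0c:29:aa:bb:cc brd ff:ff:ff:ff:ff:ff
    inet 192.168.211.3/24 brd 192.168.211.255 scope global eth0'

# Keep IPv4 lines (first field is exactly "inet"), skip loopback,
# then strip the /prefix part with split()
ip=$(echo "$sample" | awk '$1 == "inet" && $2 !~ /^127\./ {split($2, a, "/"); print a[1]}')
echo "$ip"    # prints 192.168.211.3
```

This does the filtering and the splitting in a single awk invocation instead of two.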

4. References

http://www.runoob.com/linux/linux-comm-awk.html