Regular Expressions: grep, egrep, sed, awk

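A regular expression is a string that describes a pattern. A handful of special characters do most of the work (. * + ? |). Regular expressions are used not only with command-line tools such as grep, sed, and awk, but also in nginx and apache configuration and in programming languages such as PHP and Python.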

grep

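Syntax: grep [-cinvABC] 'word' filename

-c: print the number of matching lines instead of the lines themselves

-n: print each matching line together with its line number

-v: print the lines that do not match

-A: followed by a number (with or without a space); for example, -A2 prints each matching line and the two lines after it

-B: followed by a number; for example, -B2 prints each matching line and the two lines before it

-C: followed by a number; for example, -C2 prints each matching line and the two lines before and after it

-r: search all files under a directory recursively

--color: highlight the matched text (red by default)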
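
The options above can be tried on a throwaway file. The following is a minimal sketch; the file contents and the variable names are made up for illustration and are not part of the original examples.

```shell
# Build a small sample file (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' '# comment' 'daemon:x:2:2' \
    'operator:x:11:0 root' 'nobody:x:99:99' > "$sample"

count=$(grep -c 'root' "$sample")       # number of lines containing root
numbered=$(grep -n 'root' "$sample")    # matching lines prefixed with line numbers
inverted=$(grep -vc 'root' "$sample")   # number of lines NOT containing root
after=$(grep -A1 'daemon' "$sample")    # the daemon line plus the line after it

echo "count=$count"
echo "$numbered"
echo "inverted=$inverted"
echo "$after"

rm -f "$sample"
```

Options can be combined, as in -vc above, which counts the non-matching lines.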

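Examples:

Print the lines that contain a keyword, with line numbers

[root@localhost~]# grep -n 'root' 1.txt

Print the lines that do not contain a keyword, with line numbers

[root@localhost~]# grep -n -v 'root' 1.txt

Print all lines that contain a digit

[root@localhost~]# grep '[0-9]' 1.txt

Print all lines that contain no digits

[root@localhost~]# grep -v '[0-9]' 1.txt

Remove all lines that start with '#'

[root@localhost~]# grep -v '^#' 1.txt

Remove all empty lines and all lines that start with #

[root@localhost~]# grep -v '^$' 1.txt|grep -v '^#'

Print the lines that start with an English letter

[root@localhost~]# grep '^[a-zA-Z]' 1.txt

Print the lines that start with a non-digit character

[root@localhost~]# grep '^[^0-9]' 1.txt

Note: a ^ placed first inside [ ] negates the character class.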

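Match any single character, or any number of characters

[root@localhost~]# grep 'r.o' 1.txt ; grep 'r*t' 1.txt; grep 'r.*t' 1.txt

Print the lines containing root and the line after each

[root@localhost~]# grep -A1 'root' 1.txt

Print the lines containing root and the line before each

[root@localhost~]# grep -B1 'root' 1.txt

. matches any single character;

* matches zero or more occurrences of the preceding character;

.* matches zero or more arbitrary characters, so it also matches an empty line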
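
The difference between the three patterns is easiest to see by counting matches on a few short lines. This is a small sketch; the sample lines and variable names are invented for the illustration.

```shell
# Five lines: root, rat, tt, r-t, and one empty line (hypothetical data).
sample=$(mktemp)
printf '%s\n' 'root' 'rat' 'tt' 'r-t' '' > "$sample"

dot=$(grep -c 'r.o' "$sample")       # r, exactly one character, then o
star=$(grep -c 'r*t' "$sample")      # zero or more r, then t: any line with a t
dotstar=$(grep -c 'r.*t' "$sample")  # r, then anything, then t
any=$(grep -c '.*' "$sample")        # matches every line, the empty one included

echo "r.o=$dot r*t=$star r.*t=$dotstar .*=$any"

rm -f "$sample"
```

Because * allows zero repetitions, 'r*t' also matches tt, which contains no r at all.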

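Specify how many times a character repeats (here, two consecutive o characters)

[root@localhost~]# grep 'o\{2\}' 1.txt

Search all *.php files under a directory for lines containing eval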

[root@localhost ~]# grep -r --include="*.php" 'eval' /data

egrep

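egrep is an extended version of grep: it does everything grep does and uses extended regular expressions. grep -E can be used in place of egrep (recent GNU grep releases print a warning that egrep is obsolescent and recommend grep -E). Some additional usages follow.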

[root@localhost~]# alias egrep='egrep --color'

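Match one or more occurrences of the character before +

[root@localhost~]# egrep 'o+' 1.txt

Match zero or one occurrence of the character before ? (zero occurrences also count, so 'o?' selects every line and only the highlighting differs)

[root@localhost~]# egrep 'o?' 1.txt

Match roo or body

[root@localhost~]# egrep 'roo|body' 1.txt

Parentheses group characters into a unit; the following example matches roo or ato

[root@localhost~]# egrep 'r(oo)|(at)o' 1.txt

Match one or more occurrences of 'oo'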

[root@localhost~]# egrep '(oo)+' 1.txt

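. matches any single character (including special characters)

* matches zero or more occurrences of the preceding character

.* matches any number of arbitrary characters (including an empty line)

+ matches one or more occurrences of the preceding character

? matches zero or one occurrence of the preceding character

Of these, + and ? are not special in plain grep; they work directly in egrep. In grep they must be escaped as \+ and \? to take on their special meaning.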
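
The following sketch contrasts the two modes. It assumes GNU grep, where \+ is accepted in basic mode as an extension; the sample lines and variable names are made up.

```shell
# Three lines: root, the literal text "o+", and rt (hypothetical data).
sample=$(mktemp)
printf '%s\n' 'root' 'o+' 'rt' > "$sample"

bre_plain=$(grep -c 'o+' "$sample")     # basic mode: + is literal, only "o+" matches
bre_escaped=$(grep -c 'o\+' "$sample")  # GNU extension: \+ means one or more
ere=$(grep -Ec 'o+' "$sample")          # extended mode: + means one or more
optional=$(grep -Ec 'o?' "$sample")     # o? can match nothing, so every line counts

echo "plain=$bre_plain escaped=$bre_escaped ere=$ere optional=$optional"

rm -f "$sample"
```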

sed

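sed can do most of what grep does and can also search and replace. A brief overview of sed usage follows.

Print specific lines

[root@localhost~]# sed '10'p -n 1.txt   # print line 10

[root@localhost~]# sed '1,4'p -n 1.txt  # print lines 1 to 4

[root@localhost~]# sed '5,$'p -n 1.txt  # print line 5 to the last line

Note: p stands for print. With -n, only the lines selected by the rule are printed. Without -n, sed prints every line of 1.txt and prints the selected lines a second time.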
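
The effect of -n can be checked on a three-line file. This is a minimal sketch with invented sample data and variable names.

```shell
sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

with_n=$(sed -n '2p' "$sample")    # only line 2
without_n=$(sed '2p' "$sample")    # all lines, with line 2 appearing twice
range=$(sed -n '2,$p' "$sample")   # line 2 through the last line

echo "$with_n"
echo "---"
echo "$without_n"
echo "---"
echo "$range"

rm -f "$sample"
```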

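Print lines containing a string

[root@localhost~]# sed -n '/root/'p 1.txt

Special characters such as ^ . * $ can be used

[root@localhost~]# sed -n '/ro.t/'p 1.txt

[root@localhost~]# sed -n '/^roo/'p 1.txt

Like grep, sed does not treat + | { } ( ) as special by default; escape them with \ or use the -r option

[root@localhost~]# sed -n -r '/ro+/'p 1.txt

[root@localhost~]# sed -n '/ro\+/'p 1.txt

The two commands above have the same effect

-e allows several expressions to be applied in one invocation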

[root@localhost~]# sed -e '/root/p' -e '/body/p' -n 1.txt

[root@localhost~]# sed '/root/p; /body/p' -n 1.txt
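
One detail worth seeing in practice: each expression is applied to every line in turn, so a line that matches both patterns is printed twice. A small sketch with made-up input:

```shell
# Line 1 matches both patterns, line 2 only /root/, line 3 neither.
out=$(printf '%s\n' 'root body' 'root' 'other' \
    | sed -n -e '/root/p' -e '/body/p')

echo "$out"
```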

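Delete specific lines

[root@localhost~]# sed '/root/d' 1.txt; sed  '1d' 1.txt; sed '1,10d' 1.txt

Note: '/root/d' deletes the lines containing root; '1d' or '1'd deletes the first line; '1,10'd deletes lines 1 through 10

Substitution

[root@localhost~]# sed '1,2s/ot/to/g' 1.txt

Note: s means substitute, and g makes the substitution global within each line; without g only the first match on each line is replaced. The / delimiter can be replaced with another character such as # or @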

[root@localhost~]# sed '1,2s@ot@to@g' 1.txt
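
The sketch below shows the effect of the g flag and why an alternative delimiter is convenient when the pattern itself contains slashes. The input line and variable names are made up for the illustration.

```shell
line='root:x:0:0:root:/root:/bin/bash'

first=$(printf '%s\n' "$line" | sed 's/root/ROOT/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/root/ROOT/g')      # every match on the line
# With @ as the delimiter the slashes in the path need no escaping.
shell=$(printf '%s\n' "$line" | sed 's@/bin/bash@/sbin/nologin@')

echo "$first"
echo "$all"
echo "$shell"
```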

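Delete all digits

[root@localhost~]# sed 's/[0-9]//g' 1.txt

Note: this simply replaces every digit with an empty string

Delete all non-digit characters

[root@localhost~]# sed 's/[^0-9]//g' 1.txt

Add a string at the beginning or end of each line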

[root@localhost~]# cat test.file

abc

ab cd

[root@localhost~]# sed 's/^/HEAD&/g' test.file

HEADabc

HEADab cd

[root@localhost~]# sed 's/$/&TAIL/g' test.file

abcTAIL

ab cdTAIL

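Insert content above or below a given line number

sed -i '2iabc' test.txt

sed -i '2aabc' test.txt

Note: 2i inserts abc as a new line before line 2, and 2a appends it as a new line after line 2.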
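
To preview the result without touching the file, the same commands can be run without -i. This sketch assumes GNU sed, which accepts the text on the same line as the i or a command; the sample data and variable names are invented.

```shell
sample=$(mktemp)
printf '%s\n' 'line1' 'line2' 'line3' > "$sample"

inserted=$(sed '2i abc' "$sample")   # abc becomes a new line before line 2
appended=$(sed '2a abc' "$sample")   # abc becomes a new line after line 2

echo "$inserted"
echo "---"
echo "$appended"

rm -f "$sample"
```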

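Swap the positions of two strings

[root@localhost~]# head -n2 1.txt |sed -r 's/(root)(.*)(bash)/\3\2\1/'

Note: in sed, ( ) groups part of the pattern. This example swaps root and bash; \1, \2, and \3 refer to the text matched by the first, second, and third pair of parentheses.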
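
Applied to a single passwd-style line (made up here for illustration), the swap looks like this. The second command is the same substitution written without -r, where the parentheses have to be escaped.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Extended syntax: group 1 is the first root, group 3 is bash,
# and group 2 is everything in between.
swapped=$(printf '%s\n' "$line" | sed -r 's/(root)(.*)(bash)/\3\2\1/')

# Basic syntax: same groups, written with \( and \).
swapped_bre=$(printf '%s\n' "$line" | sed 's/\(root\)\(.*\)\(bash\)/\3\2\1/')

echo "$swapped"
echo "$swapped_bre"
```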

The -i option modifies the file in place

[root@localhost~]# sed -i 's/ot/to/g' 1.txt

awk

General usage

Extract a field from each line

[root@localhost~]# awk -F ':' '{print $1}' 1.txt

Note: -F sets the field separator, here :

A custom string can also be used to join the fields

[root@localhost~]# awk -F ':' '{print $1"#"$2"#"$3"#"$4}' 1.txt

Or use the awk built-in variable OFS, as follows:

[root@localhost~]# awk -F ':' '{OFS="#"} {print $1,$2,$3,$4}' 1.txt

Match a character or string

[root@localhost~]# awk '/oo/' 1.txt

Match against a specific field

[root@localhost~]# awk -F ':' '$1~/oo/' 1.txt

Multiple patterns

[root@localhost~]# awk -F ':' '/root/ {print $1,$3}; $1~/test/; $3~/20/' 1.txt

Comparison operators ==, >, <, !=, >=, <=

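The third field is 0

[root@localhost~]# awk -F ':' '$3=="0"' 1.txt

The third field is greater than or equal to 500

[root@localhost~]# awk -F ':' '$3>=500' 1.txt

Note: when comparing numbers, do not quote the value. $3>="500" is evaluated as a string comparison, character by character, which is not what is wanted here. The == case above is an exception: a field containing exactly 0 matches with or without the quotes.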
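
The difference between the numeric and the string comparison shows up clearly on a few invented values. In this sketch the sample file and variable names are hypothetical.

```shell
# Third field values: 7, 60, 500, 1000 (hypothetical data).
sample=$(mktemp)
printf '%s\n' 'a:x:7' 'b:x:60' 'c:x:500' 'd:x:1000' > "$sample"

# Numeric comparison: only 500 and 1000 qualify.
numeric=$(awk -F ':' '$3>=500 {print $1}' "$sample")

# String comparison: "7" and "60" sort after "500", while "1000" sorts before it.
string=$(awk -F ':' '$3>="500" {print $1}' "$sample")

echo "numeric: $numeric"
echo "string:  $string"

rm -f "$sample"
```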

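The seventh field is not '/sbin/nologin'

[root@localhost~]# awk -F ':' '$7!="/sbin/nologin"' 1.txt

The third field is less than the fourth field

[root@localhost~]# awk -F ':' '$3<$4' 1.txt

The third field is greater than 5 and less than 7

[root@localhost~]# awk -F ':' '$3>5 && $3<7' 1.txt

The third field is greater than 5, or the seventh field is '/bin/bash'

[root@localhost~]# awk -F ':' '$3>5 || $7=="/bin/bash"' 1.txt

awk built-in variables NF (number of fields) and NR (record number, i.e. line number)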

[root@localhost~]# head -n3 1.txt | awk -F ':' '{print NF}'

[root@localhost~]# head -n3 1.txt | awk -F ':' '{print $NF}'

[root@localhost~]# head -n3 1.txt | awk -F ':' '{print NR}'
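
The three variables can be printed side by side to see how they relate. A short sketch on two made-up lines with different field counts:

```shell
# NR is the current line number, NF the number of fields on that line,
# and $NF the content of the last field.
out=$(printf '%s\n' 'a:b:c' 'd:e' | awk -F ':' '{print NR, NF, $NF}')

echo "$out"
```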

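Print the lines after line 20

[root@localhost~]# awk 'NR>20' 1.txt

Print the lines after line 20 whose first field contains 'ssh'

[root@localhost~]# awk -F ':' 'NR>20 && $1 ~ /ssh/' 1.txt

Change the value of a field

[root@localhost~]# awk -F ':' '$1="root"' 1.txt

Arithmetic: add the third and fourth fields and assign the result to the seventh field

[root@localhost~]# awk -F ':' '{$7=$3+$4; print $0}' 1.txt

Assigning to a field makes awk rebuild the line, so print $0 no longer shows the original separator; the fields are joined with OFS, which defaults to a space. To keep the separator, set OFS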

[root@localhost~]# awk -F ':' '{OFS=":"} {$7=$3+$4; print $0}' 1.txt
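
The effect of OFS on a rebuilt line can be seen on a single made-up record. This sketch sets OFS in a BEGIN block, which is a common alternative to the form used above; the input line and variable names are invented.

```shell
line='user:x:1000:1000:desc:/home/user:/bin/bash'

# Default OFS: the rebuilt line is joined with single spaces.
default_ofs=$(printf '%s\n' "$line" | awk -F ':' '{$7=$3+$4; print $0}')

# OFS set to ":" before any input is read: the colons are kept.
colon_ofs=$(printf '%s\n' "$line" \
    | awk -F ':' 'BEGIN {OFS=":"} {$7=$3+$4; print $0}')

echo "$default_ofs"
echo "$colon_ofs"
```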

Compute the sum of the third field

[root@localhost~]# awk -F ':' '{(tot=tot+$3)}; END {print tot}' 1.txt
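
A quick way to check the summing logic is to feed it values whose total is known in advance. The sketch below uses three invented lines and the shorter += form, which is equivalent to tot=tot+$3.

```shell
# Third fields are 1, 2 and 39; the END block runs once after the last line.
total=$(printf '%s\n' 'a:x:1' 'b:x:2' 'c:x:39' \
    | awk -F ':' '{tot += $3} END {print tot}')

echo "$total"
```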

The if keyword can also be used in awk

[root@localhost~]# awk -F ':' '{if ($1=="root") print $0}' 1.txt
