Lab 01: Regular Expressions

Master pattern matching with grep -E

What Is a Regular Expression?

A Regular Expression (regex) is a sequence of characters that defines a search pattern. It's like a super-powered "find" function that can match complex patterns, not just exact text.

🔍 Simple Analogy - Normal search finds "cat", regex finds "any word starting with c, ending with t"

grep -E: Extended Regex

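grep -E uses extended regular expressions (ERE), so operators such as + ? | ( ) { } work exactly as written. Plain grep uses basic regular expressions (BRE), where you must write \( \) and \{ \} instead, and \+ \? \| are available only as GNU extensions.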

$ grep -E "pattern" file.txt
# Search using extended regex
$ grep -E "cat|dog" animals.txt
# Find lines with cat OR dog
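To see what "without needing to escape" means in practice, here is a small sketch that feeds three made-up lines to grep through printf instead of reading animals.txt. It assumes GNU grep, because the \| spelling in basic mode is a GNU extension.

```shell
# Made-up input: three lines, printed by printf instead of read from animals.txt
printf 'a cat\na dog\na bird\n' | grep -E 'cat|dog'
# -> a cat
#    a dog

# Basic syntax (no -E): GNU grep spells the same OR operator as \|
printf 'a cat\na dog\na bird\n' | grep 'cat\|dog'

# Basic syntax with a bare |: it is just a literal pipe character, so nothing
# matches and -c prints 0 (grep then exits with status 1, hence "|| true")
printf 'a cat\na dog\na bird\n' | grep -c 'cat|dog' || true
```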

1. Character Classes: [abc]

Square brackets match any one character from the set.

Pattern   Meaning                Matches             Doesn't match
[aeiou]   Any vowel              a, e, i, o, u       b, c, d
[a-z]     Any lowercase letter   a, b, c, ... z      A, B, 1, 2
[A-Z]     Any uppercase letter   A, B, C, ... Z      a, b, 1, 2
[0-9]     Any digit              0, 1, 2, ... 9      a, b, c

Example: Four Consecutive Vowels

$ grep -E "[aeiou]{4}" dictionary.txt
# Matches: aqueous, queueing, beauish, ...
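The same classes can be tried without a dictionary file. This sketch pipes a few sample words through grep -E; the word lists are made up for illustration.

```shell
# Four vowels in a row: aqueous ("ueou") and queueing ("ueuei") match,
# beauty has only three in a row ("eau") and cat has one, so both are skipped
printf 'aqueous\nqueueing\nbeauty\ncat\n' | grep -E '[aeiou]{4}'

# A range class: any line containing at least one digit -> room101, floor 3
printf 'room101\nlobby\nfloor 3\n' | grep -E '[0-9]'
```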

2. Negated Character Classes: [^abc]

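When ^ is the first character inside the brackets, the class matches any one character NOT in the set. (Outside brackets, ^ is the start-of-line anchor covered in section 4.)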

Pattern    Meaning                                    Matches
[^aeiou]   Any non-vowel (consonants, digits, etc.)   b, c, d, 1, 2, !
[^ ]       Any non-space character                    a, B, 1, !

Example: Vowels in Order, No Other Vowels

$ grep -E "^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$" dictionary.txt
# Matches: abstemious, facetious, arsenious, ...
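Here is the strict pattern run against four sample words, two of which it should reject, to show why every gap between the vowels is written as [^aeiou]* rather than .*. The word list is chosen for illustration only.

```shell
# The pattern from the example above
strict='^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'

# facetious and abstemious pass: each vowel appears exactly once, in order.
# adventitious fails: its extra "i" is neither a non-vowel nor the expected "o".
# education fails: its first vowel is "e", not "a".
printf 'facetious\nabstemious\nadventitious\neducation\n' | grep -E "$strict"
```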

3. Quantifiers: * + ? {n}

Quantifiers specify how many times the previous element can appear.

Symbol   Meaning                  Pattern      Matches
*        0 or more times          ab*c         ac, abc, abbc, abbbc
+        1 or more times          ab+c         abc, abbc, abbbc (not ac)
?        0 or 1 time (optional)   colou?r      color, colour
{n}      Exactly n times          [aeiou]{4}   eeee, auio, euou
{n,m}    Between n and m times    a{2,4}       aa, aaa, aaaa

⭐ Remember: * = "zero or more" (can be empty), + = "one or more" (must have at least one)

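A quick way to check the quantifier table is to feed each pattern its listed examples plus a near miss. This sketch uses grep -x, which requires the pattern to match the whole line, so a pattern cannot succeed on just part of a longer line; the inputs are made up.

```shell
# -x: the pattern must match the entire line (like wrapping it in ^...$)
printf 'ac\nabc\nabbc\nabbbc\n' | grep -xE 'ab*c'          # ac, abc, abbc, abbbc
printf 'ac\nabc\nabbc\nabbbc\n' | grep -xE 'ab+c'          # abc, abbc, abbbc
printf 'color\ncolour\ncolouur\n' | grep -xE 'colou?r'     # color, colour
printf 'a\naa\naaa\naaaa\naaaaa\n' | grep -xE 'a{2,4}'     # aa, aaa, aaaa
```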
4. Dot and Anchors: . ^ $

Symbol   Meaning                                Pattern   Matches
.        Any single character                   c.t       cat, cot, cut, c5t
.*       Any run of characters (even none)      a.*e      ae, apple, anyone
^        Start of line                          ^Hello    "Hello world"
$        End of line                            end$      "the end"

Example: Vowels in Order

$ grep -E "a.*e.*i.*o.*u" dictionary.txt
# Matches: abstemious, facetious, adventitious, ...
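A small sketch of the dot and the two anchors, using made-up input lines. The last command shows why the example above is looser than the [^aeiou] version in section 2: .* happily consumes extra vowels between the ones you asked for.

```shell
# . stands for exactly one character; "ct" has none between c and t
printf 'cat\ncot\nc5t\nct\n' | grep -E 'c.t'            # cat, cot, c5t

# ^ and $ pin the pattern to the start or the end of the line
printf 'Hello world\nSay Hello\n' | grep -E '^Hello'    # Hello world
printf 'the end\nend game\n' | grep -E 'end$'           # the end

# The loose vowel-order pattern accepts adventitious despite its extra vowels
printf 'adventitious\n' | grep -E 'a.*e.*i.*o.*u'       # adventitious
```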

💡 Tip: Use ^pattern$ to match the entire line, not just part of it!

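The tip is easy to check with three sample lines. grep -x (long form --line-regexp) is a standard grep option that gives the same whole-line behaviour without writing the anchors yourself.

```shell
# Without anchors, "cat" matches anywhere inside a line: cat, category, a cat
printf 'cat\ncategory\na cat\n' | grep -E 'cat'

# With ^...$ the entire line must be "cat": only cat
printf 'cat\ncategory\na cat\n' | grep -E '^cat$'

# -x gives the same whole-line match without writing the anchors
printf 'cat\ncategory\na cat\n' | grep -xE 'cat'
```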
5. Word Boundary: \b

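\b matches a position, not a character: the boundary between a word character (letter, digit, or underscore) and either a non-word character (such as a space or punctuation) or the start or end of the line. It is a GNU extension rather than part of POSIX ERE, so it works in GNU grep -E but is not guaranteed in other grep implementations.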

Pattern    Matches                                  Doesn't match
cat\b      "cat", "cat.", "a cat", "concat"         "category", "cats"
\bcat      "cat", "a cat", "category"               "concat", "bobcat"
\bcat\b    "cat", "a cat" (whole word only)         "category", "bobcat"
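The table can be reproduced with the sketch below, assuming GNU grep (\b is a GNU extension). sample is a hypothetical helper function that prints the test words, and grep -w is GNU grep's whole-word option, shown for comparison.

```shell
# Hypothetical helper: prints the test words, one per line
sample() { printf 'cat\na cat\ncat.\ncategory\nconcat\nbobcat\n'; }

sample | grep -E 'cat\b'     # cat, a cat, cat., concat, bobcat
sample | grep -E '\bcat'     # cat, a cat, cat., category
sample | grep -E '\bcat\b'   # cat, a cat, cat.

# grep -w requires the match to be a whole word, like wrapping it in \b...\b
sample | grep -w 'cat'       # cat, a cat, cat.
```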

6. Alternation (OR): |

Use | to match either the pattern before or after it.

$ grep -E "cat|dog" pets.txt
# Matches lines with "cat" OR "dog"
$ grep -E "Member for [A-Za-z]+[- ][A-Za-z]+" parliament.txt
# Matches two-word electorates joined by a space OR a hyphen
# ([- ] is a character class; for single characters it does the same job as ( |-))
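One detail worth knowing: | has the lowest precedence of all the operators, so an anchor binds only to the alternative next to it, not to the whole alternation. The sketch below uses made-up input lines and two hypothetical helper functions, sample and seats; it also shows that the class [- ] and the group ( |-) select the same lines.

```shell
# Hypothetical helper with made-up lines
sample() { printf 'cat\ndog\ncatalog\nhotdog\nmy cat\n'; }

# This means (^cat) OR (dog$), not ^(cat|dog)$ -> cat, dog, catalog, hotdog
sample | grep -E '^cat|dog$'

# Group the alternatives so both are anchored to the whole line -> cat, dog
sample | grep -E '^(cat|dog)$'

# Made-up electorate lines: a one-character class and an alternation agree
seats() { printf 'Member for Lake Town\nMember for Hill-Side\nMember for Riverton\n'; }
seats | grep -E 'Member for [A-Za-z]+[- ][A-Za-z]+'     # Lake Town, Hill-Side
seats | grep -E 'Member for [A-Za-z]+( |-)[A-Za-z]+'    # Lake Town, Hill-Side
```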

7. Grouping: ( )

Parentheses group patterns together, often used with ? for optional parts.

$ grep -E "ll( [A-Z]+)?: Member" parliament.txt
# Matches "Bell: Member" or "Steggall OAM: Member" (the space plus a capitalised suffix such as OAM is optional)
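The sketch below runs the same pattern over three invented lines (the names and seats are made up, not taken from parliament.txt) and then shows a quantifier applied to a whole group. members is a hypothetical helper that prints the sample lines.

```shell
# Hypothetical helper: invented lines in the "Name: Member for Seat" style
members() { printf 'Jane Hill: Member for Northtown\nJohn Tall OAM: Member for Eastvale\nSam Lee: Member for Westport\n'; }

# ( [A-Z]+)? makes the whole " OAM" chunk optional as one unit
members | grep -E 'll( [A-Z]+)?: Member'     # the Hill and Tall OAM lines

# A quantifier after a group repeats the whole group: ab, abab, ababab, ...
printf 'ab\nabab\naba\n' | grep -xE '(ab)+'   # ab, abab
```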

💡 Pattern: (pattern)? = "this part is optional"

Lab 01 Complete Summary

Symbol   Meaning             Example
abc      Literal match       lmn → calmness
[abc]    Any one of          [aeiou] → a vowel
[^abc]   NOT any of          [^aeiou] → a non-vowel
.        Any character       a.c → abc, a1c
*        0 or more           ab*c → ac, abc
+        1 or more           ab+c → abc, abbc
?        Optional (0 or 1)   colou?r → color, colour
{n}      Exactly n           [aeiou]{4}
\b       Word boundary       \bcat\b
^        Line start          ^Hello
$        Line end            end$
|        OR                  cat|dog
( )      Grouping            (ab)+

💡 Common Mistakes to Avoid

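❌ Wrong: *abc* (shell-glob style) - in a regex, * repeats the element before it, so a leading * has nothing to repeat! To find lines containing abc, just write abc.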

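❌ Wrong: Using .* to mean "up to the next delimiter" - .* is greedy and grabs the longest match it can, so it often matches too much! A negated class such as [^>]* stops at the first >.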

❌ Wrong: Forgetting ^ and $ when you need to match the entire line

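Two of these mistakes are easy to see on a single made-up line of HTML-like text. grep -o prints each match on its own line, which makes the difference between the greedy .* and a negated class visible.

```shell
# A single made-up line of HTML-like text
line='<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the LAST ">" on the line, so one long match comes back:
#   <b>bold</b> and <i>italic</i>
printf '%s\n' "$line" | grep -oE '<.*>'

# A negated class cannot cross a ">", so each tag is a separate match:
#   <b>  </b>  <i>  </i>   (printed one per line)
printf '%s\n' "$line" | grep -oE '<[^>]*>'

# No glob-style stars needed: a regex already matches anywhere in the line
printf 'xabcx\nxyz\n' | grep -E 'abc'    # xabcx
```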
⚠️ Remember: Case matters! Use [AEIOUaeiou] for both cases, or use grep -i

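A short sketch of the three options on made-up sample words: a plain pattern, grep -i, and a class that lists both cases. sample is a hypothetical helper that prints the words.

```shell
# Hypothetical helper: prints the sample words, one per line
sample() { printf 'Apple\napple\nAPPLE\nbanana\n'; }

sample | grep -E 'apple'            # apple only (case matters)
sample | grep -iE 'apple'           # Apple, apple, APPLE
sample | grep -E '^[AEIOUaeiou]'    # Apple, apple, APPLE (starts with a vowel in either case)
```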