Lab 01: Regular Expressions

What is a Regular Expression?

A Regular Expression (regex) is a sequence of characters that defines a search pattern. It's like a super-powered "find" function that can match complex patterns, not just exact text.

🔍 Simple Analogy - A normal search finds "cat"; a regex finds "any word starting with c and ending with t". Likewise, a normal search finds the name "Zhang San", while a regex finds "everyone with the surname Zhang".

grep -E: Extended Regex

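grep -E uses extended regular expressions. In this mode the operators +, ?, |, { } and ( ) have their special meaning as written; plain grep (basic regular expressions) treats them as literal characters unless they are backslash-escaped.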

$ grep -E "pattern" file.txt
# Search using extended regex
$ grep -E "cat|dog" animals.txt
# Find lines with cat OR dog
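
To see what the -E flag changes, the sketch below runs the same pattern with and without it. The sample lines and variable names are made up for illustration; only grep and grep -E come from the lab.

```shell
# Made-up sample input; one line contains a literal "|" character.
sample='cat
dog
cat|dog
bird'

# Without -E (basic regex), "|" is an ordinary character,
# so only the line containing the literal text cat|dog is selected.
basic=$(printf '%s\n' "$sample" | grep "cat|dog")

# With -E (extended regex), "|" means OR,
# so every line containing cat or dog is selected.
extended=$(printf '%s\n' "$sample" | grep -E "cat|dog")

printf 'basic regex:\n%s\n\n' "$basic"
printf 'extended regex:\n%s\n' "$extended"
```

The basic run selects one line and the extended run selects three, which is why the rest of this lab uses grep -E throughout.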

1. Character Classes: [abc]

Square brackets match any one character from the set.

Pattern	Meaning	Matches	Doesn't Match
`[aeiou]`	Any vowel	a, e, i, o, u	b, c, d
`[a-z]`	Any lowercase letter	a, b, c, ... z	A, B, 1, 2
`[A-Z]`	Any uppercase letter	A, B, C, ... Z	a, b, 1, 2
`[0-9]`	Any digit	0, 1, 2, ... 9	a, b, c

Example: Four Consecutive Vowels

$ grep -E "[aeiou]{4}" dictionary.txt
# Matches: aqueous, queueing, beauish, ...
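
If dictionary.txt is not at hand, the same pattern can be tried on a short word list. This is a minimal sketch: the word list and variable names are made up, and the -o flag (print only the matched text) is a standard grep option that the lab itself does not use.

```shell
# Made-up word list standing in for dictionary.txt.
words='aqueous
queueing
banana
rhythm'

# Select the lines that contain four vowels in a row.
matches=$(printf '%s\n' "$words" | grep -E "[aeiou]{4}")

# -o prints only the part of each line that matched the pattern.
parts=$(printf '%s\n' "$words" | grep -oE "[aeiou]{4}")

printf 'matching lines:\n%s\n\n' "$matches"   # aqueous, queueing
printf 'matched text:\n%s\n' "$parts"         # ueou, ueue
```

Note that grep selects a line as soon as four consecutive vowels appear anywhere in it, so queueing (five vowels in a row) is also selected.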

2. Negated Character Classes: [^abc]

With ^ as the first character, it matches any character NOT in the set.

Pattern	Meaning	Matches
`[^aeiou]`	Any non-vowel (consonants, digits, etc.)	b, c, d, 1, 2, !
`[^ ]`	Any non-space character	a, B, 1, !

Example: Vowels in Order, No Other Vowels

$ grep -E "^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$" dictionary.txt
# Matches: abstemious, facetious, arsenious, ...
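
The effect of the negated classes is easier to see on a word that has the five vowels in order but also repeats one of them. The following is a small sketch with a made-up word list; the variable names are only for illustration.

```shell
# Made-up word list standing in for dictionary.txt.
words='abstemious
facetious
adventitious
education'

# Between the required vowels only non-vowels are allowed,
# so a word with any extra vowel is rejected.
strict=$(printf '%s\n' "$words" |
  grep -E '^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$')

printf '%s\n' "$strict"   # abstemious, facetious
```

adventitious is rejected because of its second i, and education because its vowels are not in a, e, i, o, u order.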

3. Quantifiers: * + ? {n}

Quantifiers specify how many times the previous element can appear.

Symbol	Meaning	Pattern	Matches
`*`	0 or more times	`ab*c`	ac, abc, abbc, abbbc
`+`	1 or more times	`ab+c`	abc, abbc, abbbc (not ac)
`?`	0 or 1 time (optional)	`colou?r`	color, colour
`{n}`	Exactly n times	`[aeiou]{4}`	eeee, auio, euou
`{n,m}`	Between n and m times	`a{2,4}`	aa, aaa, aaaa

⭐ Remember: * = "zero or more" (can be empty), + = "one or more" (must have at least one)

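The difference between * and + can be checked directly. This is a minimal sketch using made-up input lines and variable names; the patterns are the ones from the table above.

```shell
# Made-up input lines.
words='ac
abc
abbc
adc'

# "*" allows zero b's, so ac is selected too.
star=$(printf '%s\n' "$words" | grep -E "ab*c")

# "+" needs at least one b, so ac is left out.
plus=$(printf '%s\n' "$words" | grep -E "ab+c")

# "?" makes the u optional, but does not allow two of them.
spellings=$(printf 'color\ncolour\ncolouur\n' | grep -E "colou?r")

printf 'ab*c:\n%s\n\n' "$star"          # ac, abc, abbc
printf 'ab+c:\n%s\n\n' "$plus"          # abc, abbc
printf 'colou?r:\n%s\n' "$spellings"    # color, colour
```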

4. Dot and Anchors: . ^ $

Symbol	Meaning	Pattern	Matches
`.`	Any single character	`c.t`	cat, cot, cut, c5t
`.*`	Any sequence of characters, including none	`a.*e`	ae, apple, anyone
`^`	Start of line	`^Hello`	"Hello world"
`$`	End of line	`end$`	"the end"

Example: Vowels in Order

$ grep -E "a.*e.*i.*o.*u" dictionary.txt
# Matches: abstemious, facetious, adventitious, ...

💡 Tip: Use ^pattern$ to match the entire line, not just part of it!

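The tip about ^pattern$ can be tried on a few lines. This is a small sketch with made-up input and variable names; -c is the standard grep option that prints the number of selected lines instead of the lines themselves.

```shell
# Made-up input lines.
lines='cat
cot
c5t
concat
scatter'

# Unanchored: c.t may appear anywhere in the line.
anywhere=$(printf '%s\n' "$lines" | grep -cE 'c.t')

# Anchored: the whole line must be exactly c, any one character, t.
# Single quotes keep the shell from interpreting the "$".
whole_line=$(printf '%s\n' "$lines" | grep -E '^c.t$')

echo "lines containing c.t: $anywhere"        # 5
printf 'whole-line matches:\n%s\n' "$whole_line"   # cat, cot, c5t
```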

5. Word Boundary: \b

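\b matches the position between a word character (a letter, digit, or underscore) and a non-word character such as a space or punctuation mark, or the start or end of the line. It is a GNU grep extension rather than part of POSIX extended regex.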

Pattern	Matches	Doesn't Match
`cat\b`	"cat", "cat.", "a cat", "concat"	"category", "cats"
`\bcat`	"cat", "a cat"	"concat", "bobcat"
`\bcat\b`	"cat" (whole word only)	"category", "bobcat"
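
The three placements of \b can be compared side by side. This is a minimal sketch that assumes GNU grep (where \b is supported); the input lines and variable names are made up.

```shell
# Made-up input lines.
lines='cat
a cat.
category
concat
bobcat'

# Boundary after "cat": the word must END with cat.
ends=$(printf '%s\n' "$lines" | grep -E 'cat\b')

# Boundary before "cat": the word must START with cat.
starts=$(printf '%s\n' "$lines" | grep -E '\bcat')

# Boundaries on both sides: cat must be a whole word.
whole_word=$(printf '%s\n' "$lines" | grep -E '\bcat\b')

printf 'cat\\b:\n%s\n\n' "$ends"            # cat, a cat., concat, bobcat
printf '\\bcat:\n%s\n\n' "$starts"          # cat, a cat., category
printf '\\bcat\\b:\n%s\n' "$whole_word"     # cat, a cat.
```

A boundary on one side only still lets longer words through on the other side, so whole-word searches need \b on both sides.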

6. Alternation (OR): |

Use | to match either the pattern before or after it.

$ grep -E "cat|dog" pets.txt
# Matches lines with "cat" OR "dog"
$ grep -E "Member for [A-Za-z]+[- ][A-Za-z]+" parliament.txt
# Matches two-word electorates joined by a space or a hyphen
# (the "or" here comes from the character class [- ], not from |)
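
For a choice between single characters, a character class and an alternation select the same lines. The sketch below compares the two forms; the input lines are illustrative and are not taken from parliament.txt, and the variable names are made up.

```shell
# Illustrative input lines (not the real contents of parliament.txt).
lines='Member for North Sydney
Member for Eden-Monaro
Member for Warringah'

# Character class: one character that is either a hyphen or a space.
with_class=$(printf '%s\n' "$lines" |
  grep -E "Member for [A-Za-z]+[- ][A-Za-z]+")

# Alternation: the same choice written with | inside parentheses.
with_pipe=$(printf '%s\n' "$lines" |
  grep -E "Member for [A-Za-z]+( |-)[A-Za-z]+")

printf 'character class:\n%s\n\n' "$with_class"
printf 'alternation:\n%s\n' "$with_pipe"
```

Both commands select the North Sydney and Eden-Monaro lines. Alternation becomes necessary once the choices are longer than one character, as in cat|dog.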

7. Grouping: ( )

Parentheses group patterns together, often used with ? for optional parts.

$ grep -E "ll( [A-Z]+)?: Member" parliament.txt
# Matches "Bell:" or "Steggall OAM:" (OAM is optional)

💡 Pattern: (pattern)? = "this part is optional"

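Parentheses also change what a quantifier applies to. The following sketch shows both uses; the input lines are made up (only the names Bell and Steggall OAM come from the example above), as are the variable names.

```shell
# Made-up input lines; the electorate names are placeholders.
lines='Bell: Member for Somewhere
Steggall OAM: Member for Elsewhere
Smith: Member for Nowhere'

# The group " [A-Z]+" (a space plus capitals) is optional as a whole.
optional=$(printf '%s\n' "$lines" | grep -E 'll( [A-Z]+)?: Member')

# Without a group, + repeats only the b; with a group, it repeats "ab".
repeat_b=$(printf 'abab\nabbb\n' | grep -E '^ab+$')
repeat_ab=$(printf 'abab\nabbb\n' | grep -E '^(ab)+$')

printf 'optional group:\n%s\n\n' "$optional"   # the Bell and Steggall lines
echo "ab+   selects: $repeat_b"                # abbb
echo "(ab)+ selects: $repeat_ab"               # abab
```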

Lab 01 Complete Summary

Symbol	Meaning	Example
`abc`	Literal match	`lmn` → calmness
`[abc]`	Any one of	`[aeiou]` → vowel
`[^abc]`	Any one character NOT in the set	`[^aeiou]` → any non-vowel (b, 7, !)
`.`	Any character	`a.c` → abc, a1c
`*`	0 or more	`ab*c` → ac, abc
`+`	1 or more	`ab+c` → abc, abbc
`?`	Optional (0 or 1)	`colou?r` → color, colour
`{n}`	Exactly n	`[aeiou]{4}`
`\b`	Word boundary	`\bcat\b`
`^`	Line start	`^Hello`
`$`	Line end	`end$`
`|`	OR	`cat|dog`
`( )`	Grouping	`(ab)+`

💡 Common Mistakes to Avoid

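❌ Wrong: *abc* - that is shell wildcard syntax, not regex. In a regex, * repeats the element before it, and at the start there is nothing to repeat. Use abc to find abc anywhere in a line.

❌ Wrong: Using .* when you mean something narrower. .* is greedy and matches any characters at all, so a.*e also accepts other vowels in between. Use a restricted class such as [^aeiou]* instead.

❌ Wrong: Forgetting ^ and $ when you need to match the entire line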

⚠️ Remember: Case matters! Use [AEIOUaeiou] for both cases, or use grep -i

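The two ways of handling case give different results when more than one letter varies. This is a small sketch with made-up input lines and variable names; -c prints the number of selected lines.

```shell
# Made-up input lines.
lines='Apple
apple
APPLE'

# Case-sensitive by default: only the all-lowercase line is selected.
exact=$(printf '%s\n' "$lines" | grep -c "apple")

# A class covers both cases for ONE position only.
first_letter=$(printf '%s\n' "$lines" | grep -cE "[Aa]pple")

# -i ignores case for the whole pattern.
ignore_case=$(printf '%s\n' "$lines" | grep -ci "apple")

echo "apple:      $exact"          # 1
echo "[Aa]pple:   $first_letter"   # 2
echo "-i apple:   $ignore_case"    # 3
```

A class such as [Aa] suits a single position, while grep -i is simpler when the whole word may appear in any case.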
