COMP2041 Final Review — 24T1 Practice (Q1

Exam Format & How To Use This Page

📋 The Exam

12 questions, 100 marks, 3 hours working + 10 min reading. Questions divide into practical (Q1–Q5) and theory/longer scripts (Q6–Q12). This review covers Q1–Q4 of the 24T1 practice exam: the foundations every later question builds on.

💡 Strategy

Each section is a knowledge dump + worked example + your real bugs. Don't just read: close the page and try writing each pattern from memory. Then take the quiz at the bottom (30 questions, click-to-reveal explanations).

Q1 Regex with grep -E

The data: pipe-separated awards

Each line of awards.psv has 6 fields separated by |:

Award Name | Year | Winner Name | Gender | Country | Birth Year

ACM Turing Award|2000|Andrew Chi-Chih Yao|Male|China|1946
Nobel Prize for medicine|1963|Andrew F. Huxley|Male|United Kingdom|1917
Fields Medal|1982|William Thurston|Male|United States|1946

Why `-E`: bare | vs \|

In extended regex (grep -E), bare | means OR (alternation). To match a literal pipe, you must escape it: \|. This is the single most common bug in Q1. (Plain GNU grep without -E uses basic regex, where the roles are swapped: bare | is literal and \| is alternation.)

grep -E 'A|B'    →  match lines containing 'A' OR 'B'
grep -E 'A\|B'   →  match lines containing the literal text 'A|B'
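
The same distinction can be tried out in Python. This is only a sketch, assuming that Python's `re` module treats bare | and \| the same way grep -E does for these simple patterns (the two dialects differ in other areas). The sample line comes from the data above.

```python
import re

line = "Fields Medal|1982|William Thurston|Male|United States|1946"

# Bare | is alternation: "Turing" OR "Thurston" anywhere in the line.
either = re.search(r"Turing|Thurston", line) is not None

# Escaped \| is a literal pipe: the exact text "Medal|1982" must appear.
literal = re.search(r"Medal\|1982", line) is not None

# Forgetting the backslash asks for "Medal" OR "1983" instead, so this
# matches even though the text "Medal|1983" is nowhere in the line.
accidental = re.search(r"Medal|1983", line) is not None

# With the backslash restored, the wrong year is correctly rejected.
strict = re.search(r"Medal\|1983", line) is not None

print(either, literal, accidental, strict)
```

The third check is the Q1 bug in miniature: the pattern still "works", but it is answering a different question.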

Anchors: ^, $, and field boundaries

^ = start of line · $ = end of line. To pin down a middle field, surround it with literal pipes: \|Australia\|.

Q1.1 worked example: Australian winners

grep -E '\|Australia\|' awards.psv

Pipes on both sides ⇒ "Australia" must be an entire field (in this data, field 5, the country), not a substring inside a name or award title.

AND across two fields: use `.*`

"Fields Medal winners from France" needs two conditions on the same line: field 1 = Fields Medal AND field 5 = France. Anchor each, join with .*:

grep -E '^Fields Medal\|.*\|France\|' awards.psv
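
To see what the anchoring buys you, here is a small Python sketch that compares the anchored pattern with a looser one. The four rows are invented for illustration (they are not from awards.psv), and `re` is assumed to behave like grep -E for these patterns.

```python
import re

# Hypothetical rows in the awards.psv layout, made up for this example.
rows = [
    "Fields Medal|2001|Example Person|Female|France|1970",
    "Fields Medal|2002|Other Person|Male|Japan|1965",
    "Nobel Prize for medicine|2003|Third Person|Male|France|1950",
    "Fields Medal Tribute|2004|Jean France|Male|Italy|1960",
]

# Anchored: field 1 is exactly "Fields Medal" AND some later whole field is "France".
anchored = re.compile(r"^Fields Medal\|.*\|France\|")
matches = [r for r in rows if anchored.search(r)]

# Loose: both words appear somewhere, in that order, with no field boundaries.
loose_pattern = re.compile(r"Fields Medal.*France")
loose = [r for r in rows if loose_pattern.search(r)]

print(len(matches), len(loose))
```

The loose pattern also accepts the last row, where "France" is part of a name and the award is not the Fields Medal. The pipes in the anchored pattern rule that row out.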

⚠️ Bug you hit

Quantifiers and characters

Symbol	Meaning	Example
`.`	any single character	`a.c` matches `abc`, `a c`
`*`	0 or more of previous	`ab*c` matches `ac`, `abbbc`
`+`	1 or more of previous	`ab+c` matches `abc`, not `ac`
`?`	0 or 1 of previous	`colou?r` matches both spellings
`{n}`	exactly n times	`[0-9]{4}` = a 4-digit number
`[A-Z]`	character class (set)	any uppercase letter
`.*`	0+ of anything (workhorse)	"skip past whatever"
`\.`	literal dot	matches the period in "F."

⚠️ Bugs you hit (character classes)

[194] ≠ 194. Brackets mean "one of these characters": [194] matches a single 1, 9, OR 4. To match the literal sequence "194" (e.g. for the 1940s), write 194 with no brackets.
[a+z] ≠ [a-z]. Inside brackets, + is a literal plus, not a quantifier. Use a hyphen for ranges.
[A-Z*] ≠ [A-Z]*. Inside brackets, * is literal. Outside, it's "0 or more of the previous thing."
. alone matches exactly one character. To allow many, write .* or .+.
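
Each of these four bugs can be checked directly. The sketch below uses Python's `re.fullmatch`, assuming bracket expressions and quantifiers behave the same as in grep -E for these cases; the helper name `full` is invented for the example.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "set matches one char":        full(r"[194]", "9"),
    "set does not match sequence": full(r"[194]", "194"),
    "plain 194 then a digit":      full(r"194[0-9]", "1946"),
    "[a+z] accepts a plus":        full(r"[a+z]", "+"),
    "[a+z] rejects m":             full(r"[a+z]", "m"),
    "[a-z] accepts m":             full(r"[a-z]", "m"),
    "[A-Z*] accepts a star":       full(r"[A-Z*]", "*"),
    "[A-Z*] is still one char":    full(r"[A-Z*]", "ABC"),
    "[A-Z]* accepts many":         full(r"[A-Z]*", "ABC"),
    "a.c needs exactly one":       full(r"a.c", "abbc"),
    "a.*c allows many":            full(r"a.*c", "abbc"),
}

for name, result in checks.items():
    print(f"{name}: {result}")
```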

Capture groups & backreferences

Wrap a part with (...) to capture it. Refer back with \1, \2, etc. Groups are numbered by their opening parenthesis, counting left to right.

✅ Q1.4: first name + middle initial + last name all start with the same letter

\|([A-Z])[a-z]+ \1\. \1[a-z]+\|

Read it left-to-right: pipe, capture an uppercase letter, lowercase rest, space, same letter + literal dot + space, same letter starting the surname, lowercase rest, pipe.
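
A quick way to test the pattern is to run it over a few rows in Python. This is a sketch: the first and third rows are invented names, the second is from the sample data, and `re` backreferences are assumed to work like grep -E's here.

```python
import re

pattern = re.compile(r"\|([A-Z])[a-z]+ \1\. \1[a-z]+\|")

rows = [
    # Hypothetical row: all three initials are "A".
    "Example Award|1990|Aaron A. Abbott|Male|Nowhere|1950",
    # Real sample row: initials A, F, H, so \1 fails at the middle initial.
    "Nobel Prize for medicine|1963|Andrew F. Huxley|Male|United Kingdom|1917",
    # Hypothetical row: first two initials agree, surname does not.
    "Example Award|1990|Bella B. Carter|Female|Nowhere|1950",
]

results = [pattern.search(r) for r in rows]
shared_letter = results[0].group(1) if results[0] else None

print([m is not None for m in results], shared_letter)
```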

Optional middle name (or no middle name)

Best pattern: a character class with * (zero or more). Safer than .* because it can't accidentally match pipes, so the match stays inside the name field.

# Same first/last initial, optional middle:
grep -E '\|([A-Z])[a-z]+ [a-zA-Z. ]*\1[a-z]+\|' awards.psv

Backreferences DON'T work inside [...]

[\1\2] doesn't mean "match group 1 OR group 2." Inside brackets, \1 is not a backreference: the bracket is just a set of the literal characters \, 1, and 2. To say "match group 1 OR group 2," use alternation outside brackets: (\1|\2).

4-digit palindrome year (ABBA)

# 1881, 1991, 2002 ...
grep -E '\|([0-9])([0-9])\2\1$' awards.psv

Group 1 = first digit. Group 2 = second digit. Then \2\1 mirrors them. $ anchors the year to the end of line because birth year is the last field.
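
The role of $ is easiest to see with a row whose award year is a palindrome but whose birth year is not. The sketch below uses Python's `re`, assumed equivalent to grep -E for this pattern; the first three rows are invented.

```python
import re

pattern = re.compile(r"\|([0-9])([0-9])\2\1$")

rows = [
    "Example Award|2002|Person One|Female|Nowhere|1881",    # hypothetical
    "Example Award|1990|Person Two|Male|Nowhere|1991",      # hypothetical
    "Example Award|2002|Person Three|Male|Nowhere|1946",    # hypothetical
    "Fields Medal|1982|William Thurston|Male|United States|1946",
]

# Keep the birth year (last field) of every row the pattern accepts.
hits = [r.split("|")[-1] for r in rows if pattern.search(r)]

print(hits)
```

The third row contains 2002 in field 2, but because that year is followed by a pipe rather than the end of the line, it does not count.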

Q2 Shell Pipelines

The philosophy

A pipeline is a chain of small programs, each doing one thing, connected by |. Output of one becomes input of the next. Don't write the whole thing at once: build it stage by stage and check the output after each stage.

⚠️ The three meanings of |

grep -E '\|M$' | cut -d'|' -f3
          ↑ ↑  ↑        ↑
          │ │  │        └─ INSIDE quotes: cut's literal delimiter
          │ │  └─ OUTSIDE quotes: SHELL pipe (connects commands)
          │ └─ INSIDE quotes: end-of-line anchor $ (regex, not a pipe)
          └─ INSIDE quotes: ESCAPED literal pipe (regex)

Same character, three jobs depending on context (quotes + position). This trips up everyone, but once you see it, you can't unsee it.

Tools you'll chain

Tool	Job	Common flag
`grep`	filter lines by pattern	`-E` ERE · `-v` invert · `-i` ignore case · `-c` count
`cut`	extract columns	`-d 'X' -f N` (delimiter, field — both required!)
`sort`	order lines	`-n` numeric · `-u` unique · `-r` reverse
`uniq`	collapse adjacent duplicates	`-d` only dups · `-u` only uniques · `-c` with counts
`wc`	count	`-l` lines
`head` / `tail`	first / last N lines	`-n N`
`seq`	integers from n to m	e.g. `seq 1 5`
`tr`	translate / delete chars	`-d` delete · `-s` squeeze
`sed`	substitute	`'s/old/new/g'`

The Q2 problem

COMP1511|3360379|Costner, Kevin Augustus    |3978/1|M
COMP1511|3364562|Carey, Mary                |3711/1|F
COMP3311|3383025|Thorpe, Ian Augustus       |3978/3|M
...

✅ The 4-stage pattern

#! /bin/dash
grep -E '\|M$' | cut -d'|' -f3 | cut -d',' -f1 | sort -u
#  filter male  extract name    extract surname  sort + dedupe

This is the master template for "filter → extract → dedupe → (count)" questions. Memorise it.

Counting variant: `wc -l`

# How many distinct students enrolled in COMP3331?
grep -E '^COMP3331\|' enrolments.txt | cut -d'|' -f2 | sort -u | wc -l

Always dedupe by the uniquely-identifying field (student ID), not by name, because two different students can share a name. Two "Wang, Wei" with different IDs are two people.

⚠️ Bugs you hit

cut '|' -f3: missing -d. Without it, cut takes '|' as a filename to read, and its delimiter stays at the default TAB, so a line with no tabs counts as a single field. Always: cut -d'|' -f3.
Plain sort on numbers gives 1, 10, 2, 45 (alphabetical). Use sort -n for numeric order.
grep 'COMP3331' works but is loose: it would also match a hypothetical course "COMP33119." Defensive form: grep -E '^COMP3331\|'.

Q3 Python (same as Q2, different language)

stdin in Python

sys.stdin is iterable, so you can loop through it line by line. The program doesn't know whether the data comes from the keyboard, a file (< file.txt), or another command (cmd | ./script.py). That's the point.

The Q3 solution

#!/usr/bin/python3
import sys

surnames = set()

for line in sys.stdin:
    line = line.rstrip('\n')
    fields = line.split('|')

    if fields[4] != 'M':
        continue

    name = fields[2]
    surname = name.split(',')[0]
    surnames.add(surname)

for surname in sorted(surnames):
    print(surname)

Why each line matters

line.rstrip('\n'): when you read a line, Python keeps the trailing \n. Without stripping, the last field becomes 'M\n', which never equals 'M', so every line gets skipped.
fields[4] for "field 5": Python lists are 0-indexed. Always subtract 1 from the spec field number. Or use fields[-1] for "the last field" (more robust).
if ... != 'M': continue is an early-exit filter. continue skips the rest of this iteration and jumps back to the top of the loop. Same as grep -E '\|M$' in shell.
set() + .add() gives automatic dedupe. Equivalent to the -u in sort -u.
sorted(surnames): sets are unordered. You must wrap with sorted() or the output is in arbitrary order.
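
The first two points are easy to verify on a single line. This small sketch uses one row from the Q2 sample data, written out as the string sys.stdin would hand to the loop; the variable names are invented for the example.

```python
# One sample row, including the trailing newline that iteration preserves.
raw = "COMP1511|3360379|Costner, Kevin Augustus    |3978/1|M\n"

# Splitting without stripping leaves the newline glued to the last field.
unstripped_last = raw.split("|")[-1]

# Stripping first gives a clean last field.
fields = raw.rstrip("\n").split("|")
surname = fields[2].split(",")[0]

print(repr(unstripped_last))   # the newline is still attached
print(fields[4] == "M")        # "field 5" lives at index 4
print(fields[-1] == "M")       # same field via negative index
print(surname)
```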

Shell ↔ Python translation table

Shell	Python
`grep PATTERN`	`if PATTERN not in line: continue`
`grep -v PATTERN`	`if PATTERN in line: continue`
`cut -d'X' -f3`	`line.split('X')[2]`
`sort -u`	`set()` then `sorted(s)`
`wc -l`	`len(items)`
`$1`	`sys.argv[1]`
`cat < file`	`for line in sys.stdin:`
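
Applying the table row by row, the Q2 counting variant (distinct students in one course) might translate to Python roughly as follows. This is a sketch, not a model answer: the function name `count_distinct_students` is invented, it takes any iterable of lines so it can be tried without a file, and the last sample row is a made-up duplicate enrolment added to show the dedupe.

```python
def count_distinct_students(lines, course):
    """Count distinct student IDs enrolled in course (hypothetical helper)."""
    ids = set()                                   # sort -u
    for line in lines:                            # reading stdin line by line
        fields = line.rstrip("\n").split("|")     # cut -d'|'
        if fields[0] != course:                   # grep -E '^COURSE\|'
            continue
        ids.add(fields[1])                        # -f2, the student ID
    return len(ids)                               # wc -l


sample = [
    "COMP1511|3360379|Costner, Kevin Augustus    |3978/1|M\n",
    "COMP1511|3364562|Carey, Mary                |3711/1|F\n",
    "COMP3311|3383025|Thorpe, Ian Augustus       |3978/3|M\n",
    "COMP1511|3360379|Costner, Kevin Augustus    |3978/1|M\n",  # made-up duplicate
]

print(count_distinct_students(sample, "COMP1511"))
```

In a real script you would pass `sys.stdin` as `lines` and take the course code from `sys.argv[1]`.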

⚠️ Top-5 Python bugs you hit

Missing colon: for line in sys.stdin → must be for line in sys.stdin:
= vs ==: if x = 5 is a SyntaxError; if x == 5 is comparison.
Unquoted string: if fields[4] == F makes Python look up a variable named F and raise NameError. Write 'F'.
print x is Python 2. Python 3 needs parens: print(x).
Off-by-one: spec says "field 3" → Python uses fields[2].

Run it

chmod +x practice_q3.py
./practice_q3.py < enrolments.txt        # redirect file as stdin
cat enrolments.txt | ./practice_q3.py     # pipe in
./practice_q3.py                          # type lines + Ctrl+D

Q4 Shell Script: Find the Missing Integer

The problem

A file contains an unordered list of positive integers from n to m, with possibly one missing. Print the missing integer, or nothing if none is missing.

✅ Final answer (4 lines)

#!/bin/dash

n=$(sort -n "$1" | head -n 1)
m=$(sort -n "$1" | tail -n 1)

( sort -n "$1" ; seq "$n" "$m" ) | sort -n | uniq -u

Concept 1: `$1` is a filename, not the data

When you run ./practice_q4.sh numbers_1.txt, $1 = the string "numbers_1.txt". To get the actual numbers inside, you must use a tool that reads the file: cat "$1", sort "$1", etc. Always quote: "$1" in case the filename has spaces.

Concept 2: `$()` captures output

Without $(), a command's output goes to the screen and is gone. With $(), you catch it as a string into a variable.

sort -n "$1" | head -n 1          # → prints to screen, lost
n=$(sort -n "$1" | head -n 1)     # → captured into n="39"

⚠️ Variable assignment rule

No spaces around =! n=42 works. n = 42 fails (shell thinks n is a command).

Concept 3: `-n` means different things

Command	`-n` means
`sort -n`	numeric sort (not alphabetical)
`head -n 5`	show this many lines
`tail -n 1`	show this many lines

Concept 4: the clever trick, `sort | uniq -u`

Combine the actual numbers (from file) with the expected complete sequence (from seq). Numbers in both lists appear twice. The missing number appears once. uniq -u prints lines that appear exactly once.

# Walk through with file = 39 45 40 44 41 43 (n=39, m=45):
sort -n "$1"     →  39 40 41    43 44 45      # 42 missing!
seq 39 45        →  39 40 41 42 43 44 45      # complete

( sort -n "$1" ; seq "$n" "$m" )  →  combined stream
| sort -n        →  39 39 40 40 41 41 42 43 43 44 44 45 45
| uniq -u        →  42                         ✓

Concept 5: subshell grouping `( cmd1 ; cmd2 )`

Parentheses run two (or more) commands in a subshell and merge their outputs into one stream, which can then be piped. Semicolons separate commands within the group.

💡 Why this is elegant

If nothing is missing, every number appears twice → uniq -u outputs nothing. Spec satisfied with no special-case code.
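
The same counting idea can be sketched in Python for comparison. Note that this swaps `sort | uniq -u` for `collections.Counter`; it illustrates the logic and is not the shell answer itself. The function name `missing_integers` is invented, and the input is assumed to contain no repeated numbers, as in the problem statement.

```python
from collections import Counter

def missing_integers(numbers):
    """Return the values in min..max that appear only once after merging
    the input with the full expected range (hypothetical helper)."""
    n, m = min(numbers), max(numbers)
    counts = Counter(numbers)            # the numbers from the file
    counts.update(range(n, m + 1))       # plus the output of seq n m
    # Values present in both sources have count 2; a missing one has count 1.
    return [value for value, count in sorted(counts.items()) if count == 1]


print(missing_integers([39, 45, 40, 44, 41, 43]))
print(missing_integers([3, 1, 2]))
```

As with uniq -u, the "nothing missing" case needs no special branch: the list simply comes back empty.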

Build it incrementally

Step 1: cat "$1" to confirm the $1 plumbing works.
Step 2: capture min/max with $(), echo to confirm.
Step 3: seq "$n" "$m" on its own.
Step 4: combine + sort + uniq -u.
