COMP2521 21T2 Past Exam — Theory Walkthrough Q1

How to use this page 本页使用说明

📖 Format说明

Each section covers one theory question from the 21T2 past exam. The pattern is: question → key concepts → my actual attempt → what I got wrong → corrected exam answer. The "bugs you hit" callouts capture the real mistakes — they're worth more than the right answers because they show why the right answer is right. 每一节对应 21T2 真题的一道理论题。结构：题目 → 核心概念 → 我的真实尝试 → 错在哪 → 标准考试答案。"踩过的坑"框记录了真正的错误 — 这些比正确答案还有价值，因为它们解释了为什么正确答案是对的。

💡 Strategy策略

Don't read passively. After each section, close the page and try writing the answer pattern from memory. Then take the quiz at the bottom (30 questions, click-to-reveal explanations). 不要光读。每节读完后关掉页面，凭记忆写出答题模板。最后做底部的 quiz（30 题，点击即看解析）。

Q1 Graph Representations · 4 marks 图的表示方法 · 4 分

Scenario: 10,000 vertices, only 10–20 edges ever, space is the only thing that matters. 场景：10,000 个顶点，但只有 10–20 条边，唯一关心的就是空间。

Three representations — the master table 三种表示法 — 必背表

Representation	Space	Edge exists?	Neighbours of u?	All edges
Adjacency Matrix	O(V²)	O(1) ✓	O(V)	O(V²)
Adjacency List	O(V + E)	O(degree)	O(degree) ✓	O(V + E)
Edge List	O(E)	O(E)	O(E)	O(E) ✓

✅ Exam answer (Q1) 考试答案 (Q1)

Edge list. With only 10–20 edges, an edge list uses ~20 entries. An adjacency matrix would waste 100,000,000 cells, and an adjacency list would still need 10,000 head pointers. Edge list is the most space-efficient, and the question states speed doesn't matter. Edge list（边表）。只有 10–20 条边时，边表只占约 20 项。邻接矩阵会浪费 1 亿个格子，邻接表也仍需要 10,000 个头指针。边表空间最省，而题目明说速度无所谓。

Decision tree — pick by operation, then sanity-check space 决策树 — 先看操作，再检查空间

"Is there an edge between u and v?" → Matrix wins (O(1))"u 和 v 之间有边吗？" → Matrix 胜 (O(1))
"Who are u's neighbours?" → Adjacency List wins (O(degree))"u 的邻居有谁？" → 邻接表胜 (O(degree))
"Process all edges (e.g. Kruskal)" → Edge List wins (O(E))"遍历所有边（比如 Kruskal）" → 边表胜 (O(E))
Then check: does the matrix fit in memory? If V is huge and graph is sparse → matrix is too big, use adjacency list anyway.然后再检查：matrix 装得下吗？如果 V 很大且图很稀疏 → matrix 太大，改用邻接表。

⚠️ Bugs you hit你踩过的坑

"Adjacency list space is O(V·E)" — wrong! It's O(V + E). Burn this in: V plus E, not V times E. "邻接表空间是 O(V·E)" — 错！是 O(V + E)。死记：V 加 E，不是 V 乘 E。
"V·E is better than V²" — backwards! For 1M users with 200 friends each, V² = 10¹², but V·E (if it really were V·E) would be 10¹⁴. Always plug in numbers. "V·E 比 V² 好" — 反了！比如 1M 用户每人 200 朋友，V² = 10¹²，V·E（假设真的是 V·E）会是 10¹⁴。永远代入数字算。
"Adjacency list is always O(1) for edge-exists" — no, that's matrix. Adjacency list is O(degree) for both edge-exists AND neighbour queries. "邻接表的 edge-exists 永远 O(1)" — 不对，那是 matrix。邻接表的 edge-exists 和 neighbours 查询都是 O(degree)。
"Kruskal goes through all edges → matrix" — opposite of correct! "Iterate all edges" is exactly what edge list is built for. With matrix you'd scan 10,000 cells to find 50 edges. "Kruskal 要遍历所有边 → matrix" — 完全相反！"遍历所有边"正是边表的强项。用 matrix 要扫 10,000 个格子才能找到 50 条边。

Practice trace — 4 scenarios, 4 different winners 练习回顾 — 4 种场景，4 种赢家

V, E	Operation	Winner	Why
1M, 100M	Edge exists	Adj list	Matrix needs 125 GB — won't fit in memory
50K, 120K	Neighbours	Adj list	Sparse + neighbour query is its sweet spot
500, dense	Edge exists	Matrix	Dense graph → no space penalty; O(1) lookup wins
100, 50	All edges	Edge list	Iterate all edges + very sparse

Q2 Hash Collisions · 4 marks 哈希冲突 · 4 分

Scenario: 1,000 slots, inserting 1,500 keys, no resizing. 场景：1,000 个槽，插入 1,500 个键，不扩容。

The two methods 两种方法

Method	Slot holds	Load factor α	Search
Separate Chaining 拉链法	A linked list	Can exceed 1	O(1 + α) avg
Open Addressing 开放地址	One key only	Must be < 1	Degrades fast above α ≈ 0.7

💡 The one-number rule 一个数字定生死

Compute α = N / M (keys ÷ slots), then: 算装载因子 α = N / M（键数 ÷ 槽数），然后：

α < 0.7 → Open addressing (fastest, cache-friendly)α < 0.7 → 开放地址（最快，缓存友好）
0.7 ≤ α < 1 → Separate chaining (open addressing degrades badly)0.7 ≤ α < 1 → 拉链法（开放地址退化严重）
α ≥ 1 → Separate chaining (open addressing is impossible)α ≥ 1 → 拉链法（开放地址根本不可能）

✅ Exam answer (Q2) 考试答案 (Q2)

Separate chaining. With 1,500 keys into 1,000 slots, α = 1.5 — open addressing is impossible because each slot holds only one key. Separate chaining handles this naturally: each slot stores a linked list, so a slot can hold multiple keys. Search remains O(1 + α) ≈ O(2.5) on average. 拉链法。1,500 键放进 1,000 槽，α = 1.5 — 开放地址不可能，因为每个槽只能放一个键。拉链法天然支持：每个槽存链表，可以放任意多个键。平均查找 O(1 + α) ≈ O(2.5)。

⚠️ Common trap常见陷阱

Don't reflexively pick chaining just because it's familiar. If α < 0.7 (e.g. 50 keys in 100 slots, α = 0.5), open addressing wins — it's faster because data is stored contiguously in an array (CPU cache hits), no pointer chasing. Chaining wastes time jumping between linked-list nodes scattered in memory. 不要看到哈希就反射式选拉链法。如果 α < 0.7（比如 50 键 / 100 槽，α = 0.5），开放地址法胜 — 数据连续存在数组里（CPU 缓存命中率高），不用走指针。拉链法要在散落内存里跳来跳去。

Q3 When to use Heap Sort · 3 marks 什么时候用 Heap Sort · 3 分

Three sorts side-by-side 三种排序对比

Sort	Best	Avg	Worst	Space	Stable
Quicksort	O(n log n)	O(n log n)	O(n²) ⚠️	O(log n)	❌
Mergesort	O(n log n)	O(n log n)	O(n log n)	O(n) ⚠️	✅
Heap Sort	O(n log n)	O(n log n)	O(n log n) ✓	O(1) ✓	❌

✅ Exam answer (Q3) 考试答案 (Q3)

Heap sort is ideal when both worst-case performance must be guaranteed AND memory is constrained. It's the only sort with both O(n log n) worst-case (unlike quicksort's O(n²)) AND O(1) extra space (unlike mergesort's O(n)). Typical use cases: embedded systems, OS kernels, real-time systems. Heap sort 在同时需要保证最坏情况性能 + 内存受限时最理想。它是唯一同时拥有最坏 O(n log n)（不像 quicksort 退化到 O(n²)）和 O(1) 额外空间（不像 mergesort 用 O(n)）的排序。典型场景：嵌入式系统、OS 内核、实时系统。

💡 "Which sort?" decision tree "用哪种排序"决策树

Data nearly sorted? → Insertion sort (O(n) on near-sorted data)数据几乎排好？→ 插入排序（几乎有序时 O(n)）
Need stability? → Mergesort (only stable O(n log n) sort)需要稳定？→ 归并排序（唯一稳定的 O(n log n)）
Memory tight + need worst-case guarantee? → Heap sort内存紧 + 要保证最坏情况？→ 堆排序
Average speed wanted, normal case? → Quicksort (smallest constants, cache-friendly)追求平均速度，普通场景？→ 快速排序（常数最小，缓存友好）
n < 10? → Insertion sort (smallest overhead)n < 10？→ 插入排序（开销最小）

⚠️ Bug you hit你踩过的坑

"Heap sort is really quick" — misleading. In raw runtime, quicksort is usually faster than heap sort (smaller constants, better cache locality). Heap sort's strength isn't speed — it's guaranteed O(n log n) + O(1) space. Don't write "fastest" in the exam; write "predictable worst-case" or "memory-efficient." "Heap sort 很快" — 误导。实际跑起来 quicksort 通常比 heap sort 快（常数小，缓存好）。heap sort 的优势不是速度，是最坏情况可预测 + O(1) 空间。考试别写"最快"，要写"最坏情况可预测"或"内存高效"。

Q4 Euler Path / Circuit · 3 marks 欧拉路径 / 回路 · 3 分

Goal: decide whether you can draw the graph in one continuous stroke (each edge used exactly once). 目标：判断能不能"一笔画"完整张图（每条边用且仅用一次）。

The whole rule fits in 4 lines 规则只要四行

Odd-degree vertex count	Result
0	✅ Euler circuit (and therefore Euler path)
2	✅ Euler path only — start at one odd vertex, end at the other
> 2	❌ Neither
(graph is disconnected)	❌ Neither (Euler theorem requires connectedness)

💡 Why this rule?为什么是这规则？

Each time you pass through a non-endpoint vertex, you use one edge in + one edge out = 2 edges. So middle vertices must have even degree. The only exceptions are start and end — they use one extra edge each (either out or in), so they can be odd. That's why "0 or 2 odd vertices." 每次经过中间顶点，用掉一条进 + 一条出 = 2 条边。所以中间顶点的度数必须是偶数。只有起点和终点是例外 — 它们各多用一条边（一条出或一条进），所以可以是奇数。这就是"0 或 2 个奇数度顶点"的来源。

3-step answer template 三步答题模板

List every vertex's degree. "deg(A) = 3, deg(B) = 4, ..." — this is the marking point.列出每个顶点的度数。"deg(A) = 3, deg(B) = 4, ..." — 这是给分点。
Count odd-degree vertices.数奇数度顶点的个数。
State conclusion + mention connectedness.下结论 + 说明图连通。

⚠️ Bugs you hit你踩过的坑

Method correct, but you didn't actually count each vertex's degree. The exam wants you to write them out — that's where the marks live.方法对了，但你没有真的逐个数顶点度数。考试要你把度数列出来 — 给分点就在这里。
Forgot to mention "the graph is connected." Euler theorem requires connectedness; skipping it can lose a mark.忘了提"图是连通的"。欧拉定理要求连通性；不写可能丢分。
"Euler path" includes the closed kind (Euler circuit). If all degrees even → answer is "yes, Euler circuit (which is a closed Euler path)" — don't write "no Euler path.""Euler path" 包括闭合的（即 Euler circuit）。如果所有度数都偶数 → 答 "有 Euler circuit（一种闭合的 Euler path）" — 不要写"没有 Euler path"。

Q5 Limitations of Recursion · 3 marks 递归的局限性 · 3 分

Expected length: 10–50 words. 期望长度：10–50 字。

✅ Exam answer (~35 words) 考试答案（约 35 字）

Each recursive call uses stack space. Deep recursion can cause a stack overflow, crashing the program. For example, factorial(1000000) works fine iteratively but overflows the stack recursively. 递归会消耗调用栈空间。递归过深会导致 stack overflow，程序崩溃。比如 factorial(1000000) 用循环没问题，用递归会爆栈。

One-line memory aid 一句话记忆

Recursion = trade stack space for code clarity. Stack space is finite → too deep → crash. 递归 = 用栈空间换代码简洁。栈空间有限 → 递归太深 → 崩。

⚠️ Wrong directions to avoid不要写错方向

"Recursion is slower than iteration" — not necessarily true. Tail-call optimization can make them equally fast."递归比迭代慢" — 不一定。尾递归优化后可以一样快。
"Recursion is less intuitive" — backwards! Recursion is usually more intuitive (e.g. tree traversal). That's its strength, not a limitation."递归不直观" — 反了！递归通常更直观（比如树遍历）。这是优点，不是局限。

Q6 Time Complexity (single variable) · 2 marks 时间复杂度（单变量） · 2 分

void processThings(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {
            for (int k = 0; k < 2; k++) {
                printf("%d %d %d\n", i, j, k);  // printf is O(n) here
            }
        }
    }
}

3-step analysis 三步分析

Layer	Iterations	Note
i (outer)	n	i++ → linear
j (j < i)	0+1+...+(n-1) ≈ n²/2	Triangle, still O(n²)
k (k < 2)	2	Constant — collapses to O(1)
printf	—	O(n) per call (given)

Multiply everything: n × n²/2 × 1 × n = n³ → Best = Worst = O(n³). 全部相乘：n × n²/2 × 1 × n = n³ → 最好 = 最坏 = O(n³)。

No if branch, no early exit → behaviour depends only on n → best case = worst case. 没有 if 分支，没有提前退出 → 行为只由 n 决定 → 最好 = 最坏。

Loop-pattern cheat sheet 循环模式速查

Update	Iterations	Big-O
i++ or i += k	n / k	O(n)
i *= 2 or i /= 2	log₂ n	O(log n)
j < i (triangle)	0+1+...+n-1 ≈ n²/2	O(n²)
j < constant	constant	O(1)

Mnemonic: i changes by addition → O(n); i changes by multiplication → O(log n). 口诀：i 用加法更新 → O(n)；i 用乘除更新 → O(log n)。

⚠️ Bugs you hit on the practice problems 练习中你踩过的坑

Sequential vs nested: two loops one after another → add (O(n) + O(n²) = O(n²)). Two loops one inside another → multiply. Look at the brace position!先后 vs 嵌套：两个 for 一前一后 → 相加（O(n) + O(n²) = O(n²)）。一个套在另一个里 → 相乘。看大括号位置！
Don't write fractions: "O(n/2)" is wrong. Drop constants → O(n). Big-O never has fractions or constant factors.不要写分数："O(n/2)" 错。丢掉常数 → O(n)。Big-O 不带分数或常数因子。
i *= 2 ≠ i++: always check the third part of for. Multiplication → log; addition → linear.i *= 2 不是 i++：永远盯紧 for 的第三段。乘 → log；加 → 线性。
Always check the loop body's complexity! Q6 says printf is O(n), which adds a factor of n. Forgetting this → wrong answer.永远要看循环体里操作的复杂度！Q6 说 printf 是 O(n)，所以多乘一个 n。忘了就错。

Q7 Time Complexity (two variables) · 3 marks 时间复杂度（双变量） · 3 分

void processThings(int n, int m) {
    int *nums = malloc(sizeof(int) * n);          // O(1)

    for (int i = 0; i < n; i++)                   // segment 1: O(n)
        nums[i] = randInt(0, n);                   // O(1)

    for (int i = 0; i < m; i++)                   // segment 2: O(m)
        printf("%d\n", i);                         // O(1)

    insertionSort(nums, 0, n - 1);                // segment 3: best O(n), worst O(n²)
}

Sequential segments → add 先后段 → 相加

Best: O(1) + O(n) + O(m) + O(n) = O(n + m)
最好：O(1) + O(n) + O(m) + O(n) = O(n + m)

Worst: O(1) + O(n) + O(m) + O(n²) = O(n² + m)
最坏：O(1) + O(n) + O(m) + O(n²) = O(n² + m)

💡 Three rules for two-variable Big-O 双变量 Big-O 三规则

Same variable: drop lower order. n + n² → n²; m + m² → m².同变量：丢低阶。n + n² → n²；m + m² → m²。
Different variables: keep both — you don't know which dominates. n² + m stays as is.不同变量：都保留 — 不知道谁主导。n² + m 保持原样。
Sequential adds, nested multiplies. Always.先后相加，嵌套相乘。永远。

⚠️ The biggest bug you hit (twice!) 你踩过的最大的坑（两次！）

"Best case = O(1)" — wrong. The first two for-loops run unconditionally (no if, no return) — they're O(n) + O(m) no matter what. "Best case" doesn't mean "all O(1)"; it means "given the most favourable data, what's the runtime?""最好 = O(1)" — 错。前两个 for 循环无条件跑（没 if，没 return）— 它们不管怎样都是 O(n) + O(m)。"最好情况"不是"全 O(1)"，是"给最有利的数据时多快"。
"O(1) best case" only happens with early exit: some if + return / break that can fire on the first iteration. No early exit → best = worst (up to data-dependent functions inside)."O(1) 最好情况"只在提前退出时存在：某个 if + return / break 能在第一次迭代就触发。没有提前退出 → 最好 = 最坏（除了里面被数据影响的函数）。
Don't smuggle m into n: if the function takes (int n, int m), the answer must mention both. O(n²) when the answer is O(nm) is wrong.不要把 m 偷换成 n：函数签名 (int n, int m)，答案必须含两个变量。答 O(n²) 但其实是 O(nm) 就错。
n log n ≠ n + log n: n log n is multiplication (one term), n + log n is addition (simplifies to O(n)). Never split them.n log n ≠ n + log n：n log n 是乘法（一项），n + log n 是加法（简化为 O(n)）。永远不要拆开。

Sort complexities — must memorize 排序复杂度 — 必背

Sort	Best	Avg	Worst
Insertion / Bubble	O(n) ✓ near-sorted	O(n²)	O(n²)
Selection	O(n²)	O(n²)	O(n²)
Quicksort	O(n log n)	O(n log n)	O(n²) ⚠️
Mergesort / Heap	O(n log n)	O(n log n)	O(n log n)

Q8 Quicksort vs Mergesort · 5 marks Quicksort vs Mergesort · 5 分

Assume median-of-three quicksort. 假设 median-of-three 版本的 quicksort。

Side-by-side 对比

	Quicksort (median-of-3)	Mergesort
Best / Avg	O(n log n)	O(n log n)
Worst	O(n²) ⚠️ (rare in practice)	O(n log n) ✓
Space	O(log n) ✓ (in-place)	O(n) ⚠️ (auxiliary array)
Stable	❌ No	✅ Yes
Cache-friendly	✅ Yes (smaller constants)	❌ Less so

✅ Exam answer (~130 words) 考试答案（约 130 字）

Complexity: Quicksort (median-of-three) is O(n log n) on average and best case but O(n²) worst case (rare in practice). Mergesort is O(n log n) in all cases. 复杂度：Quicksort（median-of-three）平均和最好都是 O(n log n)，最坏理论上仍是 O(n²)（极少见）。Mergesort 所有情况都是 O(n log n)。

Space: Quicksort uses O(log n) stack space (in-place); mergesort needs O(n) auxiliary memory. 空间：Quicksort 用 O(log n) 栈空间（就地）；mergesort 需要 O(n) 辅助内存。

Stability: Quicksort is unstable; mergesort is stable. 稳定性：Quicksort 不稳定；mergesort 稳定。

Prefer quicksort when memory is limited or average speed matters most — smaller constants, cache-friendly. Prefer mergesort when (1) worst-case O(n log n) must be guaranteed (e.g. real-time systems), (2) stable sorting required (e.g. multi-field database sorts), or (3) sorting linked lists (mergesort doesn't need random access). 优先 quicksort：内存受限或追求平均速度（常数小，缓存友好）。优先 mergesort：(1) 必须保证最坏 O(n log n)（如实时系统），(2) 需要稳定排序（如多字段数据库排序），(3) 排序链表（mergesort 不需要随机访问）。

🔥 Two high-frequency exam follow-ups 两个高频追问

"Sorting a linked list — which?" → Mergesort. Quicksort needs random access for partitioning; linked lists only allow sequential access. Mergesort splits + merges via sequential traversal — natural fit. "链表排序选哪个？" → Mergesort。Quicksort 需要随机访问做 partition；链表只支持顺序访问。Mergesort 分 + 合都靠顺序遍历 — 天然合适。
"Why does stability matter?" → Multi-key sorting. Sort by name first, then by age — equal-age people must keep their name order. Only stable sorts preserve this. "稳定性为什么重要？" → 多字段排序。先按姓名排，再按年龄排 — 同年龄的人要保留姓名顺序。只有稳定排序能做到。

21T2 Theory Q1–Q8 Walkthrough 21T2 理论题 Q1–Q8 精讲