Back to COMP2521 返回 COMP2521
Week 1 - Analysis of Algorithms 第一周 - 算法分析

Binary Search 二分搜索 (Binary Search)

💡 Core Concept 💡 核心概念

Binary Search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locations to just one. 二分搜索是一种在有序列表查找项目的算法。 它的原理是不断将可能包含目标值的列表部分一分为二(/2), 直到找到目标值或确定目标值不存在。

⚠️ Important Rules ⚠️ 重要规则

❌ Rule 1: Array MUST Be Sorted ❌ 规则 1:数组必须是排好序的

If the array is unsorted, looking at the middle element gives us NO information about whether the target is to the left or right. 如果数组只是乱序的,切中间无法判断在左边还是右边。

✅ Linear vs Binary Search ✅ 线性搜索 vs 二分搜索

  • Unsorted Array: Must use Linear Search (checking every element). Slow O(n). 乱序数组:只能用 线性搜索 (Linear Search,就是一个个傻找),最慢 O(n)。
  • Sorted Array: Can use Binary Search. Fast O(log n). 有序数组:才能用 二分搜索 (Binary Search),飞快 O(log n)。

🐢 Linear Search (The Baseline) 🐢 线性搜索 (基础对照)

This is the simplest search algorithm. It's often used as the "baseline" comparison ("The Bad Example" for large data). Unlike Binary Search, it does not require the array to be sorted. 这是最简单、最基础的搜索算法,也是我们用来做对比的"反面教材"(在数据量大时)。 它不需要数组是排好序的。

// Linear Search Implementation // Returns index if found, -1 if not int linearSearch(int arr[], int size, int val) { for (int i = 0; i < size; i++) { // Walk from start to end if (arr[i]==val) { // Check one by one return i; // Found it! } } return -1; // Not found }

💡 The Analogy 💡 生活类比

It's like looking for a book on a messy shelf. You have to pick up the first book, check it, then the second, and so on. If the book is at the end (or not there), you check every single one. 就像在一个乱序的书架上找书。 你只能从第一本开始,拿起来看看是不是,不是就看下一本,直到找到为止。

📊 Characteristics 📊 特点分析

  • No Sorting Needed: The biggest advantage. Works on any data. 不需要排序:这是它最大的优点,给什么数组都能搜。
  • Efficiency O(n): Slow for large data. If you have 1 million items, you might check 1 million times. 效率 O(n):数据大时很慢。最好情况 1 次,最坏情况 n 次(平均 n/2)。

⚔️ Linear vs Binary Search ⚔️ 线性搜索 vs 二分搜索

Feature特性 Linear Search线性搜索 Binary Search二分搜索
Pre-condition 前提条件 None (Unsorted OK) 无 (乱序也行) Must be Sorted 必须有序
Speed 速度 Slow O(n) 慢 O(n) Fast O(log n) 快 O(log n)
Implementation 实现难度 Simple 极简 Tricky (Boundaries) 稍难 (边界处理)
Scenario 适用场景 Small / Unsorted Data 数据少、无序数组 Large / Sorted Data 数据多、已排序数组

📊 Complexity & Growth Analysis 📊 复杂度与增长分析

Example 1: Squared Growth (n²) 例子 1:平方级增长 (n²)

Question: How much longer does an n² algorithm take if input is 5x larger? 问题:如果有两个循环嵌套(n² 的复杂度),数据量变成了原来的 5倍,时间会变成多少倍?

T(n) = n² T(5n) = (5n)² = 25n² = 25 * T(n)

Answer: 25 times slower. (5 * 5 = 25) 答案:时间会变慢 25倍。(5 * 5 = 25)

Example: If 10 items take 100s, 50 items will take 2500s. 原理解释:比如本来处理 10 个数据要 100 秒,现在处理 50 个数据(5倍),就要 2500 秒。

Example 2: Cubic Growth (n³) 例子 2:立方级增长 (n³)

Question: How much longer does an n³ algorithm take if input is 3x larger? 问题:如果有三个循环嵌套(n³ 的复杂度),数据量变成了原来的 3倍,时间会变成多少倍?

T(n) = n³ T(3n) = (3n)³ = 27n³ = 27 * T(n)

Answer: 27 times slower. (3 * 3 * 3 = 27) 答案:时间会变慢 27倍。(3 * 3 * 3 = 27)

Explanation: Cubic growth is explosive. A small increase in data causes a massive increase in time. 原理解释:三次方增长是非常恐怖的,数据稍微多一点点,时间就指数级爆炸。

✂️ Simplifying Big O ✂️ 简化 Big O 表达式

Example 3: 6n log(n) + 5n + 2 例子 3:6n log(n) + 5n + 2

  • Identify Dominant Term: n log(n) is the "Boss" (grows fastest). 找老大:这三项里,n log(n) 增长最快。
  • Ignore Constants: Drop the 6. 去倍数:前面的数字 6 不重要,把它删掉。
= O(n log n)

Example 4: 2n² + 4n + log(n) 例子 4:2n² + 4n + log(n)

  • Identify Dominant Term: n² (The absolute Boss compared to n or log n). 找老大:n² (平方,这是绝对的老大),比线性强多了。
  • Ignore Constants: Drop the 2. 去倍数:把前面的 2 删掉。
= O(n²)

🚀 Why Ignore Constants? (The "Musk" Analogy) 🚀 为什么我们要"忽略常数"?(马斯克比喻)

We care about trends, not details.
If Elon Musk (Net worth $200B) picks up $100, he is still in the "Billionaire" tier. Similarly, O(n) + 100 is still just O(n). The constant doesn't change the scale.
因为我们在乎的是 趋势,而不是细节。
这就好比比富:马斯克(身价 2000 亿)在地上捡了 100 块钱,这根本不影响他和普通人之间的数量级差距。

🏆 Big O Tiers 🏆 算法等级排行

😇
God Tier
O(1), O(log n)
Instant / Very Fast (Elon Musk)
😐
Mid Tier
O(n)
Ordinary Middle Class
💀
Poverty Tier
O(n²)
Too slow for Big Data

💻 Implementation (C) 💻 代码实现 (C)

The algorithm maintains a low and high index. In each step, calculate mid and compare with target. 算法维护 lowhigh 索引。 每一步计算 mid 并与目标值比较。

int binary_search(int arr[], int n, int target) { int low = 0; int high = n - 1; // Loop while the search range is valid while (low <= high) { // Find the middle index // Using + (high - low) / 2 prevents overflow for large indices int mid=low + (high - low) / 2; // Case 1: Found it! if (arr[mid]==target) { return mid; } // Case 2: Target is in the right half // (Because the value at mid is too small) if (arr[mid] < target) { low=mid + 1; } // Case 3: Target is in the left half // (Because the value at mid is too big) else { high=mid - 1; } } // Target not found return -1; }