Week 10 · Floating Point & Text Encoding

Representing Numbers & Characters Reliably

IEEE-754 arithmetic, rounding pitfalls, ASCII/Unicode evolution, and UTF encoding strategies

🎯 Learning Objectives

🧭 Exam Alignment

Coverage goals: decode ≥5 floating-point examples, document rounding-error mitigation strategies, and practice encoding ≥4 Unicode characters into UTF-8.

📚 Core Concepts

IEEE-754 Layout

A floating-point value is stored as three fields: sign, exponent, and mantissa (fraction). Single precision uses 1/8/23 bits; double precision uses 1/11/52 bits. The exponent is stored with a bias: 127 for single precision and 1023 for double precision.

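Rounding & Cancellation

Limited precision means most results must be rounded to the nearest representable value, so rounding error is unavoidable. Catastrophic cancellation occurs when two nearly equal numbers are subtracted: the leading digits cancel, and the rounding error already present in the operands dominates the result.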

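Special Values

Zero, denormalised numbers, infinities, and NaNs each have a dedicated representation: an all-zero exponent field marks zero and denormalised numbers, and an all-one exponent field marks infinities and NaNs. Know both the representation and the comparison rules; for example, NaN != NaN is true.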

From ASCII to Unicode

ASCII defines 128 codes. Extended ASCII variants add region-specific character sets in the upper 128 values. Unicode aims to cover every character by assigning each one a code point in the range U+0000 to U+10FFFF.

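UTF-8 Encoding

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per code point. It is ASCII-compatible: code points below U+0080 are encoded as a single byte identical to ASCII. The high bits of each byte signal its role: a lead byte starts with 0, 110, 1110, or 11110, and every continuation byte starts with 10.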

UTF-16 & Surrogates

UTF-16 uses 16-bit code units. Characters beyond the Basic Multilingual Plane (BMP), that is above U+FFFF, are encoded as a surrogate pair of two code units. Byte order can be indicated by a byte order mark (BOM) at the start of the stream.

🧪 Worked Examples

Example 1: Decode IEEE-754 Float

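Bit pattern: 0 10000000 11000000000000000000000. The sign bit is 0, the exponent field is 128, and the leading fraction bits 11 contribute 0.5 + 0.25, so the value is (-1)^0 × 1.75 × 2^(128 - 127) = 3.5. For comparison, 3.0 = 1.5 × 2^1 has the fraction field 10000000000000000000000.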

Remember to subtract the bias (127 for float) when reconstructing the exponent; see ./week10-encoding-floating.html#ieee-format.

Example 2: Encode U+1F600 in UTF-8

U+1F600 needs more than 16 bits, so it follows the 4-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, which carries 21 payload bits. Splitting the 21-bit value 000 011111 011000 000000 across the pattern gives 11110000 10011111 10011000 10000000 (0xF0 0x9F 0x98 0x80). Verifying this by hand is good practice for exam tasks that involve emoji.

⚠️ Common Pitfalls

🛠️ Practice Task

Create encode_lab.c: accept floating-point literals and Unicode code points, then output the corresponding IEEE-754 bit patterns and UTF-8 byte sequences.

🧪 Tutorial & Lab Mapping

Tutorial 10 Highlights

  • IEEE decoding drills and discussion of catastrophic cancellation.
  • Why d == d+1 can be true: exploring the representable range and the spacing between adjacent values.
  • Manual UTF-8 encoding practice (BMP and supplementary planes).

Lab 10 Programming Tasks

  • float_bits.c: print bit patterns of float/double inputs.
  • float_accuracy.c: illustrate rounding/cancellation scenarios.
  • utf8_encoder.c: convert code points to UTF-8.
  • utf_sanitiser.c (challenge): validate UTF-8 streams and report invalid sequences.

📝 Study Log

Premium Quiz: 40 Questions on Floating Point & Unicode

28 basic (IEEE/ASCII basics) · 8 intermediate (error analysis & encoding) · 4 advanced (combined scenarios)


🔭 Next Steps

From Week 11 onward, focus on final consolidation: revisit high-weight quizzes, compile formula cheat sheets, and simulate final exam timing.

📎 Resources & Checklist