Representing Numbers & Characters Reliably

IEEE-754 arithmetic, rounding pitfalls, ASCII/Unicode evolution, and UTF encoding strategies

🎯 Learning Objectives

Decode IEEE-754 single- and double-precision bit patterns.
Identify rounding errors, catastrophic cancellation, and special values (NaN, infinity).
Compare text encodings (ASCII, Latin-1, UTF-8/16/32) and justify Unicode adoption.
Convert between Unicode code points and UTF-8 byte sequences.
Craft exam-ready explanations for floating-point equality pitfalls and locale-sensitive text handling.

🧭 Exam Alignment

IEEE decoding: 22T2 Q8 asks you to convert bitstrings to decimal values.
Equality traps: Tutorial10 emphasises why d == d + 1 may be true for large doubles.
UTF-8 mapping: Lab10 "utf8_encoder" mirrors final exam encoding tasks.
Locale Q&A: past finals ask why ASCII cannot represent emoji or CJK characters.
Cancellation reasoning: 20T3 Q7 includes catastrophic cancellation analysis.

Coverage goals: decode ≥5 floating-point examples, document rounding-error mitigation strategies, and practise encoding ≥4 Unicode characters into UTF-8.

📚 Core Concepts

IEEE-754 Layout

A floating-point value is stored as three fields: sign, biased exponent, and mantissa (fraction). Single precision uses 1/8/23 bits; double precision uses 1/11/52 bits.
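
The 1/8/23 split can be made concrete with a few shifts and masks. The sketch below is illustrative only: struct float_fields and split_float are hypothetical names, and it assumes float is a 32-bit IEEE-754 type on the target machine.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper type: the three single-precision fields.
struct float_fields {
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, still biased by 127
    uint32_t fraction;  // 23 bits, without the implicit leading 1
};

// Splits a float into its raw fields.
// Assumption: float is a 32-bit IEEE-754 single on this platform.
struct float_fields split_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  // copy the raw bytes; no numeric conversion

    struct float_fields out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFF;
    out.fraction = bits & 0x7FFFFF;
    return out;
}
```

For double precision the same idea applies with a uint64_t, shifts of 63 and 52, and masks of 0x7FF and 0xFFFFFFFFFFFFF.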

Rounding & Cancellation

Limited precision means most results must be rounded. Catastrophic cancellation occurs when two nearly equal numbers are subtracted: the leading digits cancel, so rounding error already present in the operands dominates the result.
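
A small experiment makes the effect visible. In this sketch (recovered_delta is a hypothetical name, and IEEE-754 doubles with round-to-nearest are assumed), adding a tiny delta to 1.0 rounds it to the nearest multiple of 2^-52; subtracting 1.0 again is exact, but it exposes that earlier rounding as a large relative error.

```c
// Adds delta to 1.0 and subtracts 1.0 again.
// Assumption: double is IEEE-754 binary64 with round-to-nearest.
double recovered_delta(double delta) {
    // volatile forces the sum to be rounded to a real double
    // instead of living in a wider register.
    volatile double x = 1.0 + delta;
    return x - 1.0;  // exact subtraction of near-equal values
}
```

With delta = 1e-15 the sum rounds to 1 + 5 × 2^-52, so the function returns roughly 1.11e-15: about 11% away from the value that went in, even though each individual operation was correctly rounded.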

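Special Values

Zero, denormalised numbers, infinities, and NaNs: understand how each is represented and how it compares (for example, NaN != NaN is true). An all-zero exponent field marks zero (fraction zero) or a denormalised number (fraction non-zero); an all-ones exponent field marks infinity (fraction zero) or NaN (fraction non-zero).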
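
The comparison rules can be checked directly in C. This is a minimal sketch: the two function names are hypothetical, and it assumes the compiler follows IEEE-754 comparison semantics (for example, it is not built with options such as -ffast-math).

```c
#include <math.h>
#include <stdbool.h>

// NaN is the only value that compares unequal to itself.
bool is_nan_by_self_compare(double x) {
    return x != x;
}

// +0.0 and -0.0 compare equal, but their sign bits differ.
bool is_negative_zero(double x) {
    return x == 0.0 && signbit(x);
}
```

In real code the standard isnan macro from math.h is the clearer choice; the self-comparison is shown only because it is the property exam questions tend to ask about.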

From ASCII to Unicode

ASCII covers 128 codes, and the various "extended ASCII" sets add region-specific characters in the upper 128 values. Unicode aims to cover all characters by assigning each a code point in the range U+0000 … U+10FFFF.

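UTF-8 Encoding

UTF-8 is a variable-length encoding that uses 1–4 bytes per code point. It is ASCII-compatible: code points up to U+007F are a single byte with the high bit clear. In multi-byte sequences, the lead byte's high bits give the sequence length (110xxxxx, 1110xxxx, 11110xxx) and every continuation byte starts with 10.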
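
The length rules translate into a short encoder. The following is a sketch rather than the Lab10 reference solution: utf8_encode is a hypothetical name and signature, and it rejects surrogates and values above U+10FFFF by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

// Writes the UTF-8 encoding of cp into out (room for 4 bytes).
// Returns the number of bytes written, or 0 if cp is not encodable.
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp <= 0x7F) {                      // 0xxxxxxx
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                     // 110xxxxx 10xxxxxx
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                    // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      // surrogates are not characters
        }
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  // 11110xxx then three 10xxxxxx
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              // beyond the Unicode range
}
```

As a quick check, U+20AC (the euro sign) falls in the three-byte range and encodes as 0xE2 0x82 0xAC.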

UTF-16 & Surrogates

UTF-16 uses 16-bit code units; surrogate pairs encode characters beyond the BMP (above U+FFFF). Byte order can be indicated with a BOM (U+FEFF) at the start of the stream.
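
The surrogate calculation is mechanical: subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits each. A minimal sketch, with utf16_encode as a hypothetical name, producing code units as integers (byte order and BOM handling are deliberately left out):

```c
#include <stdint.h>

// Writes the UTF-16 code units for cp into out.
// Returns 1 or 2 (units written), or 0 if cp is not encodable.
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                               // reserved for surrogates
    }
    if (cp <= 0xFFFF) {                         // BMP: one unit, value as-is
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;              // 20 bits remain
        out[0] = (uint16_t)(0xD800 | (v >> 10));    // high surrogate: top 10 bits
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  // low surrogate: bottom 10 bits
        return 2;
    }
    return 0;
}
```

For U+1F600 this yields the pair 0xD83D 0xDE00. Writing those units to a file as UTF-16LE would then emit the low byte of each unit first.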

🧪 Worked Examples

Example 1: Decode IEEE-754 Float

Bit pattern: 0 10000000 11000000000000000000000 → sign 0, exponent 128 − 127 = 1, mantissa 1.11 in binary = 1.75, so the value is (-1)^0 × 1.75 × 2^1 = 3.5.

Remember the bias (127 for float) when reconstructing the exponent; see ./week10-encoding-floating.html#ieee-format.
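
The hand calculation can be cross-checked in code. The pattern 0 10000000 11000000000000000000000 is 0x40600000 in hex; its two leading fraction bits contribute 0.5 + 0.25, so the significand is 1.75 and the value is 1.75 × 2^1 = 3.5. The sketch below uses a hypothetical function name and handles normalised values only (exponent field 1 to 254).

```c
#include <stdint.h>

// Decodes a normalised single-precision pattern by the formula
// (-1)^sign × (1 + fraction / 2^23) × 2^(exponent - 127).
// Zero, denormals, infinities and NaNs are not handled here.
double decode_normal_float(uint32_t bits) {
    uint32_t sign = bits >> 31;
    int exponent = (int)((bits >> 23) & 0xFF) - 127;  // remove the bias
    uint32_t fraction = bits & 0x7FFFFF;

    double significand = 1.0 + fraction / 8388608.0;  // 8388608 = 2^23

    // Build 2^exponent by repeated doubling or halving (exact in double).
    double scale = 1.0;
    for (int i = 0; i < exponent; i++) scale *= 2.0;
    for (int i = 0; i > exponent; i--) scale /= 2.0;

    double magnitude = significand * scale;
    return sign ? -magnitude : magnitude;
}
```

Changing the fraction to 10000000000000000000000 (0x40400000) gives a significand of 1.5 and a value of 3.0.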

Example 2: Encode U+1F600 in UTF-8

U+1F600 needs 21 bits, so it uses the 4-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. Splitting the bits as 000 011111 011000 000000 gives 11110000 10011111 10011000 10000000 (0xF0 0x9F 0x98 0x80). Verifying this by hand is good preparation for emoji in exam tasks.
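
Going the other way, from bytes back to a code point, is a useful check on the encoding above. This is a sketch under stated assumptions: utf8_decode is a hypothetical name and interface, and it rejects truncated input, stray continuation bytes, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

// Decodes one UTF-8 sequence starting at bytes[0].
// Stores the code point in *cp and returns the bytes consumed,
// or returns 0 if the sequence is malformed or truncated.
size_t utf8_decode(const uint8_t *bytes, size_t len, uint32_t *cp) {
    if (len == 0) return 0;

    uint8_t lead = bytes[0];
    size_t need;      // total bytes in this sequence
    uint32_t value;   // payload bits collected so far
    uint32_t min;     // smallest code point allowed at this length

    if (lead < 0x80)                { need = 1; value = lead;        min = 0; }
    else if ((lead & 0xE0) == 0xC0) { need = 2; value = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { need = 3; value = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { need = 4; value = lead & 0x07; min = 0x10000; }
    else return 0;    // continuation byte or invalid lead byte

    if (len < need) return 0;
    for (size_t i = 1; i < need; i++) {
        if ((bytes[i] & 0xC0) != 0x80) return 0;   // must look like 10xxxxxx
        value = (value << 6) | (bytes[i] & 0x3F);
    }

    if (value < min) return 0;                          // overlong encoding
    if (value > 0x10FFFF) return 0;                     // outside Unicode
    if (value >= 0xD800 && value <= 0xDFFF) return 0;   // surrogate

    *cp = value;
    return need;
}
```

Feeding it 0xF0 0x9F 0x98 0x80 yields U+1F600 and a length of 4; passing only the first three bytes yields 0, which is the "slicing mid-character" failure in miniature.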

⚠️ Common Pitfalls

Comparing doubles with ==/!= without a tolerance.
Assuming decimal fractions like 0.1 are represented exactly; they are not in binary.
Treating UTF-8 as fixed width; slicing in the middle of a multi-byte sequence corrupts characters.
Forgetting the BOM when writing UTF-16 files, so readers that assume the other byte order decode every code unit with its bytes swapped.
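
For the first two pitfalls, one common mitigation is to compare with a tolerance instead of ==. The helper below is a sketch: nearly_equal and its parameters are hypothetical, and suitable tolerance values depend on the computation, so treat any numbers used with it as placeholders.

```c
#include <stdbool.h>

static double abs_double(double x) { return x < 0 ? -x : x; }
static double max_double(double a, double b) { return a > b ? a : b; }

// True when a and b differ by at most rel_tol of the larger magnitude,
// or by at most abs_tol (which matters when both are close to zero).
// Any comparison involving NaN is false, so NaN is never "nearly equal".
bool nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    double diff = abs_double(a - b);
    double scale = max_double(abs_double(a), abs_double(b));
    return diff <= max_double(rel_tol * scale, abs_tol);
}
```

With this, 0.1 + 0.2 == 0.3 is false under IEEE-754 doubles, while nearly_equal(0.1 + 0.2, 0.3, 1e-9, 1e-12) is true. The relative term scales with the operands; the absolute term is the fallback near zero, where a relative test alone would be too strict.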

🛠️ Practice Task

Create encode_lab.c: accept floating-point literals and Unicode code points, then output IEEE-754 bit patterns and UTF-8 byte sequences.

Use memcpy into uint64_t/uint32_t to inspect raw bits.
Implement a UTF-8 encoder based on lead-byte templates (1–4 bytes).
Optional: add a UTF-16LE encoder and ASCII fallback logic.
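
As a possible starting point for the memcpy step, the sketch below formats the raw bits of a double as sign, exponent, and fraction groups. The function name and the output layout are assumptions made for illustration, not part of any lab spec, and double is assumed to be 64-bit IEEE-754.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Formats d as "s eeeeeeeeeee ffff...f" (1 + 11 + 52 bits, two spaces).
// out must have room for 67 bytes including the terminating NUL.
void double_bits_to_string(double d, char out[67]) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  // reinterpret the bytes without a cast

    size_t pos = 0;
    for (int i = 63; i >= 0; i--) {
        out[pos++] = ((bits >> i) & 1) ? '1' : '0';
        if (i == 63 || i == 52) {
            out[pos++] = ' ';        // gap after the sign and after the exponent
        }
    }
    out[pos] = '\0';
}
```

memcpy is used rather than a pointer cast because reading a double through a uint64_t pointer breaks C's aliasing rules; compilers typically reduce the memcpy to a single move.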

🧪 Tutorial & Lab Mapping

Tutorial 10 Highlights

IEEE decoding drills and catastrophic cancellation discussions.
Why d == d+1 can be true; exploring the representable range.
Manual UTF-8 encoding practice (BMP and supplementary planes).
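
The d == d+1 behaviour is easy to reproduce. A double has 53 significant bits (52 stored plus the implicit 1), so from 2^53 upward adjacent doubles are at least 2 apart and adding 1 can round straight back to d. The function name below is hypothetical, and IEEE-754 doubles with round-to-nearest-even are assumed.

```c
#include <stdbool.h>

// True when adding 1.0 to d leaves it unchanged.
// Assumption: double is IEEE-754 binary64, round-to-nearest-even.
bool absorbs_plus_one(double d) {
    volatile double next = d + 1.0;  // force rounding to double
    return next == d;
}
```

absorbs_plus_one(9007199254740992.0) is true (that value is 2^53, and 2^53 + 1 is a tie that rounds back to the even neighbour), while absorbs_plus_one(9007199254740991.0) is false because 2^53 − 1 plus 1 is exactly representable.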

Lab 10 Programming Tasks

float_bits.c: print bit patterns of float/double inputs.
float_accuracy.c: illustrate rounding/cancellation scenarios.
utf8_encoder.c: convert code points to UTF-8.
utf_sanitiser.c (challenge): validate UTF-8 streams and report invalid sequences.

📝 Study Log

Inputs shared: floating_point.pdf, unicode.pdf, Lab10 spec, Tutorial10 sheet, exam archives.
Prompt: "Combine floating-point decoding tables with UTF cheat-sheets for rapid revision."
Breakthrough: Visualising how the exponent scales a fixed-width mantissa clarified why very large values cannot register a +1 step.
Misconception fixed: Previously thought UTF-8 requires a BOM; UTF-8 has a fixed byte order, so a BOM is optional there, and it is UTF-16/32 that use one to signal endianness.
Action items: Practise encoding random Unicode points, review decimal approximation techniques, and prepare a cheat-sheet table.

Premium Quiz: 40 Questions on Floating Point & Unicode

28 basic (IEEE/ASCII basics) · 8 intermediate (error analysis & encoding) · 4 advanced (combined scenarios)


🔭 Next Steps

Week 11+ focuses on final consolidation: revisit high-weight quizzes, compile formula cheat-sheets, and simulate final exam timing.

📎 Resources & Checklist

PPT: floating_point.pdf p1–28, unicode.pdf p1–46.
Autotest: 1521 autotest lab10_float_bits, lab10_float_accuracy, lab10_utf8_encoder, lab10_utf_sanitiser.
Self-check: Can you convert U+20AC to UTF-8 quickly? Can you explain why NaN breaks equality checks?