PyTorch

PyTorch & XOR Practice

Notes on tensor operations, small network structure, random initialisation, local minima, and practical tuning habits.

Tensor Basics

A tensor is the main data object

import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
print(x.shape)   # torch.Size([2, 2]): rows and columns
print(x.mean())  # tensor(2.5000): average of all elements

Tensors are like numerical arrays, but they also support automatic differentiation and can be moved to accelerators such as GPUs.
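To make the automatic differentiation point concrete, here is a minimal sketch using the same values as the tensor above. The function y = sum(x^2) is chosen only for illustration; it is not part of the original notes.

```python
import torch

# Same values as the example above, but tracked for automatic differentiation.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

y = (x ** 2).sum()   # scalar: 1 + 4 + 9 + 16
y.backward()         # fills x.grad with dy/dx = 2 * x

print(y.item())      # 30.0
print(x.grad)        # gradient tensor with the same shape as x
```

The gradient lives on the tensor itself (x.grad), which is what lets an optimiser later read it and update parameters.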

XOR Network

A small network can learn a nonlinear pattern

input:  2 values
hidden: 2 units with tanh
output: 1 value with sigmoid

The hidden layer gives the model enough flexibility to represent XOR, which a single linear classifier cannot do because the XOR classes are not linearly separable.
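One way to see that two tanh units are enough is to set the weights by hand rather than train them. The sketch below builds the 2-2-1 network described above with nn.Sequential; the specific weight values are hand-picked assumptions for illustration (one hidden unit behaves roughly like OR, the other roughly like AND), not values a training run would necessarily find.

```python
import torch
from torch import nn

# The four XOR input pairs and their targets.
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# 2 inputs -> 2 tanh hidden units -> 1 sigmoid output.
net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())

# Illustrative hand-picked weights (an assumption, not learned values).
with torch.no_grad():
    net[0].weight.copy_(torch.tensor([[10.0, 10.0], [10.0, 10.0]]))
    net[0].bias.copy_(torch.tensor([-5.0, -15.0]))    # unit 1 ~ OR, unit 2 ~ AND
    net[2].weight.copy_(torch.tensor([[10.0, -10.0]]))  # output ~ OR and not AND
    net[2].bias.copy_(torch.tensor([-10.0]))
    prediction = net(data).round()

print(prediction.flatten().tolist())  # [0.0, 1.0, 1.0, 0.0]
```

Because a solution exists for this architecture, a failed training run points at the optimisation process rather than at the model's capacity.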

Random Initialisation

Repeated runs can end differently

Because the weights start from random values, a small model may sometimes reach a good solution and sometimes get stuck in a local minimum.
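A simple way to observe this is to train the same network from several seeds and compare the final losses. The sketch below is one possible setup: the helper name train_once, the BCELoss loss, the learning rate and the epoch count are all assumptions made for illustration, and which seeds succeed can vary with the PyTorch version.

```python
import torch
from torch import nn

data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

def train_once(seed, epochs=2000, lr=0.5):
    """Train a fresh 2-2-1 network from one seed; return the last computed loss."""
    torch.manual_seed(seed)  # fixes this run's random initial weights
    net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
    loss_function = nn.BCELoss()  # assumed loss for a sigmoid output
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_function(net(data), target)
        loss.backward()
        optimizer.step()
    return loss.item()

# Only the seed differs between runs.
final_losses = {seed: train_once(seed) for seed in range(5)}
for seed, loss in final_losses.items():
    print(f"seed {seed}: final loss {loss:.4f}")
```

A final loss near zero suggests the run found a good solution; a loss that stays well above zero suggests the run stalled.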

Tuning Habit

Small initial weights plus momentum can stabilise training

Smaller initialisation: keeps tanh inputs near zero, which reduces early saturation in tanh units.
Momentum: adds inertia, so updates keep moving in a consistent direction.
Repeated trials: help distinguish stable settings from lucky single runs.
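The first two habits can be sketched as follows. The helper make_net, the initialisation range of 0.1, the learning rate and the momentum value of 0.9 are illustrative assumptions, not settings taken from these notes.

```python
import torch
from torch import nn

def make_net(init_scale=0.1):
    """Build the 2-2-1 network with every weight and bias drawn from
    U(-init_scale, init_scale). The default scale is an illustrative choice."""
    net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
    for p in net.parameters():
        nn.init.uniform_(p, -init_scale, init_scale)
    return net

torch.manual_seed(0)
net = make_net()

# momentum=0.9 is a common starting value, assumed here rather than tuned.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

largest = max(p.abs().max().item() for p in net.parameters())
print(f"largest initial |weight|: {largest:.3f}")
```

For the third habit, the same setup would be run over several seeds, judging a setting by how many runs succeed rather than by one result.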

Training Loop

The standard PyTorch training rhythm

optimizer.zero_grad()
output = net(data)
loss = loss_function(output, target)
loss.backward()
optimizer.step()

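zero_grad() clears old gradients.
The forward pass, net(data), computes predictions.
loss_function(output, target) measures the error of those predictions.
backward() computes gradients of the loss with respect to the parameters.
step() updates the parameters using those gradients.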
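The five calls above leave net, data, target, loss_function and optimizer undefined. The sketch below fills them in with assumed choices (the XOR data, a 2-2-1 network, BCELoss and plain SGD) and runs a single step, checking that gradients appear after backward() and that parameters move after step().

```python
import torch
from torch import nn

torch.manual_seed(0)
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
loss_function = nn.BCELoss()  # assumed loss; pairs with the sigmoid output
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

# Snapshot of the parameters before the update.
before = [p.detach().clone() for p in net.parameters()]

optimizer.zero_grad()                  # clear old gradients
output = net(data)                     # forward pass
loss = loss_function(output, target)   # measure the error
loss.backward()                        # compute gradients
has_grads = all(p.grad is not None for p in net.parameters())
optimizer.step()                       # update parameters

changed = any(not torch.equal(b, p.detach())
              for b, p in zip(before, net.parameters()))
print(f"loss={loss.item():.4f} grads={has_grads} params_changed={changed}")
```

In real training the five calls sit inside a loop over epochs or batches; a single step is shown here to keep each effect visible.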

Global vs Local Minimum

Loss can stop improving for different reasons

A global minimum is the lowest loss the model can reach for the task. A local minimum is a point where small updates do not improve the loss, even though a better solution exists elsewhere.

When a small model succeeds only in some random runs, it is a sign that initialisation and optimiser settings matter.
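The effect of the starting point is easiest to see in one dimension. The sketch below runs plain gradient descent on a toy curve, f(w) = w^4 - 3w^2 + w, chosen for illustration only; it is not the XOR loss. The curve has a global minimum near w = -1.30 and a shallower local minimum near w = 1.13. The helper descend, the learning rate and the step count are assumptions.

```python
import torch

def f(w):
    # Toy 1-D loss with two minima (a hypothetical curve, not the XOR loss).
    return w ** 4 - 3 * w ** 2 + w

def descend(start, lr=0.01, steps=500):
    """Plain gradient descent on f from a given starting value of w."""
    w = torch.tensor(start, requires_grad=True)
    for _ in range(steps):
        loss = f(w)
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad   # manual update step
        w.grad.zero_()         # clear the gradient for the next iteration
    return w.item()

w_left = descend(-2.0)   # starts in the basin of the global minimum
w_right = descend(2.0)   # starts in the basin of the local minimum

for w in (w_left, w_right):
    print(f"w = {w:.3f}, loss = {f(torch.tensor(w)).item():.3f}")
```

Both runs stop at a point where the gradient is near zero, yet only the run started on the left reaches the lower loss. Nothing about the stopping point itself tells the optimiser that a better one exists.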

New Practice Prompt

Mini exercise for this page

A model with tanh hidden units often gets stuck when initial weights are very large. What practical change would you try first?

Answer: Use smaller initial weights so tanh units do not saturate immediately.
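The reason saturation hurts is that the slope of tanh, 1 - tanh(z)^2, collapses toward zero for large |z|, so very little gradient flows back through a saturated unit. A short check with autograd, using two illustrative input values (0.1 and 5.0 are assumptions standing in for small and very large initial weights):

```python
import torch

# Inputs a tanh unit might receive with small vs. very large weights
# (the values 0.1 and 5.0 are illustrative).
z = torch.tensor([0.1, 5.0], requires_grad=True)
torch.tanh(z).sum().backward()

# Autograd result should match the closed form 1 - tanh(z)^2.
closed_form = 1 - torch.tanh(z.detach()) ** 2

print(z.grad)        # slope is close to 1 at z = 0.1, close to 0 at z = 5.0
print(closed_form)
```

With a slope that small, each update barely moves the weights feeding the saturated unit, which is one way training can appear stuck.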