PyTorch & XOR Practice
Tensor operations, small network structure, random initialisation, local minima, and practical tuning habits.
A tensor is the main data object
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
print(x.shape)   # torch.Size([2, 2]): 2 rows, 2 columns
print(x.mean())  # average of all elements: tensor(2.5000)
Tensors are like numerical arrays, but they work naturally with automatic differentiation and accelerators.
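As a rough illustration of what "works naturally with automatic differentiation and accelerators" means, the sketch below tracks a gradient through a small expression and then moves a tensor to a GPU if one happens to be available. The variable names `w`, `y`, and `device` are chosen here for illustration and are not from the notes.

```python
import torch

# A tensor that records operations so gradients can be computed later.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(w_i^2), so dy/dw_i = 2 * w_i.
y = (w ** 2).sum()
y.backward()

print(w.grad)  # tensor([2., 4., 6.])

# Using an accelerator is optional and depends on the machine.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float).to(device)
print(x.device)
```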
A small network can learn a nonlinear pattern
input: 2 values
hidden: 2 units with tanh
output: 1 value with sigmoid
The hidden layer gives the model enough flexibility to represent XOR, which a single linear classifier cannot solve.
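One possible way to write the 2-2-1 structure described above is with `nn.Sequential`. The notes do not say how `net` is actually constructed, so treat this as a minimal sketch; the `data` and `target` tensors are simply the four XOR cases written out by hand.

```python
import torch
import torch.nn as nn

# The four XOR input pairs and their targets.
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# 2 inputs -> 2 tanh hidden units -> 1 sigmoid output.
net = nn.Sequential(
    nn.Linear(2, 2),
    nn.Tanh(),
    nn.Linear(2, 1),
    nn.Sigmoid(),
)

output = net(data)
print(output.shape)  # torch.Size([4, 1])

# 2*2 weights + 2 biases in the hidden layer, 2 weights + 1 bias in the output.
print(sum(p.numel() for p in net.parameters()))  # 9
```

With only nine parameters, the outcome of training is quite sensitive to where those parameters start.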
Repeated runs can end differently
Because the weights are initialised randomly, a small model sometimes reaches a good solution and sometimes gets stuck in a local minimum.
Small initial weights plus momentum can stabilise training
- Smaller initialisation: reduces early saturation in tanh units.
- Momentum: adds inertia and helps updates move consistently.
- Repeated trials: help distinguish stable settings from lucky single runs.
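The three habits in this list can be combined into one small experiment. The sketch below is only one way to set it up: the helper `run_trial`, the values `init_std=0.1`, `momentum=0.9`, `lr=0.1`, the epoch count, and the choice of `nn.BCELoss` are all assumptions made for illustration, not settings taken from the notes.

```python
import torch
import torch.nn as nn

data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

def run_trial(seed, init_std=0.1, momentum=0.9, lr=0.1, epochs=1000):
    """Train one 2-2-1 network from a given seed and return its final loss."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
    # Overwrite the default initialisation with a normal distribution whose
    # spread we control; a small std keeps tanh inputs near zero at the start.
    with torch.no_grad():
        for p in net.parameters():
            p.normal_(mean=0.0, std=init_std)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=momentum)
    loss_function = nn.BCELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_function(net(data), target)
        loss.backward()
        optimizer.step()
    return loss.item()

# Several seeds per setting, so one lucky run does not decide the comparison.
for init_std in (0.1, 2.0):
    losses = [run_trial(seed, init_std=init_std) for seed in range(5)]
    print(init_std, [round(l, 3) for l in losses])
```

Comparing the spread of final losses across seeds, rather than a single number, is what separates a stable setting from a lucky run.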
The standard PyTorch training rhythm
optimizer.zero_grad()
output = net(data)
loss = loss_function(output, target)
loss.backward()
optimizer.step()
- zero_grad() clears old gradients.
- The forward pass, net(data), computes predictions.
- backward() computes gradients.
- step() updates parameters.
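The reason for clearing gradients on every step is that PyTorch adds each new gradient to whatever is already stored. The sketch below shows this on a single hand-made tensor `w` (a name chosen here for illustration); `optimizer.zero_grad()` plays the same role for every parameter the optimizer manages.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])

# Second backward pass without clearing: the new gradient is added, giving 6.
(3 * w).sum().backward()
print(w.grad)  # tensor([6.])

# Clearing resets the stored value before the next pass.
w.grad.zero_()
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])
```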
Loss can stop improving for different reasons
A global minimum is the lowest loss the model can reach on the task. A local minimum is a point where small updates do not improve the loss, even though a better solution exists elsewhere.
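The difference is easier to see on a one-dimensional curve than on a real network. The function `f` below is a toy curve picked for illustration and is not the XOR loss; it has a shallow minimum near x = 1.13 and a deeper one near x = -1.30. The helper `descend`, the step size, and the step count are likewise assumptions.

```python
import torch

def f(x):
    # Toy curve with two minima of different depth.
    return x ** 4 - 3 * x ** 2 + x

def descend(start, lr=0.01, steps=500):
    """Plain gradient descent from `start`; returns (final x, final f(x))."""
    x = torch.tensor(start, requires_grad=True)
    for _ in range(steps):
        loss = f(x)
        loss.backward()
        with torch.no_grad():
            x -= lr * x.grad
        x.grad.zero_()
    return x.item(), f(x).item()

print(descend(2.0))   # settles in the shallower minimum on the right
print(descend(-2.0))  # settles in the deeper minimum on the left
```

Both runs stop because the gradient is close to zero, yet only the second one has found the lower value. The starting point alone decides which minimum is reached.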
Mini exercise for this page
A model with tanh hidden units often gets stuck when initial weights are very large. What practical change would you try first?
Answer: Use smaller initial weights so tanh units do not saturate immediately.
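To see why saturation stalls learning, it can help to look at the gradient of tanh directly. The sketch below uses three hand-picked pre-activation values in `z` (illustrative numbers, not measurements from a trained model) and lets autograd compute the slope at each.

```python
import torch

# Pre-activations a tanh unit might see with small versus large weights.
z = torch.tensor([0.1, 1.0, 5.0], requires_grad=True)
torch.tanh(z).sum().backward()

# The gradient of tanh(z) is 1 - tanh(z)^2, which shrinks toward 0 as |z| grows.
print(z.grad)  # roughly 0.99, 0.42, and 0.0002
```

A unit sitting at z = 5 passes back almost no gradient, so the weights feeding it barely move. Smaller initial weights keep z closer to zero, where the slope is near 1.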