PyTorch

PyTorch & XOR Practice

Notes on tensor operations, small network structure, random initialisation, local minima, and practical tuning habits.

Tensor Basics

A tensor is the main data object

import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
print(x.shape)   # torch.Size([2, 2]): 2 rows, 2 columns
print(x.mean())  # tensor(2.5000): average of all four elements

Tensors behave like numerical arrays (similar to NumPy arrays), but they can also record operations for automatic differentiation and run on accelerators such as GPUs.
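A minimal sketch of both properties, using only standard PyTorch calls. Autograd computes the gradient of a scalar built from `x`, and a detached copy is moved to a GPU when one is available. The names `y` and `x_on_device` are illustrative.

```python
import torch

# requires_grad=True asks autograd to record operations on x
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x ** 2).sum()   # a scalar built from x
y.backward()         # fills x.grad with dy/dx
print(x.grad)        # dy/dx = 2x, i.e. [[2., 4.], [6., 8.]]

# Move a copy to a GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x_on_device = x.detach().to(device)
print(x_on_device.device)
```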

XOR Network

A small network can learn a nonlinear pattern

input:  2 values
hidden: 2 units with tanh
output: 1 value with sigmoid

XOR is not linearly separable, so a single linear classifier cannot solve it. The tanh hidden layer gives the model enough flexibility to represent it.
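To check that the 2-2-1 architecture can represent XOR at all, the sketch below builds it with `nn.Sequential` and sets the weights by hand instead of training them. The first hidden unit acts roughly like OR, the second like AND, and the output fires when OR is on but AND is off. The scale `k = 10` is an arbitrary illustrative choice.

```python
import torch
from torch import nn

# XOR truth table
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

net = nn.Sequential(
    nn.Linear(2, 2),  # input: 2 values -> hidden: 2 units
    nn.Tanh(),
    nn.Linear(2, 1),  # hidden: 2 units -> output: 1 value
    nn.Sigmoid(),     # squash the output into (0, 1)
)

k = 10.0  # illustrative scale; larger k gives sharper, more saturated units
with torch.no_grad():  # hand-set weights, so no gradient tracking needed
    # hidden unit 0 ~ OR (fires when x1 + x2 > 0.5),
    # hidden unit 1 ~ AND (fires when x1 + x2 > 1.5)
    net[0].weight.copy_(torch.tensor([[k, k], [k, k]]))
    net[0].bias.copy_(torch.tensor([-0.5 * k, -1.5 * k]))
    # output ~ OR and not AND: high only when h0 - h1 is near 2
    net[2].weight.copy_(torch.tensor([[k, -k]]))
    net[2].bias.copy_(torch.tensor([-k]))
    out = net(X)

print((out > 0.5).float().squeeze())  # tensor([0., 1., 1., 0.])
```

Training has to discover weights like these from a random start, which is where initialisation starts to matter.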

Random Initialisation

Repeated runs can end differently

Because the weights start random, a small model may reach a good solution on one run and get stuck in a local minimum on another.
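One way to observe this is to train the same XOR network from several seeds and compare the loss from the last training step. The sketch below is an assumption-laden illustration: `train_xor` is a hypothetical helper, and the epoch count and learning rate are arbitrary. A loss close to zero means the run solved XOR, while a run that plateaus at a clearly higher value has stalled. Which seeds succeed depends on the PyTorch version and hardware, so no outputs are shown.

```python
import torch
from torch import nn

# XOR truth table
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_xor(seed, epochs=2000, lr=0.5):
    """Hypothetical helper: train a 2-2-1 tanh/sigmoid net, return the last-step loss."""
    torch.manual_seed(seed)  # the seed fixes the random initial weights
    net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)  # plain SGD, no momentum
    loss_function = nn.BCELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_function(net(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

for seed in range(5):
    print(f"seed {seed}: final loss {train_xor(seed):.4f}")
```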

Tuning Habit

Small initial weights plus momentum can stabilise training
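A minimal sketch of the two settings, assuming the same 2-2-1 XOR network. Every parameter is redrawn with `nn.init.normal_` at a small standard deviation, and the optimiser is SGD with momentum. The values `INIT_STD = 0.1`, `lr=0.1`, and `momentum=0.9` are illustrative assumptions, not tuned constants. For reference, PyTorch's default `nn.Linear(2, 2)` initialisation draws from roughly ±0.71, so 0.1 is genuinely smaller.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())

# Assumed scale: redraw every weight and bias from N(0, 0.1^2) so the
# tanh units start in their near-linear, non-saturated range.
INIT_STD = 0.1
for p in net.parameters():
    nn.init.normal_(p, mean=0.0, std=INIT_STD)

# Momentum accumulates an exponentially decaying sum of past gradients,
# which smooths the updates; lr and momentum here are illustrative values.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
```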

Training Loop

The standard PyTorch training rhythm

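# One training iteration; repeat it for every epoch (or batch).
optimizer.zero_grad()                 # clear gradients left over from the previous step
output = net(data)                    # forward pass
loss = loss_function(output, target)  # measure how far output is from target
loss.backward()                       # backpropagate: compute gradients for every parameter
optimizer.step()                      # update the weights using those gradients
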
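The rhythm above assumes that `net`, `optimizer`, `loss_function`, `data`, and `target` already exist. The self-contained sketch below defines them for XOR and wraps the rhythm in a full-batch epoch loop. The seed, learning rate, momentum, and epoch count are illustrative assumptions, and `BCELoss` is chosen because the network ends in a sigmoid.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# XOR truth table: `data` holds the inputs, `target` the expected outputs
data = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
target = torch.tensor([[0.], [1.], [1.], [0.]])

net = nn.Sequential(nn.Linear(2, 2), nn.Tanh(), nn.Linear(2, 1), nn.Sigmoid())
loss_function = nn.BCELoss()  # binary cross-entropy on sigmoid outputs
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

losses = []
for epoch in range(2000):
    optimizer.zero_grad()                 # clear old gradients
    output = net(data)                    # forward pass
    loss = loss_function(output, target)  # compare predictions with targets
    loss.backward()                       # backpropagate
    optimizer.step()                      # update the weights
    losses.append(loss.item())

with torch.no_grad():                     # evaluation only, no gradient tracking
    print(net(data).round().squeeze())    # 0/1 predictions; ideally [0, 1, 1, 0]
```

If the printed predictions are not `[0, 1, 1, 0]`, the run has likely stalled, and changing the seed or initial scale is the first thing to try.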
Global vs Local Minimum

Loss can stop improving for different reasons

A global minimum is the lowest loss the model can reach on the task. A local minimum is a point where small updates no longer reduce the loss, even though a better solution exists elsewhere.
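A one-dimensional toy loss makes the distinction concrete. The function f(w) = w^4 - 3w^2 + w, an illustrative choice rather than a real network loss, has a global minimum near w = -1.30 and a worse local minimum near w = 1.13. These are the roots of f'(w) = 4w^3 - 6w + 1. Plain gradient descent ends in whichever basin it starts in.

```python
import torch

def f(w):
    # Toy 1-D "loss": global minimum near w = -1.30 (f ≈ -3.51),
    # local minimum near w = 1.13 (f ≈ -1.07)
    return w**4 - 3 * w**2 + w

def descend(start, lr=0.01, steps=500):
    """Run plain gradient descent on f from `start`; return (w, f(w))."""
    w = torch.tensor(start, requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = f(w)
        loss.backward()
        optimizer.step()
    return w.item(), f(w).item()

print(descend(-2.0))  # ends near the global minimum, w ≈ -1.30
print(descend(2.0))   # ends near the local minimum, w ≈ 1.13
```

Near w = 1.13 every small step uphill or downhill in w increases f, so the optimiser stays there even though w = -1.30 is better.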

If a small model succeeds in only some of its random runs, that is a sign that initialisation and optimiser settings matter.

New Practice Prompt

Mini exercise for this page

A model with tanh hidden units often gets stuck when its initial weights are very large. What practical change would you try first?

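Answer: Use smaller initial weights so the tanh units do not saturate at the start. A saturated unit outputs values near ±1, where the derivative 1 - tanh(z)^2 is almost zero, so very little gradient flows back through it.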
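A quick autograd check of the saturation effect. The two pre-activation values are illustrative: 0.5 stands in for what small weights might produce, and 10.0 for what very large weights might produce.

```python
import torch

# Illustrative pre-activations: moderate (small weights) vs large (very large weights)
z = torch.tensor([0.5, 10.0], requires_grad=True)
torch.tanh(z).sum().backward()  # z.grad now holds d tanh(z) / dz = 1 - tanh(z)^2

print(torch.tanh(z).detach())   # second unit is pinned at (almost exactly) 1
print(z.grad)                   # first ≈ 0.79, second ≈ 0: almost no gradient
```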