Convolutional Architecture
How to read convolution layers, count their parameters, and understand common architectural ideas that stabilise training.
No padding means the filter must fit inside the image
output_width = floor((input_width - filter_width) / stride) + 1
output_height = floor((input_height - filter_height) / stride) + 1
A stride greater than 1 moves the filter several positions at a time and skips the positions in between, so the output feature map becomes smaller.
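The two output-size formulas can be sketched as one small function applied separately to width and height. The helper name `conv_output_size` is hypothetical, and the sketch assumes no padding, as in the formulas above.

```python
def conv_output_size(input_size: int, filter_size: int, stride: int = 1) -> int:
    """Output length along one spatial axis for a convolution with no padding."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if filter_size > input_size:
        # Without padding the filter has no valid position at all.
        raise ValueError("the filter must fit inside the input")
    # Integer floor division implements floor((input - filter) / stride).
    return (input_size - filter_size) // stride + 1


# Width and height are computed independently with the same rule.
print(conv_output_size(32, 5, stride=1))  # 28
print(conv_output_size(32, 5, stride=2))  # 14
```

With stride 2 the filter lands on positions 0, 2, 4, ..., 26, which is why the output roughly halves.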
A filter sees all input channels
parameters_per_filter = filter_width * filter_height * input_channels + 1 bias
independent_parameters = parameters_per_filter * number_of_filters
neurons = output_width * output_height * number_of_filters
Connections count every use of each filter weight (and the bias) at every spatial position, so connections = neurons * (filter_width * filter_height * input_channels + 1). Parameters count each learned number only once, because convolution shares the same weights across all positions.
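A minimal sketch that puts the counts above together for a square filter with no padding. The function name `conv_layer_counts` and the example layer sizes are hypothetical, chosen only for illustration.

```python
def conv_layer_counts(width, height, channels, filter_size, num_filters, stride=1):
    """Return (out_width, out_height, parameters, neurons, connections)
    for a square-filter convolution layer with no padding."""
    out_w = (width - filter_size) // stride + 1
    out_h = (height - filter_size) // stride + 1
    # Each filter spans every input channel, plus one bias.
    params_per_filter = filter_size * filter_size * channels + 1
    parameters = params_per_filter * num_filters   # shared across positions
    neurons = out_w * out_h * num_filters
    connections = neurons * params_per_filter       # every neuron reuses the weights
    return out_w, out_h, parameters, neurons, connections


# Hypothetical layer: 28 x 28 x 1 input, 8 filters of size 3 x 3, stride 1.
print(conv_layer_counts(28, 28, 1, 3, 8))  # (26, 26, 80, 5408, 54080)
```

The gap between 80 parameters and 54,080 connections is exactly the effect of weight sharing.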
Initial weight scale affects gradient flow
If activations or gradients shrink layer by layer, they vanish and the early layers barely learn. If they grow layer by layer, they explode and training becomes unstable. Good initialisation chooses a weight scale that keeps the signal magnitude roughly constant from layer to layer.
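A small numeric experiment can make this concrete. The sketch below is a toy under stated assumptions: a deep stack of purely linear layers (no activation function) of equal width, with weights drawn from a normal distribution whose standard deviation is `gain / sqrt(fan_in)`. This is LeCun-style scaling; for ReLU networks, He initialisation uses a gain of sqrt(2) instead. The helper `final_activation_variance` is hypothetical.

```python
import math
import random


def final_activation_variance(gain, width=100, depth=10, seed=0):
    """Send one random input through `depth` linear layers whose weights are
    drawn from N(0, (gain / sqrt(fan_in))**2); return the variance of the output."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]  # input with variance about 1
    std = gain / math.sqrt(width)                     # fan_in == width for every layer
    for _ in range(depth):
        prev = x
        # Each output unit is a weighted sum of all units in the previous layer.
        x = [sum(rng.gauss(0.0, std) * v for v in prev) for _ in range(width)]
    mean = sum(x) / width
    return sum((v - mean) ** 2 for v in x) / width


for gain in (0.5, 1.0, 2.0):
    print(gain, final_activation_variance(gain))
```

Each layer multiplies the activation variance by roughly gain squared. After 10 layers, gain 0.5 leaves about 0.25^10 (around 1e-6) of the signal, gain 2.0 amplifies it by about 4^10 (around 1e6), and gain 1.0 keeps it near 1.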
Normalise intermediate activations
normalise each value over the mini-batch: x_hat = (x - batch_mean) / sqrt(batch_variance + epsilon)
then apply a learned scale gamma and shift beta: y = gamma * x_hat + beta
Batch normalisation gives each layer a more stable input distribution during training.
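A minimal sketch of the training-mode forward pass for a single feature (for a convolution layer, one channel's values across the batch). The default `eps = 1e-5` is an assumption, and the name `batch_norm` is hypothetical. At inference, frameworks typically replace the batch statistics with running averages collected during training.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalisation of one feature across a mini-batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # batch (biased) variance
    # Normalise to mean 0, variance 1; eps avoids division by zero.
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in values]
    # Learned scale and shift let the layer undo the normalisation if useful.
    return [gamma * x + beta for x in x_hat]


out = batch_norm([2.0, 4.0, 6.0, 8.0], gamma=3.0, beta=1.0)
mean_out = sum(out) / len(out)
var_out = sum((o - mean_out) ** 2 for o in out) / len(out)
print(round(mean_out, 6), round(var_out, 4))  # 1.0 9.0
```

The output mean equals beta and the output variance is (almost exactly) gamma squared, whatever the scale of the input batch.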
Skip paths help information and gradients flow
- Residual network: adds a block's output to its input, usually written as x + F(x).
- Dense network: concatenates the feature maps of earlier layers, so each later layer receives many previous feature maps directly.
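The two skip styles differ in how they combine features, which a toy sketch can show. A residual addition needs F(x) to have the same shape as x, and its derivative with respect to x is 1 + F'(x), so gradients always have an identity path. A dense connection instead grows the channel count: if every layer adds k new feature maps (DenseNet calls k the growth rate), the l-th layer receives in_channels + (l - 1) * k maps. Both helper names are hypothetical, and plain lists stand in for feature maps.

```python
def residual_block(x, f):
    """Return x + F(x) elementwise; F must preserve the shape of x."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("residual addition needs F(x) to have the same shape as x")
    return [xi + fi for xi, fi in zip(x, fx)]


def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channel count each layer receives when every layer's new feature maps are
    concatenated onto all earlier ones. Returns (per-layer inputs, block output)."""
    inputs = []
    channels = in_channels
    for _ in range(num_layers):
        inputs.append(channels)
        channels += growth_rate  # concatenation appends growth_rate new maps
    return inputs, channels


# Toy F that doubles each value (a hypothetical stand-in for conv layers).
print(residual_block([1.0, -2.0, 3.0], lambda v: [2 * t for t in v]))  # [3.0, -6.0, 9.0]
print(dense_block_channels(64, 32, 4))  # ([64, 96, 128, 160], 192)
```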
A generic convolution layer checklist
input: W x H x C
filters: K filters of size F x F
stride: S
output width = floor((W - F) / S) + 1
output height = floor((H - F) / S) + 1
neurons = output_width * output_height * K
parameters = (F * F * C + 1) * K
connections = neurons * (F * F * C + 1)
The +1 is the bias. Parameters are shared across spatial positions, but connections are counted separately for every neuron.
Flatten before dense layers
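One way to check the checklist formulas is to enumerate every output neuron by brute force and count what it touches. In this sketch each learned weight is identified by a key that ignores the output position, which is what weight sharing means. The function name and the small layer sizes are hypothetical.

```python
def count_by_enumeration(W, H, C, F, K, S):
    """Count neurons, distinct learned parameters and connections by brute force."""
    params = set()
    neurons = connections = 0
    for k in range(K):                          # one feature map per filter
        for oy in range(0, H - F + 1, S):       # valid vertical positions
            for ox in range(0, W - F + 1, S):   # valid horizontal positions
                neurons += 1
                for dy in range(F):
                    for dx in range(F):
                        for c in range(C):
                            connections += 1
                            # The key ignores (ox, oy): the same weight is reused everywhere.
                            params.add((k, dy, dx, c))
                connections += 1                # the bias connection
                params.add((k, "bias"))
    return neurons, len(params), connections


W, H, C, F, K, S = 7, 5, 3, 3, 2, 1             # hypothetical small layer
out_w, out_h = (W - F) // S + 1, (H - F) // S + 1
formula = (out_w * out_h * K, (F * F * C + 1) * K, out_w * out_h * K * (F * F * C + 1))
print(count_by_enumeration(W, H, C, F, K, S), formula)  # (30, 56, 840) (30, 56, 840)
```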
flattened_inputs = width * height * channels
dense_parameters = (flattened_inputs + 1 bias) * output_units
dense_connections = dense_parameters
Unlike a convolution layer, a fully connected layer usually has a separate weight for every input-output pair, so its connection count equals its parameter count.
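A minimal sketch of flattening followed by the dense-layer count. It assumes the feature map is stored as a nested [height][width][channels] list; the helper names and the 5 x 5 x 16 to 120 example are hypothetical.

```python
def flatten(feature_map):
    """Flatten a nested [height][width][channels] list into one vector, row by row."""
    return [v for row in feature_map for pixel in row for v in pixel]


def dense_layer_counts(flattened_inputs, output_units):
    parameters = (flattened_inputs + 1) * output_units  # +1 bias per output unit
    connections = parameters                             # no sharing: each weight used once
    return parameters, connections


feature_map = [[[0.0] * 16 for _ in range(5)] for _ in range(5)]  # 5 x 5 x 16
inputs = len(flatten(feature_map))
print(inputs, dense_layer_counts(inputs, 120))  # 400 (48120, 48120)
```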
Two ideas that often appear alongside convolution
- Pooling: reduces spatial size by summarising nearby activations (for example, by taking their maximum or average).
- Receptive field: the region of the original input that can affect a given neuron in a later layer.
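Both ideas can be sketched in a few lines. The pooling example uses 2 x 2 max pooling with stride 2. The receptive-field helper uses the standard recurrence: each layer adds (kernel - 1) times the current jump, where the jump is the distance in input pixels between neighbouring units. It assumes no padding or dilation and treats pooling layers as (kernel, stride) pairs. Both function names are hypothetical.

```python
def max_pool_2x2(grid):
    """2 x 2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
         for x in range(0, len(grid[0]) - 1, 2)]
        for y in range(0, len(grid) - 1, 2)
    ]


def receptive_field(layers):
    """Receptive field, in input pixels, of one neuron after a stack of
    (kernel_size, stride) layers listed from input to output."""
    size, jump = 1, 1  # jump: input-pixel distance between neighbouring units
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


grid = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 6]]
print(max_pool_2x2(grid))                          # [[4, 5], [3, 7]]
print(receptive_field([(3, 1), (3, 1)]))           # 5
print(receptive_field([(3, 1), (2, 2), (3, 1)]))   # 8
```

Two stacked 3 x 3 convolutions already see a 5 x 5 input region, and a pooling layer in between widens the field faster because it doubles the jump.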
Mini exercise for this page
An input is 32 x 32 x 3. A convolution layer has 10 filters of size 5 x 5, stride 1, no padding. How many parameters?
Answer: (5*5*3 + 1) * 10 = 760.
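Working through the rest of the checklist for the same layer gives the output size, neuron count and connection count as well. As in the checklist, this sketch counts the bias as one connection per neuron.

```python
W, H, C = 32, 32, 3            # input width, height, channels
F, K, S = 5, 10, 1             # filter size, number of filters, stride

out_w = (W - F) // S + 1                    # 28
out_h = (H - F) // S + 1                    # 28
params_per_filter = F * F * C + 1           # 75 weights + 1 bias = 76
parameters = params_per_filter * K          # 760
neurons = out_w * out_h * K                 # 28 * 28 * 10 = 7840
connections = neurons * params_per_filter   # 7840 * 76 = 595840
print(parameters, neurons, connections)     # 760 7840 595840
```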