Convolution

Convolutional Architecture

How to read convolution layers, count parameters, and understand common architectural ideas for stabilising training.

Convolution Layer Size

With no padding, the filter must fit entirely inside the image

output_width = floor((input_width - filter_width) / stride) + 1
output_height = floor((input_height - filter_height) / stride) + 1

A stride greater than 1 skips positions, so the output feature map becomes smaller.
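The size formula above can be sketched as a small helper. The function name `conv_output_size` and the error checks are illustrative assumptions, not part of the original notes; the arithmetic is the no-padding formula applied along one axis.

```python
def conv_output_size(input_size: int, filter_size: int, stride: int = 1) -> int:
    """Output length along one axis for a convolution with no padding."""
    if filter_size > input_size:
        raise ValueError("with no padding, the filter must fit inside the input")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    # Integer division implements the floor in the formula.
    return (input_size - filter_size) // stride + 1


print(conv_output_size(32, 5, stride=1))  # 28
print(conv_output_size(32, 5, stride=2))  # 14
```

Calling it once for the width and once for the height gives the full spatial size of the feature map.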

Parameter Counting

Each filter spans all input channels

parameters_per_filter = filter_width * filter_height * input_channels + 1   (the +1 is the bias)
independent_parameters = parameters_per_filter * number_of_filters
neurons = output_width * output_height * number_of_filters
connections = neurons * parameters_per_filter

Connections count every use of each filter weight across all spatial positions. Parameters count each learned number only once, because convolution shares weights across positions.
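One way to see the effect of weight sharing is to count the same layer on two input sizes. The sketch below assumes a square input and square filters, counts the bias as one connection per neuron, and uses a hypothetical helper name `count_layer`.

```python
def count_layer(input_size, channels, filter_size, num_filters, stride=1):
    """Return (parameters, connections) for a square conv layer, no padding."""
    out = (input_size - filter_size) // stride + 1
    per_neuron = filter_size * filter_size * channels + 1  # weights + 1 bias
    parameters = per_neuron * num_filters          # shared across positions
    connections = out * out * num_filters * per_neuron  # counted per neuron
    return parameters, connections


small = count_layer(input_size=16, channels=3, filter_size=5, num_filters=10)
large = count_layer(input_size=32, channels=3, filter_size=5, num_filters=10)
print(small)  # (760, 109440)
print(large)  # (760, 595840)
```

The parameter count stays at 760 for both inputs, while the connection count grows with the number of spatial positions.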

Weight Initialisation

The initial weight scale affects gradient flow

If activations or gradients shrink layer by layer, gradients vanish and learning stalls. If they grow layer by layer, training can explode. Good initialisation chooses a weight scale that keeps signal magnitudes roughly constant across layers.
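A toy experiment can make the scale argument concrete. The sketch below is an assumption-laden illustration, not a method from the notes: it pushes a random vector through a stack of fully connected ReLU layers (width 64, depth 10, both arbitrary) and compares three weight standard deviations. The middle one, sqrt(2 / fan_in), is the scale commonly associated with He initialisation for ReLU layers; the notes themselves do not name a specific scheme.

```python
import math
import random


def final_rms(weight_std, width=64, depth=10, seed=0):
    """RMS activation after `depth` random ReLU layers of the given width."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _layer in range(depth):
        # Each output unit is a ReLU of a random weighted sum of the inputs.
        x = [
            max(0.0, sum(rng.gauss(0.0, weight_std) * v for v in x))
            for _unit in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


too_small = final_rms(0.01)                   # signal shrinks every layer
balanced = final_rms(math.sqrt(2.0 / 64))     # sqrt(2 / fan_in)
too_large = final_rms(1.0)                    # signal grows every layer
print(too_small, balanced, too_large)
```

With these settings the first value collapses towards zero, the last grows by many orders of magnitude, and the middle one stays within a moderate range, which is the behaviour the paragraph above describes.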

Batch Normalisation

Normalise intermediate activations

normalise each feature over the batch to mean 0 and variance 1
then apply a learned scale gamma and shift beta

Batch normalisation gives each layer a more stable input distribution during training.
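The two steps can be sketched for a single feature across one batch. This is a minimal training-time illustration only: the function name `batch_norm` is hypothetical, `gamma` and `beta` are passed in as fixed numbers rather than learned, and the small `eps` term is an assumption added to avoid division by zero.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
normalised = batch_norm(batch)
rescaled = batch_norm(batch, gamma=2.0, beta=5.0)
print(normalised)
print(rescaled)
```

With the defaults the output has mean 0 and variance close to 1; with `gamma=2.0` and `beta=5.0` the same values are stretched to a standard deviation near 2 and centred on 5.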

Residual vs Dense Connections

Skip paths help information and gradients flow

Residual network: adds a block's output to its input, usually written as x + F(x).
Dense network: concatenates features from earlier layers, so later layers receive many previous feature maps directly.
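The difference between adding and concatenating shows up in the size of the result. The sketch below treats a feature vector as a plain Python list and uses a toy stand-in for the block function F; `residual_block`, `dense_block`, and `double` are hypothetical names chosen for illustration.

```python
def residual_block(x, f):
    """Residual connection: element-wise x + F(x); sizes must match."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same size as x to be added")
    return [a + b for a, b in zip(x, fx)]


def dense_block(x, f):
    """Dense connection: keep x and append F(x) as extra features."""
    return [*x, *f(x)]


def double(x):
    # Toy stand-in for a learned block F.
    return [2.0 * v for v in x]


x = [1.0, 2.0, 3.0]
print(residual_block(x, double))  # [3.0, 6.0, 9.0]
print(dense_block(x, double))     # [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
```

The residual output keeps the original size, while the dense output grows with each block, which is why later layers in a dense network receive many earlier feature maps.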

Worked Counting Pattern

A generic convolution layer checklist

input: W x H x C
filters: K filters of size F x F
stride: S

output width  = floor((W - F) / S) + 1
output height = floor((H - F) / S) + 1
neurons = output_width * output_height * K
parameters = (F * F * C + 1) * K
connections = neurons * (F * F * C + 1)

The +1 is the bias. Parameters are shared across spatial positions, but connections are counted separately for every neuron.
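The checklist can be applied to a stack of layers by feeding each layer's output size into the next one. In the sketch below the `ConvCounts` tuple, the `conv_counts` helper, and the second layer's settings (20 filters of size 3 x 3) are illustrative assumptions; the formulas are the ones in the checklist.

```python
from typing import NamedTuple


class ConvCounts(NamedTuple):
    output_width: int
    output_height: int
    neurons: int
    parameters: int
    connections: int


def conv_counts(W, H, C, K, F, S=1):
    """Apply the checklist to one conv layer with no padding."""
    out_w = (W - F) // S + 1
    out_h = (H - F) // S + 1
    per_neuron = F * F * C + 1  # weights seen by one neuron, plus the bias
    neurons = out_w * out_h * K
    return ConvCounts(out_w, out_h, neurons, per_neuron * K, neurons * per_neuron)


first = conv_counts(W=32, H=32, C=3, K=10, F=5, S=1)
# The next layer's input channels equal the previous layer's filter count.
second = conv_counts(W=first.output_width, H=first.output_height, C=10, K=20, F=3, S=1)
print(first)   # 28 x 28 output, 7840 neurons, 760 parameters, 595840 connections
print(second)  # 26 x 26 output, 13520 neurons, 1820 parameters, 1230320 connections
```

Note that C for the second layer is the first layer's K, since each filter produces one output channel.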

Fully Connected Layers

Flatten before dense layers

flattened_inputs = width * height * channels
dense_parameters = (flattened_inputs + 1) * output_units   (the +1 is the bias)
dense_connections = dense_parameters

Unlike convolution, a fully connected layer usually has a separate parameter for every input-output pair.
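A short count shows how quickly the lack of weight sharing adds up. The helper name `dense_counts` and the example sizes (a 28 x 28 x 10 feature map feeding 100 output units) are assumptions chosen for illustration.

```python
def dense_counts(width, height, channels, output_units):
    """Return (flattened_inputs, parameters) for a fully connected layer."""
    flattened_inputs = width * height * channels
    # Every output unit has one weight per input plus one bias.
    parameters = (flattened_inputs + 1) * output_units
    return flattened_inputs, parameters


inputs, params = dense_counts(width=28, height=28, channels=10, output_units=100)
print(inputs)  # 7840
print(params)  # 784100
```

Because nothing is shared, the connection count equals the parameter count here, in contrast to a convolution layer where connections far exceed parameters.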

Pooling and Receptive Field

Two ideas that often appear alongside convolution

Pooling: reduces spatial size by summarising nearby activations.
Receptive field: the region of the original input that can affect a later neuron.
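Both ideas can be computed layer by layer. The sketch below assumes no padding, square windows, and a simple chain of layers each described by a (kernel_size, stride) pair; the names `pool_output_size` and `receptive_field` are hypothetical. The receptive field grows by (kernel - 1) times the current step between neighbouring neurons, and that step is multiplied by each layer's stride.

```python
def pool_output_size(input_size, window, stride):
    """Output length along one axis after pooling with no padding."""
    return (input_size - window) // stride + 1


def receptive_field(layers):
    """Receptive field width for a chain of (kernel_size, stride) layers."""
    size = 1  # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring neurons
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


print(pool_output_size(28, window=2, stride=2))      # 14
# 5x5 conv (stride 1), then 2x2 pooling (stride 2), then 3x3 conv (stride 1)
print(receptive_field([(5, 1), (2, 2), (3, 1)]))     # 10
```

The pooling layer halves the spatial size and also doubles the step, so the following 3 x 3 convolution widens the receptive field by 4 input pixels rather than 2.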

New Practice Prompt

Mini exercise for this page

An input is 32 x 32 x 3. A convolution layer has 10 filters of size 5 x 5, stride 1, no padding. How many parameters does the layer have?

Answer: each filter has 5*5*3 = 75 weights plus 1 bias, so (5*5*3 + 1) * 10 = 760.