### How to Understand Weight Sharing in Convolutional Neural Networks

Weight sharing means that, given an input image, a single filter is swept across the image. The numbers inside the filter are called weights, and because every position of the image is swept by the same filter, the weights are the same at every position, i.e., they are shared.

If this is not yet clear, it helps to start from a fully connected neural network and treat the problem as one of minimizing the number of parameters. For an input image of size W*H, a fully connected layer that generates an X*Y feature map needs W*H*X*Y parameters, because every pixel of the output feature map is connected to every pixel of the original image and each connection needs its own parameter. If the width and height of the original image are on the order of 10^2 and X*Y is about the same size as W*H, then W*H and X*Y are each about 10^4 and the layer needs about 10^8 parameters; at 10^3 pixels per side the count reaches 10^12. That many parameters is clearly impractical, so we need a way to reduce the count.

Images are generally locally correlated, however. If each pixel of the output layer is connected only to a small local region of the input image, the number of parameters drops sharply. Suppose each output pixel is connected to just one small F*F square of the input image, so that its value is computed only from the pixel values inside that square. The number of parameters per output pixel then falls from W*H to F*F. If every F*F square of the original image produces one such output pixel, the layer needs only about W*H*F*F parameters in total. With width and height on the order of 10^2 and F below 10, that is only 10^5 to 10^6 parameters, far fewer than the original 10^8 to 10^12.

Weight sharing goes one step further: instead of giving each output pixel its own F*F weights, every output pixel reuses the same F*F weights, so one filter needs only F*F parameters (plus a bias) regardless of the image size.
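The arithmetic above can be checked with a short sketch. The function name `parameter_counts` and the example sizes (100×100 input, 100×100 feature map, 5×5 filter) are assumptions chosen for illustration; biases are ignored.

```python
def parameter_counts(W, H, X, Y, F):
    """Return weight counts (biases ignored) for three connection schemes."""
    fully_connected = W * H * X * Y    # every output pixel sees every input pixel
    locally_connected = X * Y * F * F  # every output pixel has its own F*F patch weights
    shared = F * F                     # one F*F filter reused at every position
    return fully_connected, locally_connected, shared


# Assumed example sizes: 100x100 input, 100x100 feature map, 5x5 filter.
fc, local, shared = parameter_counts(W=100, H=100, X=100, Y=100, F=5)
print(fc, local, shared)  # 100000000 250000 25
```

With these sizes the fully connected layer sits at 10^8 weights, the locally connected layer at roughly 10^5, and a single shared filter at 25.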

### CNN (Convolutional Neural Network) Algorithm

Basics Explained:

Convolution: a mathematical operator that generates a third function from two functions f and g. It is the integral, over the region where they overlap, of the product of f and a flipped and translated copy of g.

Feedforward neural network: the neurons are arranged in layers, and each neuron is connected only to neurons of the previous layer. It receives the output of the previous layer and passes its own output to the next layer. There is no feedback between layers.

Convolutional neural network: a class of feedforward neural networks that contain convolutional computation and have a deep structure.

Convolutional kernel: in image processing, given an input image, each pixel of the output image is a weighted sum of the pixels in a small region of the input image. The weights are defined by a function called the convolutional kernel.

Downsampling: taking one sample every few samples from a sequence, so that the resulting new sequence is a downsampled version of the original.
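The convolution and downsampling definitions above can be illustrated in one dimension. This is a minimal sketch for the discrete case (a sum in place of the integral); the helper names `convolve_1d` and `downsample` and the sample values are assumptions made for this example.

```python
import numpy as np


def convolve_1d(f, g):
    """Discrete 'full' convolution: flip g, slide it over f, sum the overlapping products."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for k in range(len(out)):
        for i in range(len(f)):
            if 0 <= k - i < len(g):  # only indices where f and the flipped g overlap
                out[k] += f[i] * g[k - i]
    return out


def downsample(x, step):
    """Keep every `step`-th sample of the sequence."""
    return np.asarray(x)[::step]


signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
print(convolve_1d(signal, kernel).tolist())   # [0.0, 1.0, 2.5, 4.0, 1.5]
print(downsample(np.arange(10), 3).tolist())  # [0, 3, 6, 9]
```

The loop version gives the same result as `np.convolve(signal, kernel)`, which computes the full discrete convolution by default.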

Architecture

Input layer: used for data input

Convolutional layer: uses convolutional kernels for feature extraction and feature mapping

Excitation layer (activation layer): applies a non-linear mapping; convolution is a linear mapping, and this layer makes up for that limitation

Pooling layer: downsamples and sparsifies the feature maps to reduce the amount of computation

Fully-connected layer: re-fits the features at the tail of the CNN

Input layer:

In the input layer of a CNN, the format of the (image) data is not the same as in a fully connected neural network, which takes one-dimensional vectors. The CNN's input layer preserves the structure of the image itself.

For a black-and-white 28×28 picture, the CNN's input is a 28×28 two-dimensional array of neurons.

For a 28×28 picture in RGB format, the CNN's input is a 3×28×28 three-dimensional array of neurons (one 28×28 matrix for each of the R, G, and B color channels).
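As a small illustration of these input shapes, here is a sketch using NumPy arrays filled with zeros as placeholder images; the variable names are assumptions.

```python
import numpy as np

gray = np.zeros((28, 28))     # black-and-white 28x28 picture: one 2D grid
rgb = np.zeros((3, 28, 28))   # RGB picture: one 28x28 matrix per color channel

# A fully connected network would instead receive a flat one-dimensional vector.
flat_gray = gray.reshape(-1)

print(gray.shape, rgb.shape, flat_gray.shape)  # (28, 28) (3, 28, 28) (784,)
```

Flattening discards which pixels are neighbors, which is exactly the structure the CNN input layer keeps.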

Convolutional layer:

In the accompanying figure (not shown here), the left side is the input, the middle shows two different filters, Filter w0 and Filter w1, and the far right shows the two corresponding outputs.

$$a_{i,j} = f\left(\sum_{m=0}^{2}\sum_{n=0}^{2} w_{m,n}\,x_{i+m,j+n} + w_b\right)$$

$w_{m,n}$: the value in the m-th row and n-th column of the filter

$x_{i,j}$: the element in the i-th row and j-th column of the image

$w_b$: the bias term of the filter

$a_{i,j}$: the element in the i-th row and j-th column of the feature map

$f$: the ReLU activation function
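The formula above can be sketched directly in NumPy. This is a minimal, unoptimized version assuming a single-channel image, stride 1, and no padding; the function names, the 5×5 input, and the filter values are assumptions made for illustration.

```python
import numpy as np


def relu(x):
    """ReLU activation: max(x, 0), applied element-wise."""
    return np.maximum(x, 0)


def conv_layer(x, w, w_b, f=relu):
    """Compute a[i, j] = f(sum_m sum_n w[m, n] * x[i+m, j+n] + w_b)."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    a = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]      # the region under the filter
            a[i, j] = np.sum(w * patch) + w_b  # same w and w_b at every (i, j)
    return f(a)


x = np.arange(25, dtype=float).reshape(5, 5)  # assumed 5x5 single-channel image
w = np.array([[-1.0, 0.0, 1.0]] * 3)          # assumed 3x3 filter (horizontal difference)
feature_map = conv_layer(x, w, w_b=1.0)
print(feature_map.shape)  # (3, 3)
print(feature_map[0, 0])  # 7.0
```

Because this input increases by 1 from column to column, every patch gives the same response, so every entry of the feature map is 7.0.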

Excitation Layer:

The excitation (activation) function used is typically the ReLU function:

f(x)=max(x,0)

Convolutional and excitation layers are often combined together as the “convolution layer”.

Pooling layer:

After the input has passed through the convolutional layer, the resulting feature map can still be large if the receptive field and the stride are small. A pooling layer can then reduce the dimensions of each feature map. The depth of the output is unchanged and stays equal to the number of feature maps.

The pooling layer also has a “pooling window” (filter) that scans the feature map matrix, and the values inside the window are reduced to a single number. There are generally two kinds of calculation:

Max pooling: take the maximum value in the “pooling window” matrix

Average pooling: take the average value in the “pooling window” matrix
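Both kinds of pooling can be sketched for a single feature map as follows. The function name `pool2d`, the 2×2 window with stride 2, and the sample values are assumptions; in a full network the same operation would be applied to each feature map separately, which is why the depth does not change.

```python
import numpy as np


def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size pooling window over a 2D feature map."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    reduce = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_map[r:r + size, c:c + size]
            out[i, j] = reduce(window)  # one number per window
    return out


fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]], dtype=float)
print(pool2d(fm, mode="max").tolist())   # [[6.0, 4.0], [8.0, 9.0]]
print(pool2d(fm, mode="mean").tolist())  # [[3.75, 2.25], [4.5, 4.0]]
```

The 4×4 feature map shrinks to 2×2 in both cases; only the rule for summarizing each window differs.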

Training process:

1. Forward: calculate the output value $a_j$ of each neuron ($j$ denotes the j-th neuron of the network, here and below).

2. Backward: calculate the error term $\sigma_j$ of each neuron. $\sigma_j$ is also called the sensitivity in some literature. It is the partial derivative of the network's loss function $E_d$ with respect to the neuron's weighted input.

3. Calculate the gradient of each connection weight $w_{i,j}$ ($w_{i,j}$ denotes the weight of the connection from neuron i to neuron j).

4. Finally, update each weight according to the gradient descent rule.
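The four steps above can be sketched for the simplest possible case: a single layer of sigmoid neurons trained on one sample. Everything specific here is an assumption for illustration and not part of the original description: the sigmoid activation, the squared-error loss, the learning rate, and the layer sizes. With only one layer, the error term is computed directly instead of being propagated back through earlier layers.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(w, x, t, lr=0.5):
    """One gradient-descent step for a single layer of sigmoid neurons.

    w[i, j] is the weight from input i to neuron j. The loss is assumed to be
    E_d = 0.5 * sum_j (t_j - a_j)^2.
    """
    net = x @ w                      # weighted input of each neuron j
    a = sigmoid(net)                 # step 1: forward output a_j
    delta = (a - t) * a * (1.0 - a)  # step 2: error term dE_d/dnet_j
    grad = np.outer(x, delta)        # step 3: dE_d/dw[i, j] = x_i * delta_j
    w_new = w - lr * grad            # step 4: gradient-descent update
    loss = 0.5 * np.sum((t - a) ** 2)
    return w_new, loss


rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 2))  # 3 inputs, 2 neurons (assumed sizes)
x = np.array([1.0, 0.5, -1.0])
t = np.array([1.0, 0.0])

losses = []
for _ in range(100):
    w, loss = train_step(w, x, t)
    losses.append(loss)
print(losses[0] > losses[-1])  # True
```

In a CNN the same four steps apply, with the difference that the gradient of a shared weight is accumulated over every position where the filter was applied.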

Reference: https://blog.csdn.net/love__live1/article/details/79481052

### What exactly do neural network weights mean?

Because the Gaussian distance is defined in Euclidean geometry (that is, it is taken to be the shortest distance).

The weights are something like the value of a in the equation y = ax + b: they determine how strongly each input contributes to the output.

A trained neural network is not necessarily optimal for new data. It may not even be usable for prediction.

### How to understand the weight sharing problem in AI neural networks?

The term weight sharing was introduced by the LeNet5 model. Taking a CNN as an example, the parameters of the same convolution kernel are used throughout the convolution of an image. For a 3×3×1 convolution kernel, for instance, the 9 parameters inside the kernel are shared by the entire image, and the weight coefficients do not change with the position in the image. Put more bluntly, one convolution kernel processes the entire image without changing its weight coefficients. (Each layer of a CNN will of course have more than one convolution kernel; a single kernel is used here only to keep the explanation simple.)
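One way to see the sharing concretely is to write the convolution as an ordinary dense weight matrix and inspect its entries. The sketch below assumes a 5×5 single-channel image, stride 1, and no padding; the helper name `conv_as_dense_matrix` and the kernel values 1 to 9 are assumptions chosen so the shared weights are easy to recognize.

```python
import numpy as np


def conv_as_dense_matrix(kernel, height, width):
    """Build the dense weight matrix equivalent to sliding `kernel` over a
    height x width image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h, out_w = height - kh + 1, width - kw + 1
    dense = np.zeros((out_h * out_w, height * width))
    for i in range(out_h):
        for j in range(out_w):
            for m in range(kh):
                for n in range(kw):
                    # Output pixel (i, j) reads input pixel (i+m, j+n) with weight kernel[m, n].
                    dense[i * out_w + j, (i + m) * width + (j + n)] = kernel[m, n]
    return dense


kernel = np.arange(1.0, 10.0).reshape(3, 3)  # 9 distinct weights, values 1..9
dense = conv_as_dense_matrix(kernel, height=5, width=5)

print(dense.shape)                            # (9, 25)
print(np.count_nonzero(dense))                # 81
print(np.unique(dense[dense != 0]).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

The matrix has 9 × 25 = 225 entries and 81 nonzero connections, yet only 9 distinct values appear: each row repeats the same 9 kernel weights at a shifted position.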