Calculating the number of parameters of a convolutional neural network
When we build a neural network with Keras, printing the model summary (print(model.summary())) shows the number of parameters in each layer. Many beginners find it confusing how these numbers are calculated, so this article explains, step by step from simple to more involved, how to calculate the parameters of each layer of a convolutional neural network. If you are not yet familiar with the structure of convolutional neural networks, please look that up first.
1. What are the parameters of a convolutional neural network?
We all know that a neural network alternates forward propagation with backward optimization, and that this optimization actually adjusts the weights w and biases b of the connections between layers. Each layer's connections can be abstracted simply as: WX + b = Y
Here X is the input and Y is the prediction; both come from the training data and are therefore known to us. The unknowns in the whole process are W and b, which are the parameters that must be obtained through training.
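As a minimal sketch of this abstraction (using NumPy, with made-up layer sizes), the parameters of one such layer are simply the entries of W and b:

```python
import numpy as np

# y = W @ x + b for a layer with 3 inputs and 2 outputs (the sizes are arbitrary).
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))  # 2*3 = 6 weights to learn
b = rng.standard_normal(2)       # 2 biases to learn
x = np.array([1.0, 2.0, 3.0])    # a known training input

y = W @ x + b                    # the prediction Y
print(W.size + b.size)           # 8 parameters in total for this layer
```

Counting parameters for any layer always reduces to counting the entries of its W and b.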
The parameters of a convolutional neural network are slightly more specific because of the nature of convolution, but overall they are still the weights W and the biases b being fitted.
2. Calculation of the number of parameters in the convolutional layer
A convolutional layer scans the original input with convolution kernels to obtain an output with local features. Prediction accuracy depends heavily on how the kernels are set up, so the kernels are the parameters we need to fit. As for how to calculate the number of parameters, it helps to break the question down carefully.
(1) Number of convolution kernels
A single convolution kernel usually cannot capture all the information in the input, so more than one kernel is normally used. Each kernel is independent and computed in exactly the same way. Each kernel produces one feature map from the raw input, so the number of kernels equals the number of feature maps.
(2) Convolutional kernel size
In two-dimensional convolutional networks (the model commonly used for image processing), the original input is usually convolved with square kernels; common sizes are 3×3, 5×5, 7×7, and so on.
(3) Number of channels in the original image
The number of channels determines the number of per-channel sub-kernels. For example, a three-channel image needs one sub-kernel per channel, each with its own weights; the results from all channels are summed and a bias is added to produce the feature map we ultimately get. The number of parameters for one 3×3 kernel on a three-channel input is therefore 3×3×3 + 1, where the 1 is the bias, which is shared across all channels.
(4) Final number of parameters
Multiply the per-kernel parameter count by the number of kernels to get the final total: 2×(3×3×3 + 1). Here 2 is the number of kernels, the first 3 is the number of channels, the second and third 3 are the height and width of the kernel, and 1 is the bias.
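This rule can be captured in a small helper (a sketch; the numbers match what a Keras model summary would report for a convolutional layer with these settings):

```python
def conv2d_params(n_filters, kernel_h, kernel_w, in_channels):
    # Each filter has kernel_h * kernel_w * in_channels weights
    # plus one bias shared across all channels of that filter.
    return n_filters * (kernel_h * kernel_w * in_channels + 1)

# 2 filters of size 3x3 over a 3-channel input: 2*(3*3*3 + 1) = 56 parameters.
print(conv2d_params(2, 3, 3, 3))  # 56
```

Note that the count depends only on the kernel configuration, never on the height and width of the input image.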
3. Calculating the number of parameters in the pooling layer
A pooling layer keeps values according to its pooling window size; it can be understood as shrinking the feature map, and it involves no parameters at all.
4. Calculation of the number of parameters of the fully connected layer
The fully connected layer first flattens the output of the pooling layer, e.g. converting 400×30 into 12000×1. Assuming the output is 12-dimensional, the number of parameters is 12000×12 + 12, where the extra 12 is one bias b per output unit of the fully connected layer.
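The same count in code (a sketch using the hypothetical sizes from the example above):

```python
def dense_params(in_features, out_features):
    # Weight matrix (in_features x out_features) plus one bias per output unit.
    return in_features * out_features + out_features

# A flattened 12000-dimensional input mapped to a 12-dimensional output:
print(dense_params(12000, 12))  # 144012
```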
Whichever layer of a one-, two-, or three-dimensional convolutional neural network you are counting, the same approach works: first determine the kernel size, then the number of kernels, and finally the number of channels of the input. Once these are fixed, computing the parameter count is straightforward.
This post assumes you already know the basics of convolutional neural networks, so there are no illustrations; it only introduces the idea behind determining the parameters, and I hope it is helpful.
Which layers does a convolutional neural network consist of?
Vision – Convolutional Layer Basics
If we design 6 convolution kernels, this can be understood as assuming that there are 6 underlying texture patterns in the image, i.e., that the image can be depicted using just those 6 basic patterns.
The role of the convolutional layer is to extract the features of a local region. A Convolutional Neural Network (CNN or ConvNet) is a deep feedforward neural network with properties such as local connectivity and weight sharing, motivated by the biological mechanism of the receptive field.
Each convolutional layer in a convolutional neural network consists of a number of convolutional units, and the parameters of each unit are obtained by optimization with the back-propagation algorithm.
The connections between convolutional layers are called sparse connections: a neuron in a convolutional layer is connected to only some, not all, of the neurons in neighboring layers, as opposed to the full connections of a feedforward neural network.
Convolutional Neural Networks in General
Convolutional neural networks are a class of feedforward neural networks that contain convolutional computation and have a deep structure; they are one of the representative algorithms of deep learning. Because of their ability to learn representations and to classify input information according to its hierarchical structure, they are also known as "shift-invariant artificial neural networks".
A Convolutional Neural Network (CNN) is a feedforward neural network, inspired by the biological mechanism of the receptive field. The receptive field refers mainly to certain properties of neurons in the auditory, proprioceptive, and visual systems.
Structure of Convolutional Neural Networks
1. In other words, the most common convolutional neural network structure is: INPUT-[[CONV-RELU]*N-POOL?]*M-[FC-RELU]*K-FC, where * indicates the number of repetitions and POOL? indicates an optional pooling layer.
2. Current convolutional neural networks are generally feedforward networks built from a cross-stack of convolutional, pooling, and fully connected layers, trained with the back-propagation algorithm. Convolutional neural networks have three structural properties: local connectivity, weight sharing, and pooling. These properties give the network some degree of invariance to translation, scaling, and rotation.
Convolutional Neural Networks (Conv)
Structural features: a basic neural network consists of an input layer, hidden layers, and an output layer. What characterizes a convolutional neural network is that its hidden layers are divided into convolutional layers and pooling layers (also known as downsampling layers).
The basic structure of a convolutional neural network consists of the following components: an input layer, a convolutional layer, a pooling layer, an activation function layer, and a fully connected layer.
In convolutional neural networks we usually use kernels with odd height and width, such as 3×3 or 5×5. For a kernel of size 2k+1, choosing padding of size k on each side of the height (or width) keeps the input and output sizes the same when the stride is 1.
Which layers are not part of a convolutional neural network's structure?
The main structures of a convolutional neural network are the convolutional layer, the pooling layer, and the fully connected layer. A convolution kernel is one of a series of filters used to extract a certain kind of feature: when we use it to process an image, the convolution operation yields a large value wherever the image features resemble those represented by the filter.
The basic structure of a CNN does not include an un-pooling layer. Introduction to the basic components of a CNN: local receptive fields. In an image, nearby pixels are strongly connected to each other, while pixels farther apart are relatively weakly connected.
What layers does a neural network include besides convolutional layers?
1. A Convolutional Neural Network (CNN) is a kind of feedforward neural network whose artificial neurons respond to units within a surrounding coverage area; it performs excellently on large-scale image processing.
2. The basic structure of a convolutional neural network consists of the following parts: an input layer, a convolutional layer, a pooling layer, an activation function layer and a fully connected layer.
3. Current convolutional neural networks are generally feedforward networks consisting of convolutional, pooling, and fully connected layers cross-stacked, trained using the back-propagation algorithm. Convolutional neural networks have three structural properties: local connectivity, weight sharing, and pooling.
Convolutional Neural Networks (CNN)
The calculation in the figure above proceeds as follows: the matrix on the right, which we call the filter (or kernel), first covers the first region on the left; the corresponding positions are multiplied and the products summed: 3*1 + 1*1 + 2*1 + 0*0 + 0*0 + 0*0 + 1*(-1) + 8*(-1) + 2*(-1) = -5.
Following the same procedure, the filter is gradually moved right by one step (the stride can be set to 1, 2, ...), and then moved down by one step, computing the corresponding value at each position until the final output is obtained.
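The single-position computation above can be reproduced with NumPy. The region and filter values are taken from the worked example; since the figure itself is not shown, the exact layout is an assumption:

```python
import numpy as np

# The 3x3 input region covered by the filter, and the filter itself,
# arranged to match the products in the text (assumed layout).
region = np.array([[3, 1, 2],
                   [0, 0, 0],
                   [1, 8, 2]])
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

# Multiply element-wise and sum, exactly as in the worked example:
value = int(np.sum(region * kernel))
print(value)  # -5
```

Sliding the filter over the whole input and repeating this at every position produces the full output map.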
As shown above, in the first image matrix one side is white and the other black, so there is a vertical edge in the middle. We can choose a vertical-edge-detection filter, such as the matrix on the right, and convolve the two; the result, shown as the grayscale image corresponding to the matrix right of the equals sign, has a white band in the middle: the detected edge. Why does the detected band look wide rather than a very thin local region? The reason is that our input image is only 6×6, which is too small; with a larger input, the result would be a relatively fine edge-detection band, and the vertical edge feature would still be extracted.
The filter parameters above were chosen by hand; with the development of neural networks, we can instead learn the filter parameters with the back-propagation algorithm.
We can treat the values of the convolution kernel as parameters to be learned through back-propagation, so that the learned filters (convolution kernels) can recognize many kinds of features, rather than relying on hand-picked filters.
- Padding operation. Convolution often has two problems:
1. The image shrinks with each convolution; after many convolutional layers the image becomes very small;
2. Edge pixels are used only once, clearly fewer times than pixels in the middle, so information at the image edges is lost.
To solve these problems, we can pad the image with pixels at its edges; this is called the padding operation.
If we set the number of pixels padded at each edge of the image to p, then the convolved image is (n+2p-f+1)×(n+2p-f+1).
How to choose p
There are usually two choices:
- Valid: no padding. With an n×n image and an f×f filter, the convolution produces an output image of (n-f+1)×(n-f+1);
- Same: pad so that the output image is the same size as the input. Convolving the padded (n+2p)×(n+2p) image with an f×f filter gives n×n, so from n+2p-f+1 = n we get p = (f-1)/2.
For the choice of filters there is a common default criterion: choose filters whose size is an odd number.
- Strided convolution. The stride is the number of positions the filter moves each time during the convolution operation. The convolutions described above all use the default stride of 1: each time we move the filter, we move it one cell to the right, or one cell down.
But we can also set the stride of the convolution, i.e. the number of cells the filter moves. If our image is n×n, the filter is f×f, the padding is p, and the stride s, then the output image after convolution is ((n+2p-f)/s+1)×((n+2p-f)/s+1). This raises a question: what do we do if the result is not an integer?
The convention is to round down; that is, a position contributes to the output only when the filter lies completely within the (padded) image.
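The output-size rules above (valid, same, and strided convolution) can be combined into one formula sketch:

```python
import math

def conv_output_size(n, f, p=0, s=1):
    # Size of the output when an n x n input is convolved with an f x f filter,
    # with padding p on each side and stride s; non-integer results round down.
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))       # 4: "valid" convolution, n - f + 1
print(conv_output_size(6, 3, p=1))  # 6: "same" convolution, p = (f - 1) / 2
print(conv_output_size(7, 3, s=2))  # 3: stride 2, rounded down
```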
In fact, the operation described above is not convolution in the strict mathematical sense. The mathematical definition of convolution requires mirroring (flipping) the kernel before multiplying corresponding elements and moving by the stride. Strictly speaking, the operation without the flip is cross-correlation, but in deep learning the flip is omitted by convention and the operation is still called convolution.
We know that a color image has three RGB channels, so the input is three-dimensional. How, then, do we perform the convolution operation on a three-dimensional input image?
For example, suppose the input image in the figure above is 6×6×3, where 3 is the number of RGB channels (also called the depth). The filter is chosen to be 3×3×3; note that the filter's channel count must match that of the input image, while its height and width are unrestricted. To compute, we overlay the filter volume on the input, multiply the 27 corresponding numbers, and sum them to get one number of the output. Convolving this way yields a 4×4×1 output layer. If we have more than one filter, say two (one extracting vertical features, one horizontal), the output is 4×4×2; that is, the depth (channel count) of the output equals the number of filters.
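A minimal NumPy sketch of this three-dimensional convolution (stride 1, no padding; the explicit loops are for clarity, not speed, and the input values are random placeholders):

```python
import numpy as np

def conv_multichannel(x, w):
    # x: (H, W, C) input; w: (f, f, C) filter whose channel count matches x.
    h, wd, c = x.shape
    f = w.shape[0]
    assert w.shape == (f, f, c), "filter channels must match the input"
    out = np.zeros((h - f + 1, wd - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply all f*f*C overlapping numbers and sum to one output value.
            out[i, j] = np.sum(x[i:i + f, j:j + f, :] * w)
    return out

x = np.random.default_rng(0).standard_normal((6, 6, 3))  # a 6x6 RGB-like input
w = np.ones((3, 3, 3))                                   # one 3x3x3 filter
print(conv_multichannel(x, w).shape)  # (4, 4)
```

With several filters, each one produces its own 4×4 map, and stacking them gives the output depth.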
Suppose our filters are 3×3×3 and we set up 10 of them. Each filter has 27 weights plus one bias, i.e. 28 parameters, so the ten filters have 280 parameters in total. This also makes parameter sharing easy to understand: no matter what size the input image is, we only need to learn these same parameters.
To reduce the size of the model, speed up computation, and improve the robustness of the extracted features, we often use pooling layers. Pooling is computed much like convolution, except that the pooling operation is performed on each channel separately.
There are generally two types of pooling: max pooling and average pooling.
The above is max pooling. The computation is similar to convolution: first set the hyperparameters (the filter size and the stride), then overlay the filter on the corresponding cells and take the maximum value among them as the output. For example, in the figure above the filter is 2×2 with a stride of 2, so the output is 2×2, and each output cell is the maximum of the input values the filter covers. With average pooling, the average of the covered values is chosen as the output instead.
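A small NumPy sketch of max pooling with a 2×2 window and stride 2 (the input values are made up, since the original figure is not reproduced here):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Slide a size x size window over x with the given stride,
    # keeping the maximum value in each window.
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 6, 8],
              [3, 1, 1, 0],
              [1, 2, 2, 4]])
print(max_pool(x))  # [[6. 8.] [3. 4.]]
```

Replacing `.max()` with `.mean()` turns this into average pooling; in either case there are no parameters to learn.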
From this process we can see that pooling shrinks the model while making the feature values more prominent, which improves the robustness of the extracted features.