Wavelet Neural Networks
Name: Zuhan Cheng
[Embedded Cow Readers] As research on optimization algorithms continues, neural networks have spread into many fields, solved many practical problems, and prompted ongoing reflection. This article covers the basics of wavelet neural networks.
[Embedded Cow Nose] BP neural networks, wavelet transform, wavelet neural networks
[Embedded Cow Body]
Implementing a BP network involves two phases. The first is the forward propagation of signals from the input layer through the hidden layer to the output layer. The second is the back propagation of the error from the output layer through the hidden layer to the input layer. After the error has been propagated, the weights and biases between the input layer and the hidden layer, and between the hidden layer and the output layer, are adjusted in turn, as shown in Figure 1:
The neurons of the BP neural network are shown in Figure 2:
Here the activation function is the Sigmoid function, whose expression is f(x) = 1 / (1 + e^(−x)).
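As a minimal sketch of a single BP neuron with a Sigmoid activation, the snippet below computes a weighted sum of the inputs, adds a bias, and applies the Sigmoid. The function names, the example values, and the choice to add (rather than subtract) the bias are assumptions made for illustration, since Figure 2 is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the Sigmoid."""
    return sigmoid(np.dot(weights, inputs) + bias)

# Hypothetical example values, chosen only for illustration.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1
y = neuron_output(x, w, b)  # a value strictly between 0 and 1
```

The Sigmoid squashes any real input into the interval (0, 1), which is why the neuron output can be read as a degree of activation.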
2.1 Wavelet Transform
The wavelet transform is a mathematical transform built on Fourier analysis. It overcomes the limitations of the Fourier transform and the fixed window size of the windowed Fourier transform. Through stretching and translation, it achieves multi-scale refinement, highlights the details of the problem at hand, and effectively extracts local information.
2.2 Wavelet Neural Network
A wavelet neural network is an improved BP network in which the original Sigmoid activation function of the hidden layer is replaced by a wavelet function, here the Morlet wavelet.
The model diagram of the 4-layer wavelet neural network designed here is shown in Figure 3:
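The expression of the Morlet wavelet did not survive in the text above. A form commonly used in wavelet neural networks is ψ(x) = cos(1.75x)·e^(−x²/2), and the sketch below assumes that form. The constant 1.75 and the names `morlet`, `morlet_derivative`, and `scaled_wavelet` are assumptions, not taken from the article.

```python
import numpy as np

def morlet(x):
    """Real-valued Morlet mother wavelet (assumed form): cos(1.75 x) * exp(-x^2 / 2)."""
    return np.cos(1.75 * x) * np.exp(-0.5 * x ** 2)

def morlet_derivative(x):
    """Analytic derivative of the assumed form, as needed by gradient-based training."""
    return -(1.75 * np.sin(1.75 * x) + x * np.cos(1.75 * x)) * np.exp(-0.5 * x ** 2)

def scaled_wavelet(x, a, b):
    """Stretch the wavelet by scale factor a and shift it by translation factor b."""
    return morlet((x - b) / a)
```

Unlike the Sigmoid, this activation is localized: it oscillates near its center and decays quickly away from it, which is what lets the scale and translation factors pick out local features.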
2.3 Model building
a. Initialization of each parameter
The network design of Figure 3 uses separate symbols for the input samples, the output samples, the nodes of the input, hidden, and output layers, and the connection weights between nodes.
b. Forward computation
The input of hidden layer 1 is the weighted sum of all the inputs, and the output of hidden layer 1 is obtained by applying the wavelet activation function to that input. The inputs and outputs of the remaining hidden layers and of the output layer are computed in the same way as for hidden layer 1 and are not repeated here.
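The forward computation for one wavelet hidden layer can be sketched as follows. This is an illustration under assumptions: the Morlet form cos(1.75x)·e^(−x²/2), the per-node scale factors `a` and translation factors `b`, and all array shapes and values are hypothetical and do not come from the article's missing formulas.

```python
import numpy as np

def morlet(x):
    # Assumed Morlet form; the article's own expression was lost.
    return np.cos(1.75 * x) * np.exp(-0.5 * x ** 2)

def hidden_layer_forward(x, W, a, b):
    """x: (n_in,), W: (n_hidden, n_in), a and b: (n_hidden,)."""
    net = W @ x                    # layer input: weighted sum of all inputs
    out = morlet((net - b) / a)    # layer output: wavelet activation with scale a, translation b
    return net, out

# Hypothetical sizes: 3 inputs feeding 4 hidden nodes.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3))
a = np.ones(4)    # scale factors (assumed initial value)
b = np.zeros(4)   # translation factors (assumed initial value)
net, out = hidden_layer_forward(x, W, a, b)
```

Stacking further layers would reuse the same function, feeding `out` in as the next layer's `x`.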
c. Error Back Propagation
Error back propagation uses a gradient descent algorithm to adjust the weights between layers, i.e., it is the weight correction process. There are two ways to correct the weights: one corrects them sample by sample as each input arrives, and the other corrects them after all the samples have been input. The first method is used in this article.
The weights and wavelet factors are corrected according to the error function. To keep the algorithm from falling into a local minimum and to speed up its convergence, a momentum factor is introduced alongside the learning rate. The update formulas are as follows:
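Since the update formulas themselves are missing, here is a minimal sketch of a standard gradient-descent step with a momentum term, applied to a toy one-parameter error function. The function name `momentum_step`, the values of the learning rate and momentum factor, and the toy error function are all assumptions for illustration, not the article's exact formulas.

```python
def momentum_step(param, grad, prev_delta, lr=0.1, momentum=0.9):
    """One gradient-descent update with a momentum term.

    delta(t) = -lr * grad + momentum * delta(t-1)
    param    = param + delta(t)
    """
    delta = -lr * grad + momentum * prev_delta
    return param + delta, delta

# Toy error function E(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3).
w, delta = 0.0, 0.0
for _ in range(200):
    grad = w - 3.0
    w, delta = momentum_step(w, grad, delta)
# After the loop, w is close to the minimizer 3.0.
```

In a wavelet network the same step would be applied to every weight, scale factor, and translation factor, each with its own gradient and its own remembered `delta`.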
Summary: A wavelet neural network inherits the advantages of the wavelet transform and avoids the blindness of BP network structure design. However, the number of hidden-layer nodes and the initial values of the weights and scale factors between layers are difficult to determine, which affects the convergence speed of the network. Follow-up work could try neural networks built on other wavelet functions and compare their results to choose the best construction.
An article on four basic neural network architectures
When you are just getting started with neural networks, the many architectures can be confusing. This article introduces four common ones, namely CNN, RNN, DBN, and GAN. Working through these four basic architectures gives a reasonable first understanding of neural networks.
A neural network is a model in machine learning, an algorithmic mathematical model that mimics the behavioral characteristics of animal neural networks for distributed parallel information processing. This type of network relies on the complexity of the system to process information by adjusting the relationship between the large number of nodes interconnected within it.
In general, the architecture of neural networks can be divided into three categories:
Feed-forward neural networks:
This is the most common type of neural network used in practical applications. The first layer is the input and the last layer is the output. If there are multiple hidden layers, we call them “deep” neural networks. They compute a series of transformations that change the similarity of the samples. The activity of the neurons in each layer is a nonlinear function of the activity of the previous layer.
Recurrent networks:
Recurrent networks have directed cycles in their connection graphs, which means you can follow the arrows back to where you started. They can have complex dynamics that make them hard to train. They are more biologically realistic.
Recurrent networks are intended for processing sequential data. In a traditional neural network model, data flows from the input layer to the hidden layer to the output layer; adjacent layers are fully connected, while the nodes within each layer are not connected to one another. Such a network is unsuited to many problems. For example, to predict the next word in a sentence you generally need the preceding words, because the words in a sentence are not independent of each other.
In a recurrent neural network, the current output of a sequence also depends on the previous outputs. The network remembers earlier information and applies it to the computation of the current output: the nodes between hidden layers are connected rather than unconnected, and the input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
Symmetric Connected Networks:
The previous post already touched on perceptrons, so here is a recap.
First of all, it is still this picture
This is an M-P neuron
A neuron has n inputs, each with a corresponding weight w. Inside the neuron, each input is multiplied by its weight and the products are summed; the bias is subtracted from the sum, and the result is passed to an activation function, which gives the final output. The output is often binary, with 0 representing inhibition and 1 representing activation.
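The M-P neuron described above can be sketched in a few lines. The function name, the example values, and the choice to fire when the result is exactly zero are assumptions for illustration.

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """Weighted sum minus the bias (threshold), then a step activation.

    Returns 1 (activated) or 0 (inhibited).
    """
    total = np.dot(weights, inputs) - threshold
    return 1 if total >= 0 else 0

# Hypothetical example: three inputs with equal weights.
inputs = np.array([1.0, 0.0, 1.0])
weights = np.array([0.5, 0.5, 0.5])
fires = mp_neuron(inputs, weights, threshold=0.8)    # weighted sum 1.0 exceeds 0.8
silent = mp_neuron(inputs, weights, threshold=1.5)   # weighted sum 1.0 is below 1.5
```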
The perceptron can be thought of as a hyperplane decision surface in an n-dimensional instance space: it outputs 1 for samples on one side of the hyperplane and 0 for samples on the other side. The equation of this decision hyperplane is w⋅x=0. A set of positive and negative samples that can be separated by a hyperplane is called linearly separable, and such a set can be represented by the perceptron in the figure.
AND, OR, and NOT are linearly separable problems that are easily represented by a perceptron with two inputs. XOR is not linearly separable, so a single-layer perceptron cannot represent it, and a multilayer perceptron is needed to solve the XOR problem.
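To make the linear-separability point concrete, the sketch below hand-picks weights for AND, OR, and NOT perceptrons and then composes them into a two-layer network for XOR. The specific weights and biases are one workable choice among many and are assumptions, as is writing the bias as an explicit term rather than folding it into w.

```python
import numpy as np

def perceptron(x, w, bias):
    """Outputs 1 if w.x + bias > 0, else 0."""
    return int(np.dot(w, x) + bias > 0)

def AND(a, b):
    return perceptron([a, b], [1.0, 1.0], -1.5)

def OR(a, b):
    return perceptron([a, b], [1.0, 1.0], -0.5)

def NOT(a):
    return perceptron([a], [-1.0], 0.5)

def XOR(a, b):
    """XOR needs two layers: (a OR b) AND NOT (a AND b)."""
    return AND(OR(a, b), NOT(AND(a, b)))

truth_table = [(a, b, AND(a, b), OR(a, b), XOR(a, b)) for a in (0, 1) for b in (0, 1)]
```

Each single perceptron draws one straight line through the input plane; XOR needs the hidden layer because no single line separates (0, 1) and (1, 0) from (0, 0) and (1, 1).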
How do we train a perceptron?
We start with random weights and repeatedly apply the perceptron to each training sample, modifying its weights whenever it misclassifies a sample. This process is repeated until the perceptron classifies all samples correctly. Each step modifies the weights according to the perceptron training rule, which updates the weight wi associated with input xi as follows:
wi ← wi + Δwi, where Δwi = η(t − o)xi
Here t is the target output of the current training sample, o is the output of the perceptron, and η is a positive constant known as the learning rate. The learning rate serves to moderate the extent to which the weights are adjusted at each step; it is usually set to a small value (e.g., 0.1) and is sometimes made to decay as the number of times the weights are adjusted increases.
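The training procedure just described can be sketched as below, using the rule that each weight changes by η(t − o)x. The function names, the random initialization range, the epoch limit, and the trick of prepending a constant 1 so that the bias is learned as an extra weight are assumptions for illustration.

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, max_epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    # Prepend a constant 1 so the bias is learned as weight w0.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = rng.uniform(-0.5, 0.5, size=Xb.shape[1])   # start with random weights
    for _ in range(max_epochs):
        errors = 0
        for x_i, t_i in zip(Xb, t):
            o = 1 if np.dot(w, x_i) > 0 else 0
            if o != t_i:
                w += lr * (t_i - o) * x_i          # update by eta * (t - o) * x
                errors += 1
        if errors == 0:                            # every sample classified correctly
            break
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ w > 0).astype(int)

# Train on the OR function, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t_or = np.array([0, 1, 1, 1])
w_or = train_perceptron(X, t_or)
```

On data that is not linearly separable, such as XOR, this loop would simply run until `max_epochs` without ever reaching zero errors.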
Multilayer perceptrons, or multilayer neural networks, simply place multiple hidden layers between the input and output layers, and later neural networks such as CNNs and DBNs simply redesign the type of each layer. The perceptron can be said to be the basis of neural networks: the more complex networks that follow all build on this simplest model.
Machine learning is often mentioned together with pattern recognition, but pattern recognition in real environments runs into a variety of problems. For example:
Image segmentation: real scenes are always mixed with other objects. It is difficult to determine which parts belong to the same object. Some parts of an object can be hidden behind other objects.
Object illumination: the intensity of pixels is strongly affected by light.
Image distortion: objects can deform in various non-affine ways. For example, a handwritten character can have a large loop or just a sharp point.
Situational support: the category to which objects belong is usually defined by how they are used. For example, chairs are designed for people to sit on, so they come in a variety of physical shapes.
The difference between a convolutional neural network and a regular neural network is that a convolutional neural network contains a feature extractor made up of convolutional layers and subsampling layers. In a convolutional layer, a neuron is connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature maps; each feature map consists of neurons arranged in a rectangle, and the neurons in the same feature map share weights. These shared weights are the convolution kernel. The kernel is generally initialized as a matrix of small random values and learns reasonable weights as the network is trained. The immediate benefit of shared weights is that they reduce the number of connections between layers while also reducing the risk of overfitting. Subsampling is also called pooling and usually takes one of two forms, mean pooling or max pooling. It can be seen as a special kind of convolution. Together, convolution and subsampling greatly reduce model complexity and the number of model parameters.
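A minimal NumPy sketch of the two operations described above: one shared kernel slid over an image, followed by max pooling. The function names, the toy image, and the kernel values are assumptions. As in most CNN implementations, the kernel is applied without flipping (strictly speaking, this is cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output neuron sees only a local patch and reuses the same weights.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
feature_map = conv2d_valid(image, kernel)   # shape (5, 5)
pooled = max_pool(feature_map)              # shape (2, 2)
```

The whole 5×5 feature map is produced by just four kernel weights, which is the parameter saving that weight sharing provides.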
The convolutional neural network consists of three parts. The first part is the input layer. The second part consists of a combination of n convolutional and pooling layers. The third part consists of a fully connected multilayer perceptron classifier.
Here’s an example of AlexNet:
- Input: 224×224 image, 3 channels.
- First convolutional layer: 96 convolution kernels of 11×11, 48 on each GPU.
- First max-pooling layer: 2×2 kernels.
- Second convolutional layer: 256 convolution kernels of 5×5, 128 on each GPU.
- Second max-pooling layer: 2×2 kernels.
- Third convolutional layer: fully connected to the previous layer, 384 convolution kernels of 3×3, 192 on each of the two GPUs.
- Fourth convolutional layer: 384 convolution kernels of 3×3, 192 on each of the two GPUs. This layer is connected to the previous layer without going through a pooling layer.
- Fifth convolutional layer: 256 convolution kernels of 3×3, 128 on each of the two GPUs.
- Fifth-layer max-pooling: 2×2 kernels.
- First fully connected layer: 4096 dimensions; the output of the fifth-layer max-pooling is flattened into a one-dimensional vector as the input to this layer.
- Second fully connected layer: 4096 dimensions.
- Softmax layer: the output has 1000 dimensions, and each dimension is the probability that the picture belongs to that category.
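As a rough worked example of why shared weights keep convolutional layers small, the sketch below counts trainable parameters for a few of the layers listed above. It assumes one bias per kernel or output unit, which the list does not state; the helper names are hypothetical.

```python
def conv_params(num_kernels, kernel_h, kernel_w, in_channels, bias=True):
    """Trainable parameters of a convolutional layer with shared weights."""
    weights = num_kernels * kernel_h * kernel_w * in_channels
    return weights + (num_kernels if bias else 0)

def dense_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

conv1 = conv_params(96, 11, 11, 3)     # first convolutional layer on a 3-channel input
fc2 = dense_params(4096, 4096)         # second fully connected layer (4096 -> 4096)
softmax = dense_params(4096, 1000)     # final 1000-way layer
```

Under these assumptions the first convolutional layer has 34,944 parameters, while the second fully connected layer alone has 16,781,312, which shows where most of the parameters in such a network sit.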
Convolutional neural networks have important applications in pattern recognition. This is only the simplest explanation of them; there is much more to cover, such as local receptive fields, weight sharing, and multiple convolution kernels, which will be explained later when the opportunity arises.
Traditional neural networks struggle with many problems. For example, to predict the next word in a sentence you usually need the preceding words, because the words in a sentence are not independent. An RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the computation of the current output: the nodes between hidden layers are connected rather than unconnected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length.
This is the structure of a simple RNN, and you can see that the hidden layer itself is able to connect to itself.
Why can the hidden layer of an RNN see the hidden-layer output of the previous moment? This becomes clear once we unroll the network over time.
From the equation above, we can see that the difference between a recurrent layer and a fully connected layer is that the recurrent layer has an additional weight matrix W.
If we repeatedly substitute equation 2 into equation 1, we get:
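The equations referred to here were lost with the figures. A common formulation, assumed for this sketch, is s_t = f(U·x_t + W·s_{t−1}) for the hidden state and o_t = g(V·s_t) for the output, so that substituting one into the other makes o_t depend on every earlier input. The names U, V, and s, the choice of tanh for f, the identity for g, and all numeric values are assumptions.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Assumed form: s_t = tanh(U x_t + W s_{t-1}); o_t = V s_t."""
    s, outputs = s0, []
    for x in xs:
        s = np.tanh(U @ x + W @ s)   # hidden state mixes the current input with the previous state
        outputs.append(V @ s)
    return np.array(outputs)

# Hypothetical small network: 1 input, 2 hidden units, 1 output.
U = np.array([[0.5], [-0.3]])
W = np.array([[0.1, 0.2], [0.0, 0.3]])   # the extra recurrent weight matrix
V = np.array([[1.0, 1.0]])
s0 = np.zeros(2)

xs = np.array([[1.0], [0.0], [0.0]])
changed = np.array([[2.0], [0.0], [0.0]])   # only the first input differs

base = rnn_forward(xs, U, W, V, s0)
perturbed = rnn_forward(changed, U, W, V, s0)
# base[-1] and perturbed[-1] differ: the last output still depends on the first input.
```

Setting W to zero removes this memory, and the layer then behaves like an ordinary fully connected layer applied independently at each time step.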
Before we talk about DBNs, we need to have some idea of the basic building block of DBNs, which is the RBM, the Restricted Boltzmann Machine.
First of all what is a Boltzmann machine?
A Boltzmann machine is shown in the figure with blue nodes for the hidden layer and white nodes for the input layer.
Compared with recurrent neural networks, Boltzmann machines differ in the following ways:
1. Recurrent neural networks essentially learn a function, so they have the concept of input and output layers, whereas a Boltzmann machine learns an “intrinsic representation” of a set of data, so it has no concept of an output layer.
2. The nodes of a recurrent neural network are linked in directed cycles, while the nodes of a Boltzmann machine are linked in an undirected complete graph.
And what is a restricted Boltzmann machine?
In the simplest terms, it adds one restriction, and this restriction turns the complete graph into a bipartite graph. That is, it consists of a visible layer and a hidden layer, with bidirectional full connections between the neurons of the visible layer and those of the hidden layer.
h denotes the hidden layer and v denotes the visible layer
In an RBM, any two connected neurons have a weight w between them that indicates the strength of their connection, and each neuron has its own bias coefficient, b for a visible neuron and c for a hidden neuron, that indicates its own weight.
The exact derivation of the formulas is not shown here
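Without going through the derivation, the standard results for a binary RBM can still be sketched: the energy E(v, h) = −b·v − c·h − vᵀWh and the conditional activation probabilities that follow from it. The code uses b for visible biases and c for hidden biases as above; the function names, layer sizes, and numeric values are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v^T W h for binary units."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def p_hidden_given_visible(v, W, c):
    """P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)."""
    return sigmoid(c + v @ W)

def p_visible_given_hidden(h, W, b):
    """P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)."""
    return sigmoid(b + W @ h)

# Hypothetical RBM with 3 visible units and 2 hidden units.
W = np.array([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]])
b = np.zeros(3)
c = np.zeros(2)
v = np.array([1.0, 0.0, 1.0])
ph = p_hidden_given_visible(v, W, c)
```

Because there are no connections within a layer, each hidden unit's probability depends only on the visible layer, so a whole layer can be sampled in one step.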
A DBN is a probabilistic generative model, in contrast to the discriminative models of traditional neural networks. A generative model builds a joint distribution between observations and labels, evaluating both P(Observation|Label) and P(Label|Observation), while a discriminative model evaluates only the latter, P(Label|Observation).
A DBN consists of multiple layers of restricted Boltzmann machines; a typical network of this type is shown in the figure. These networks are “restricted” to a visible layer and a hidden layer, with connections between the layers but not between the units within a layer. The hidden-layer units are trained to capture the higher-order correlations of the data presented in the visible layer.
Generative adversarial networks were covered in a previous post, so here is a short recap.
The goal of a generative adversarial network is to generate. Traditional network structures tend to be discriminative models, i.e., they judge whether a sample is real. Generative models, on the other hand, can generate new, similar samples based on the samples provided; note that these samples are learned by the computer.
GANs generally consist of two networks, the generative model network, and the discriminative model network.
The generative model G captures the distribution of the sample data and, from noise z drawn from some distribution (uniform, Gaussian, etc.), generates samples that resemble the real training data; the closer the resemblance, the better. The discriminative model D is a binary classifier that estimates the probability that a sample comes from the training data rather than from the generated data. If the sample comes from the real training data, D outputs a large probability; otherwise, D outputs a small probability.
As an example, the generative network G is like a counterfeiting gang that specializes in producing fake currency, and the discriminative network D is like a police officer who specializes in checking whether currency is real or fake. The goal of G is to produce currency indistinguishable from the real thing so that D cannot tell the difference, and the goal of D is to detect the counterfeit currency produced by G.
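The tug-of-war between G and D can be made concrete with the standard minimax objective, in which D maximizes log D(x) + log(1 − D(G(z))) and G minimizes log(1 − D(G(z))). The sketch below only evaluates these losses on made-up discriminator outputs; it does not train anything, and the function names and probability values are assumptions.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """D wants D(x) near 1 on real samples and D(G(z)) near 0 on generated ones."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Minimax form: G minimizes log(1 - D(G(z)))."""
    return np.mean(np.log(1.0 - d_fake))

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = np.array([0.9, 0.8])
caught = np.array([0.1, 0.2])    # D recognizes the generated samples as fake
fooled = np.array([0.9, 0.8])    # D mistakes the generated samples for real

loss_d_caught = discriminator_loss(d_real, caught)
loss_d_fooled = discriminator_loss(d_real, fooled)
loss_g_caught = generator_loss(caught)
loss_g_fooled = generator_loss(fooled)
```

When D is fooled, its loss rises while the generator's loss falls, which mirrors the police and counterfeiter analogy: each side improves only at the other's expense.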
Traditional discriminative network:
Generative adversarial network:
The following shows an example of a cDCGAN (written in an earlier post)
The final result uses MNIST as the training samples; judging by the generated digits, the network has learned reasonably well.
This article is a very brief introduction to four neural network architectures: CNN, RNN, DBN, and GAN. It does not go into great depth. These four architectures are very common and widely used. Neural networks cannot be fully explained in a few posts; the material here covers some of the basics to help you get started quickly. Later posts will cover deep autoencoders, Hopfield networks, and long short-term memory (LSTM) networks.
The first step of a neural network algorithm in general
The first step of a neural network algorithm in general is learning. In this process, the neural network needs to keep adjusting its synaptic weights in order to improve the algorithm’s performance and better accomplish the tasks assigned to it.
Artificial Neural Network (ANN) systems emerged after the 1940s. An ANN is made up of numerous neurons connected by adjustable connection weights.
It features massively parallel processing, distributed information storage, and good self-organization and self-learning ability. The BP (Back Propagation) algorithm, also known as the error back propagation algorithm, is a supervised learning algorithm for artificial neural networks.
In theory, the BP neural network algorithm can approximate any function. Its basic structure consists of nonlinear transformation units and has strong nonlinear mapping ability.
The number of intermediate layers, the number of processing units in each layer, the learning coefficient, and other parameters can be set according to the specific situation, which makes the network very flexible. It has broad application prospects in many fields, such as optimization, signal processing, pattern recognition, intelligent control, and fault diagnosis.
The study of artificial neurons originated from the neuron doctrine of the brain. At the end of the 19th century, in the fields of biology and physiology, Waldeyer and others established the neuron doctrine. It was recognized that the complex nervous system is made up of a large number of neurons.