Steps in neural network model construction

Building neural network models quickly with Keras

Steps to build a neural network with Keras:

The deep learning framework Keras lets you build neural networks like building blocks. The process is divided into seven parts, and each part requires only a few Keras API functions, so users can assemble a neural network model layer by layer.







7. Model saving (Save model)

The following sections describe each part in detail.

There are three main types of models in Keras: the Sequential model, the Functional model, and the Subclass model.

Before you start to create a model, you need to import the tensorflow and keras modules. Then create a Sequential model.

The Sequential API is defined as follows:

The layers parameter can be empty; layers can then be added to the model with the add method and removed from it with the pop method.


To create a Functional API model, call keras.Model, which lets you specify multiple inputs and multiple outputs.

keras.Model definition:

Layers are the basic building blocks of neural networks. A Layer encapsulates a tensor-in/tensor-out computation and some state, which is stored in TensorFlow variables (the layer's weights).

Layers fall into six main categories: base layers, core layers, convolutional layers, pooling layers, recurrent layers, and merging layers.

A derived class can be implemented by overriding the following methods:

__init__(): defines the attributes of the layer and creates layer variables that do not depend on the input shape.

build(self, input_shape): creates variables that depend on the input shape, typically by calling add_weight().

call(self, *args, **kwargs): contains the forward computation; it is invoked from __call__ after making sure build() has been called.

get_config(self): returns a dictionary containing the configuration used to initialize this layer.
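The life cycle described above can be illustrated without any framework. The following is a minimal plain-Python sketch, not the Keras base class: ToyDense, its constant initial weights, and the use of a plain list as input are all assumptions made for illustration.

```python
class ToyDense:
    """Plain-Python stand-in for a custom layer (hypothetical, not Keras)."""

    def __init__(self, units):
        # Attributes that do not depend on the input shape.
        self.units = units
        self.built = False

    def build(self, input_dim):
        # Input-dependent state: one weight row per input feature.
        self.w = [[0.5] * self.units for _ in range(input_dim)]
        self.b = [0.0] * self.units
        self.built = True

    def call(self, inputs):
        # Forward computation y = input * w + b for one input vector.
        return [
            sum(x * self.w[i][j] for i, x in enumerate(inputs)) + self.b[j]
            for j in range(self.units)
        ]

    def __call__(self, inputs):
        # Build lazily on first use, then run the forward computation.
        if not self.built:
            self.build(len(inputs))
        return self.call(inputs)

    def get_config(self):
        # Everything needed to re-create the layer with ToyDense(**config).
        return {"units": self.units}


layer = ToyDense(units=2)
y = layer([1.0, 2.0, 3.0])
config = layer.get_config()
print(y, config)
```

The weights are created only when the first input arrives, which is why the layer can be constructed without knowing the input size in advance.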

Create a SimpleDense derived class that adds trainable weights in the build() function and implements y = input*w + b.

Result output:

Create a ComputeSum derived class that adds non-trainable weights in the __init__ function.

Result output:

Create a ComputeSum derived class that adds non-trainable weights.

Result output:

The core layers are the most commonly used layers; they are needed whenever data has to be transformed or processed.

The Dense layer is the so-called fully connected neural network layer, or fully connected layer for short. Each neuron in the fully connected layer is fully connected to all the neurons in the layer before it.

Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the activation function applied element by element, kernel is the weight matrix created by the layer, and bias is the bias vector created by the layer (only applicable when use_bias is True).
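As a rough illustration of this formula, here is a plain-Python sketch for a single input vector. The function dense, the helper relu, and the numbers are made up for the example; this is not how Keras implements the layer internally.

```python
def relu(x):
    return max(0.0, x)


def dense(inputs, kernel, bias=None, activation=None):
    """output = activation(dot(input, kernel) + bias) for one input vector."""
    units = len(kernel[0])
    # dot(input, kernel): kernel has shape (input_dim, units).
    out = [sum(x * kernel[i][j] for i, x in enumerate(inputs)) for j in range(units)]
    if bias is not None:  # plays the role of use_bias=True
        out = [o + b for o, b in zip(out, bias)]
    if activation is not None:  # applied element by element
        out = [activation(o) for o in out]
    return out


kernel = [[1.0, -1.0],
          [2.0, 0.5]]
dense_out = dense([1.0, 2.0], kernel, bias=[0.5, -3.0], activation=relu)
print(dense_out)
```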

The Activation layer applies an activation function to an output. The activation function is the computation performed on the input signal after it enters the neuron.

The comparison curves for sigmoid, tanh, ReLU, and softplus are shown below:
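The four functions being compared can be written down directly. The sketch below uses only the Python standard library and evaluates each function at a few sample points chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x).
    return math.log1p(math.exp(x))


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), softplus(x))
```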

The activation function can be applied either by adding a separate Activation layer or by passing the activation parameter when constructing the layer object:

Dropout randomly sets a fraction rate of the input units to 0 at each update during training, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1-rate) so that the expected sum over all inputs is unchanged.

Note that the Dropout layer is only applied when training is set to True, so that no values are dropped during inference. When using model.fit, training is automatically set to True.
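The scaling rule and the training flag can be sketched in a few lines of plain Python. The function dropout and the explicit random generator argument are assumptions for this illustration, not the Keras API.

```python
import random


def dropout(inputs, rate, training, rng):
    """Zero each input with probability `rate`; scale survivors by 1/(1-rate)."""
    if not training:
        # Inference: values pass through unchanged.
        return list(inputs)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in inputs]


rng = random.Random(0)
x = [1.0] * 10
train_out = dropout(x, rate=0.2, training=True, rng=rng)
infer_out = dropout(x, rate=0.2, training=False, rng=rng)
print(train_out)
print(infer_out)
```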

Flatten flattens the input. It does not affect the batch size. Note: if the input has shape (batch,) with no feature axis, flattening adds a channel dimension and the output shape is (batch, 1).

Reshape reshapes the input to a specified shape.

Lambda wraps arbitrary expressions as Layer objects, so that arbitrary TensorFlow functions can be used when constructing the model. The Lambda layer is best suited for simple operations or quick experiments. The Lambda layer is saved by serializing Python bytecode.

Masking masks a sequence by using a mask value to skip time steps.

For each time step of the input tensor (the time axis, dimension #1 of the tensor), if all values of the input tensor at that time step are equal to mask_value, the time step will be masked (skipped) in all downstream layers. If any downstream layer does not support masking but still receives such an input mask, an exception is thrown.
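The masking rule can be sketched as follows. The helper compute_mask and the sample batch are assumptions for illustration; the batch is laid out as (samples, time steps, features), and True means the time step is kept.

```python
def compute_mask(batch, mask_value=0.0):
    """Return False for every time step whose features all equal mask_value."""
    return [
        [any(v != mask_value for v in step) for step in sequence]
        for sequence in batch
    ]


batch = [
    [[1.0, 2.0], [0.0, 0.0], [0.0, 3.0]],
    [[0.0, 0.0], [4.0, 5.0], [0.0, 0.0]],
]
mask = compute_mask(batch)
print(mask)
```

A time step such as [0.0, 3.0] is kept because only one of its features equals the mask value.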


Embedding is a way to convert discrete variables into continuous vector representations. This layer can only be used as the first layer in a model.

Embedding has three main purposes: finding nearest neighbors in the embedding space, which can be used to make recommendations based on a user's interests; serving as input to supervised learning tasks; and visualizing relationships between different discrete variables.
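Conceptually, an embedding is a lookup table from integer ids to vectors. The sketch below uses a hand-written table with made-up values (a trained layer would learn them) to show both the lookup and a nearest-neighbor query.

```python
import math

# Hypothetical 2-dimensional embeddings for three item ids (0, 1, 2).
table = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]


def embed(ids):
    """Convert discrete ids into their continuous vectors."""
    return [table[i] for i in ids]


def nearest(query_id):
    """Id of the closest other item in the embedding space."""
    others = [i for i in range(len(table)) if i != query_id]
    return min(others, key=lambda i: math.dist(table[query_id], table[i]))


vectors = embed([2, 0])
neighbor_of_0 = nearest(0)
print(vectors, neighbor_of_0)
```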



From Wikipedia, we can learn that convolution is a mathematical operation defined on two functions (𝑓 and 𝑔) designed to produce a new function. The convolution of 𝑓 and 𝑔 can then be written as 𝑓∗𝑔, with the following mathematical definition:
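For reference, the standard textbook definition of convolution, assumed here to be the one intended, in its continuous and discrete forms:

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\qquad\text{and}\qquad
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]
```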

Depending on the field, convolution can be interpreted in different ways: 𝑔 can be viewed as the kernel we often talk about in deep learning, which corresponds to the filter in signal processing, and 𝑓 can be either what we call a feature in machine learning or a signal in signal processing. The convolution of 𝑓 and 𝑔 (𝑓∗𝑔) can be thought of as a weighted summation of 𝑓.

One-dimensional time-domain convolution operation:
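A minimal plain-Python sketch of the discrete one-dimensional case, using a made-up signal and a two-tap averaging kernel. Note that convolutional layers in deep learning frameworks usually compute cross-correlation, that is, they do not flip the kernel; for a symmetric kernel like this one the two coincide.

```python
def convolve_1d(f, g):
    """Full discrete convolution: (f*g)[n] = sum over m of f[m] * g[n - m]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for k, gk in enumerate(g):
            out[m + k] += fm * gk  # contributes to index n = m + k
    return out


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]  # each output is a weighted sum of neighboring samples
smoothed = convolve_1d(signal, kernel)
print(smoothed)
```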

Two-dimensional image convolution operation:

1D convolutional layer (e.g. temporal convolution), used to perform neighborhood filtering on a one-dimensional input signal.


Result output:

2D convolutional layer (e.g. spatial convolution of an image).


Result output:

3D convolutional layer (e.g. spatial convolution on volume)


Result output:

Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts on the channels separately, followed by a pointwise convolution that mixes the channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output.

Depthwise separable 2D convolution. Separable convolution consists of first performing a depthwise spatial convolution (which acts on each input channel separately), followed by a pointwise convolution, which mixes the resulting output channels. The depth_multiplier parameter controls how many output channels are generated for each input channel in the depthwise step.

Intuitively, separable convolution can be understood as a way of breaking the convolution kernel into two smaller kernels, or as an extreme version of the Inception block.
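One way to see the effect of this factorization is to count weights. The sketch below compares a standard 2D convolution with a depthwise separable one for a square kernel, ignoring biases; the function names and the layer sizes are chosen only for illustration.

```python
def conv2d_weights(kernel_size, in_channels, out_channels):
    # One k x k kernel per (input channel, output channel) pair.
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv2d_weights(kernel_size, in_channels, out_channels, depth_multiplier=1):
    # Depthwise step: depth_multiplier k x k kernels per input channel.
    depthwise = kernel_size * kernel_size * in_channels * depth_multiplier
    # Pointwise step: 1 x 1 convolution mixing the depthwise outputs.
    pointwise = in_channels * depth_multiplier * out_channels
    return depthwise + pointwise


standard = conv2d_weights(3, 32, 64)
separable = separable_conv2d_weights(3, 32, 64)
print(standard, separable)
```

For a 3x3 kernel mapping 32 channels to 64, the separable variant needs roughly an eighth of the weights.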

Transposed convolutional layer (sometimes called deconvolution). The need for a transposed convolution generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, that is, to convert something that has the shape of a convolution's output into something that has the shape of its input, while maintaining a connectivity pattern that is compatible with that convolution.
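The shape relationship can be sketched in one dimension with plain Python. Both helper functions and the sample values are assumptions for illustration; the transposed operation restores the shape of the original input, not its values.

```python
def conv1d_valid(x, kernel):
    """Sliding-window (no padding) convolution as used in neural networks."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv1d_transpose(y, kernel, stride=1):
    """Scatter each input value across the output through the kernel."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, yi in enumerate(y):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += yi * kj
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
y = conv1d_valid(x, kernel)             # length 5 -> length 3
x_shaped = conv1d_transpose(y, kernel)  # length 3 -> length 5
print(y, x_shaped)
```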

The pooling layer mimics the human visual system by downsampling the data so that the image is represented by higher-level features. Pooling serves three purposes: reducing information redundancy, improving the scale invariance and rotation invariance of the model, and preventing overfitting.

The common variants are the max pooling layer and the average pooling layer.

Pooling layers come in three dimensionalities: 1D for one-dimensional data, 2D for two-dimensional image data, and 3D for image data with a time dimension.
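A minimal plain-Python sketch of 1D max and average pooling with non-overlapping windows. The helper pool_1d and the sample data are made up for the example.

```python
def pool_1d(values, pool_size, reduce_fn):
    """Apply reduce_fn to consecutive non-overlapping windows of pool_size."""
    return [
        reduce_fn(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


def mean(window):
    return sum(window) / len(window)


data = [1.0, 3.0, 2.0, 8.0, 5.0, 4.0]
max_pooled = pool_1d(data, 2, max)
avg_pooled = pool_1d(data, 2, mean)
print(max_pooled, avg_pooled)
```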

The recurrent neural network (RNN) is based on the idea of memory modeling: the network is expected to remember the features that appeared earlier and to infer results from them. The overall network structure recurs continually, hence the name recurrent neural network.

The Long Short-Term Memory (LSTM) paper was first published in 1997. Due to its unique design, LSTM is suitable for processing and predicting important events in time series with very long intervals and delays.
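The recurrence can be illustrated with a scalar toy example in plain Python. The update rule h_t = tanh(w_x*x_t + w_h*h_{t-1} + b) is the usual simple-RNN form; the weights and inputs below are made up.

```python
import math


def simple_rnn(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar simple RNN and return the hidden state at every step."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = simple_rnn([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

Although the second and third inputs are zero, the hidden state stays positive: the network still "remembers" the first input, with a fading trace.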


Result output:

GRU: Gated Recurrent Unit (Cho et al., 2014).

LSTM introduces three gates: the input gate, the forget gate, and the output gate, which control the input value, the memorized value, and the output value. The GRU model has only two gates: the update gate and the reset gate. Compared with LSTM, GRU has one fewer gate and fewer parameters, yet it can achieve comparable functionality. Considering the computational power and time cost of the hardware, we often choose the more “practical” GRU.
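The parameter difference can be estimated with the textbook formulas. In this sketch each gate or candidate block is assumed to have an input weight matrix, a recurrent weight matrix, and one bias vector; specific library implementations may add extra bias terms, so actual counts can differ slightly.

```python
def block_params(input_dim, units):
    # Input weights + recurrent weights + one bias vector.
    return units * input_dim + units * units + units


def lstm_params(input_dim, units):
    # Three gates plus the candidate cell state: four blocks.
    return 4 * block_params(input_dim, units)


def gru_params(input_dim, units):
    # Two gates plus the candidate hidden state: three blocks.
    return 3 * block_params(input_dim, units)


lstm_count = lstm_params(input_dim=10, units=32)
gru_count = gru_params(input_dim=10, units=32)
print(lstm_count, gru_count)
```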



Recurrent Neural Network Layer Base Class.

A note on specifying the initial state of an RNN

You can symbolically specify the initial state of RNN layers by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or a list of tensors representing the initial state of the RNN layer.

You can numerically specify the initial state of an RNN layer by calling the reset_states method with the keyword argument states. The value of states should be a NumPy array or a list of NumPy arrays representing the initial state of the RNN layer.

A note on passing external constants to RNN

“External” constants can be passed to the cell using the constants keyword argument when the RNN layer is called. This requires that the cell's call method accepts the same keyword argument constants. These constants can be used to condition the cell transformation on additional static inputs (which do not change over time), as in attention mechanisms.


Before training the model, we need to configure the learning process, which is done through the compile method.

It receives three parameters: optimizer opt

Data classification with the simplest neural network, showing the training process

In this article, we use a simple neural network to classify data and show the neural network training process in an easy-to-understand way.

Neural network model: Y=w1x1+w2x2+b

Step 1: Generate the training data with labels

Step 2: Merge and shuffle the data, then convert it to the data type required by the Paddle framework

Step 3: Using Paddle, construct the neural network and define the loss function and optimizer: Y=w1x1+w2x2+b

Step 4: Construct the training process

The last step is to plot the training results
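The steps above can be sketched without any framework. The following plain-Python stand-in is not the Paddle code: the cluster centers, the sigmoid output with cross-entropy loss, the learning rate, and the number of epochs are all assumptions chosen for illustration, and the final plot is replaced by printed values.

```python
import math
import random

rng = random.Random(0)

# Step 1: generate two labeled clusters (hypothetical centers).
class0 = [((rng.gauss(-2, 1), rng.gauss(-2, 1)), 0) for _ in range(100)]
class1 = [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(100)]

# Step 2: merge and shuffle the data.
data = class0 + class1
rng.shuffle(data)


def predict(w1, w2, b, x1, x2):
    # Model: Y = w1*x1 + w2*x2 + b, squashed to a probability.
    y = w1 * x1 + w2 * x2 + b
    return 1.0 / (1.0 + math.exp(-y))


def mean_loss(w1, w2, b):
    # Step 3: binary cross-entropy loss over the whole data set.
    eps = 1e-12
    total = 0.0
    for (x1, x2), label in data:
        p = predict(w1, w2, b, x1, x2)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)


# Step 4: full-batch gradient descent.
w1 = w2 = b = 0.0
lr = 0.1
losses = []
n = len(data)
for epoch in range(100):
    g1 = g2 = gb = 0.0
    for (x1, x2), label in data:
        err = predict(w1, w2, b, x1, x2) - label
        g1 += err * x1
        g2 += err * x2
        gb += err
    w1 -= lr * g1 / n
    w2 -= lr * g2 / n
    b -= lr * gb / n
    losses.append(mean_loss(w1, w2, b))

correct = sum((predict(w1, w2, b, x1, x2) > 0.5) == (label == 1) for (x1, x2), label in data)
accuracy = correct / n
print("first loss:", losses[0], "last loss:", losses[-1], "accuracy:", accuracy)
```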

Establishment of BP neural network ground settlement prediction model

The modeling process of the BP neural network prediction model for ground settlement caused by foundation pit dewatering is as follows:

(1) Sample Selection

Because the amount of ground settlement caused by foundation pit dewatering is closely related to the distance from the pit, the “(Type II)” data (see Table 4.1) are selected as the samples for training and testing.

(2) BP neural network structure design

For BP networks, any continuous function on a closed interval can be approximated by a BP network with a single hidden layer, so a three-layer BP network can realize an arbitrary mapping from n dimensions to m dimensions. Following the principle of keeping the network structure simple, a three-layer BP structure is adopted: the input layer takes four parameters, namely the distance of the settlement point from the pit L (m), the equivalent compression modulus E (MPa), the depth of water level drop H (m), and the support stiffness n; the output layer is the cumulative settlement of the ground surface (mm); and there is one hidden layer. Choosing the number of neurons in the hidden layer is a very complex problem that usually has to be settled by the designer's experience and repeated experiments, so no ideal analytical formula exists for it. The number of hidden units is directly related to the requirements of the problem and to the number of input and output units. Too many hidden units lead to long training times, errors that are not necessarily optimal, poor fault tolerance, and failure to recognize previously unseen samples, so there must be an optimal number of hidden units. In this study, a single program run compares the training speed and test accuracy for 5, 10, 15, 20, 25, 30, and 40 hidden-layer neurons.

Figure 4.2 Block diagram of BP neural network program

(3) Network training and testing

The BP network uses gradient descent to reduce the training error. Because the ground settlement caused by foundation pit dewatering varies only slightly in magnitude within the settlement range, a training target of 0.001 is used as the control condition. Because the network structure is relatively complex and has many neurons, the number of training iterations and the learning rate need to be increased appropriately, so the initial number of training iterations is set to 10,000 and the learning rate to 0.1. The hidden-layer neurons use the S-shaped tangent transfer function tansig, the output-layer transfer function is logsig, and the training function is trainlm. Of the 38 sets of data, 33 are used as training samples and 5 as test samples.
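For readers without MATLAB, the two transfer functions and a forward pass through a 4-10-1 network can be sketched in plain Python. The random weights and the normalized sample below are hypothetical, and training (trainlm in the original) is not reproduced here.

```python
import math
import random


def tansig(x):
    # MATLAB's tansig: 2 / (1 + exp(-2x)) - 1, mathematically equal to tanh(x).
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    # MATLAB's logsig: 1 / (1 + exp(-x)), output in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with tansig, one output neuron with logsig."""
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


rng = random.Random(0)
n_in, n_hidden = 4, 10  # inputs L, E, H, n; 10 hidden neurons
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

sample = [0.2, 0.5, 0.1, 0.8]  # hypothetical inputs scaled to [0, 1]
prediction = forward(sample, w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

Because logsig produces values between 0 and 1, a network of this form would presumably need the settlement targets scaled to that range before training.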

(4) Network implementation and test effect

The BP neural network prediction model for ground settlement caused by foundation pit dewatering was implemented in MATLAB 6.0 (the program code is shown in Appendix 1). Its training error and test results are as follows:

Figure 4.3 Training error curve

Figure 4.4 Prediction error curve


From Figures 4.3 and 4.4 it can be seen that the sample data converge, the training error is small, and the prediction accuracy is better when the number of hidden-layer neurons is 10; the error is less than 20%, which meets the engineering requirements.