What are neural network models?
A large model is a deep neural network with millions or billions of parameters that has gone through a specialized training process so it can handle complex tasks on large-scale data.
Large models consume substantial computational resources, storage space, time, and power during both training and deployment. In contrast, a small model is a deep neural network with fewer parameters. Small models typically run faster and are more lightweight, which makes them suitable for devices or scenarios with limited compute and storage, such as mobile or embedded devices.
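To make the storage cost concrete, here is a rough sketch that estimates how much memory is needed just to hold a model's weights. The function names and the example parameter counts (5 million and 7 billion) are illustrative assumptions, not figures from this article, and the estimate ignores activations, optimizer state, and any compression.

```python
def parameter_memory_bytes(num_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Bytes needed to store the weights alone (4 bytes = 32-bit floats)."""
    return num_parameters * bytes_per_parameter


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes for easier reading."""
    return num_bytes / 1024 ** 3


# Hypothetical sizes chosen only to show the difference in scale.
small_model_bytes = parameter_memory_bytes(5_000_000)        # 32-bit weights
large_model_bytes = parameter_memory_bytes(7_000_000_000)    # 32-bit weights
large_model_half = parameter_memory_bytes(7_000_000_000, 2)  # 16-bit weights

print(f"small model: {to_gib(small_model_bytes):.3f} GiB")
print(f"large model: {to_gib(large_model_bytes):.1f} GiB (32-bit)")
print(f"large model: {to_gib(large_model_half):.1f} GiB (16-bit)")
```

Even before counting the memory used during training, the gap is several orders of magnitude, which is why the small model fits on a phone and the large one usually does not.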
In practice, the choice between a large and a small model depends on the problem to be solved and the resources available. Large models usually perform well in natural language processing, computer vision, recommender systems, and similar areas, but they usually need high-performance computing resources such as GPUs or cloud clusters.
Small models are suited to simpler, smaller-scale problems such as credit card fraud detection. They offer faster inference and can run on low-power devices such as smartphones or IoT devices.
Problems that large models can solve
Large-scale pre-training effectively captures knowledge from large amounts of labeled and unlabeled data. By storing that knowledge in a large number of parameters and then fine-tuning for a specific task, it greatly extends a model’s ability to generalize. Instead of training from scratch for every new scenario, only a small number of samples are needed for fine-tuning.
For example, suppose BERT has already been pre-trained and we want to use it for a downstream task such as sentiment analysis of a sentence. A class token ([CLS]) is added to BERT’s input tokens, the same approach ViT uses. The encoder’s output vector for the class token is then passed through a linear transformation and a softmax, and the loss is computed against the ground-truth label. Because this step can initialize the model directly from BERT’s pre-trained parameters and then fine-tune them, the results are typically better than training from scratch: convergence is fast and the loss is low.
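The classification step described above (class-token vector, linear transformation, softmax, loss against the ground truth) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not BERT's actual implementation: the 4-dimensional `cls_vector` is a made-up stand-in for the encoder output, the head starts from zeros, and only the head is updated, whereas real fine-tuning usually updates the pre-trained encoder parameters as well.

```python
import math


def linear(x, weights, bias):
    """Linear transformation: one logit per class from the class-token vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss against the ground-truth class index."""
    return -math.log(probs[label])


def head_gradient_step(x, weights, bias, label, lr=0.1):
    """One gradient-descent step on the classification head only."""
    probs = softmax(linear(x, weights, bias))
    # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot(label)).
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    new_weights = [[w - lr * g * v for w, v in zip(row, x)]
                   for row, g in zip(weights, grad)]
    new_bias = [b - lr * g for b, g in zip(bias, grad)]
    return new_weights, new_bias


# Hypothetical encoder output at the class-token position (assumed values).
cls_vector = [0.5, -1.0, 0.25, 2.0]
weights = [[0.0] * 4 for _ in range(2)]  # 2 classes: negative, positive
bias = [0.0, 0.0]
label = 1  # ground truth: positive sentiment

loss_before = cross_entropy(softmax(linear(cls_vector, weights, bias)), label)
weights, bias = head_gradient_step(cls_vector, weights, bias, label)
loss_after = cross_entropy(softmax(linear(cls_vector, weights, bias)), label)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

In an actual fine-tuning run the same loss would be backpropagated through the encoder too, starting from the pre-trained weights rather than from random values, which is what makes convergence fast.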