Large language models such as GPT-3 use vast amounts of internet data to generate human-like text, ranging from poetry to programming code. These machine learning models take a piece of input text and predict the most probable text to follow.
In addition to generating text, large language models like GPT-3 can pick up new tasks through in-context learning, with no additional training at all. The model accomplishes a new task after being shown only a few examples, even though it was never explicitly trained for that task. For example, a user could input a few example sentences along with their sentiments, and the model could then correctly label the sentiment of a new sentence.
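To make the idea concrete, here is a minimal sketch of what such a few-shot prompt looks like. The prompt format, the example sentences, and the expected completion are illustrative assumptions, not material from the paper:

```python
# Build a few-shot sentiment prompt. The task is specified entirely
# inside the prompt; the model's weights are never updated.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this blender.", "negative"),
    ("The service exceeded every expectation.", "positive"),
]
new_sentence = "The plot dragged and the acting felt flat."

prompt = "\n".join(f"Sentence: {s}\nSentiment: {label}" for s, label in examples)
prompt += f"\nSentence: {new_sentence}\nSentiment:"

print(prompt)
# A language model given this prompt would be expected to continue
# with "negative" -- purely from the in-context examples.
```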
Normally, a machine-learning model such as GPT-3 would need to be retrained on new data to perform a new task; during that training, the model updates its parameters as it learns. In the case of in-context learning, however, the model’s parameters are not updated, so it appears to learn a new task without learning anything at all.
Researchers from MIT, Google Research, and Stanford University are working to understand this mystery. They analyzed models similar to large language models to explore how they can learn without updating parameters.
The researchers’ findings suggest that these massive neural network models can contain smaller, simpler linear models embedded inside them. The large model can then implement a simple learning algorithm to train that embedded linear model on a new task, without ever updating its own parameters: all the information the procedure needs is already contained within the larger model, whose weights remain fixed.
This research is an important step toward understanding the mechanics of in-context learning, and it opens the door to further exploration of the learning algorithms large models can implement to accomplish new tasks without retraining. Ekin Akyürek, a computer science graduate student and lead author of the paper, says the phenomenon is exciting because it eliminates the need for costly retraining: specifying a new task takes nothing more than an input and a few examples.
“Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So in-context learning is a pretty exciting phenomenon,” says Akyürek.
Joining Akyürek on the paper are researchers from Google Brain, the University of Alberta, MIT, and Stanford. The research will be presented at the International Conference on Learning Representations.
A model within a model
“In the machine-learning research community, many scientists have come to believe that large language models can perform in-context learning because of how they are trained,” says Akyürek.
Large language models like GPT-3 are trained on huge amounts of text from the internet, including articles from websites like Wikipedia and Reddit. On this view, when such a model is given a new task, it is merely recognizing patterns similar to ones it saw during training rather than actually learning to perform the task. In other words, the model doesn’t learn how to do new things; it matches its input against its training data and generates output accordingly.
Akyürek and his team believe that large language models are not just repeating what they have seen before, but are actually learning to perform new tasks. To test this, they prompted these models with synthetic data the models could not have seen anywhere before, and found that the models could still learn from just a few examples. Akyürek and his colleagues theorized that these large neural networks contain smaller machine-learning models inside them, which the large models can train to complete a new task.
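The paper’s experiments use simple synthetic regression tasks of roughly this flavor. The sketch below (with assumed dimensions, and NumPy standing in for whatever tooling the authors actually used) shows why such a prompt cannot be answered from memorized training data: the task’s weight vector is freshly sampled every time.

```python
# Generate one synthetic in-context learning task: a random linear
# function, a few labeled examples, and a query point.
import numpy as np

rng = np.random.default_rng(0)
dim, n_examples = 4, 5

w = rng.normal(size=dim)                  # the hidden task: y = w . x
xs = rng.normal(size=(n_examples, dim))   # in-context example inputs
ys = xs @ w                               # their labels

x_query = rng.normal(size=dim)            # the model must predict w . x_query

# The prompt is the sequence (x_1, y_1, ..., x_n, y_n, x_query). Because w
# is freshly sampled, no model could have memorized this exact task.
print("example labels:", ys)
print("true query answer:", x_query @ w)
```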
“That could explain almost all of the learning phenomena that we have seen with these large models,” says Akyürek.
To test this hypothesis, the researchers used a transformer, a neural network with an architecture similar to GPT-3’s, but trained it specifically for in-context learning. Transformers are among the most powerful machine-learning models available and are used for tasks from natural language processing to image recognition. With a transformer trained explicitly for in-context learning, the researchers could investigate whether it genuinely learns new tasks rather than just repeating patterns from its training data.
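What “trained explicitly for in-context learning” might look like in code: the sketch below trains a small transformer so that, given a sequence of (x, y) pairs followed by a query x, it predicts the query’s y. The architecture, token format, and hyperparameters here are illustrative assumptions, not the authors’ actual setup.

```python
# Train a toy transformer for in-context linear regression (PyTorch).
import torch
import torch.nn as nn

dim, n_ctx, d_model = 4, 5, 64

embed = nn.Linear(dim + 1, d_model)   # each token is (x, y); query has y = 0
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
readout = nn.Linear(d_model, 1)

params = [*embed.parameters(), *encoder.parameters(), *readout.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(1000):
    w = torch.randn(32, dim, 1)              # a fresh linear task per sequence
    xs = torch.randn(32, n_ctx + 1, dim)
    ys = xs @ w
    tokens = torch.cat([xs, ys], dim=-1)
    tokens[:, -1, -1] = 0.0                  # hide the query's label
    h = encoder(embed(tokens))
    pred = readout(h[:, -1])                 # predict y for the query token
    loss = ((pred - ys[:, -1]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Because every sequence carries a brand-new weight vector, the only way the network can drive this loss down is to infer each task from its in-context examples.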
The researchers found that this transformer can write a linear model within its hidden states. A neural network consists of many layers of interconnected nodes that process data; the hidden states are the intermediary layers between the input and output layers. In effect, the transformer encodes a simpler linear model inside these hidden layers.

This smaller model sits in the earliest layers of the network and can be updated with basic learning algorithms, so the transformer effectively trains a simpler version of itself, inside its own architecture, to perform a new task.
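If this account is right, the “basic learning algorithm” hiding in the transformer is something like ordinary gradient descent on a linear model. Here is that algorithm out in the open, as a standalone sketch; in the transformer, the analogous updates would happen implicitly, inside the hidden activations rather than in explicit code.

```python
# Gradient descent on a linear model -- the kind of simple learner the
# transformer is hypothesized to implement internally.
import numpy as np

rng = np.random.default_rng(1)
dim, n = 4, 5
w_true = rng.normal(size=dim)
xs = rng.normal(size=(n, dim))
ys = xs @ w_true                 # the in-context examples

w_hat = np.zeros(dim)            # the implicit linear model, starting at zero
lr = 0.1
for _ in range(200):             # gradient descent on squared error
    grad = xs.T @ (xs @ w_hat - ys) / n
    w_hat -= lr * grad

x_query = rng.normal(size=dim)
print("prediction:", x_query @ w_hat, "truth:", x_query @ w_true)
```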
Probing unseen layers
Through mathematical evaluations and theoretical proofs, the researchers showed that the transformer can write this linear model within its hidden states and update it with simple learning algorithms. They then ran probing experiments, looking directly into the network’s hidden layers to try to recover the linear model’s parameters.
“In this case, we tried to recover the actual solution to the linear model, and we could show that the parameter is written in the hidden states. This means the linear model is in there somewhere,” says Akyürek.
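The probing idea itself can be sketched in a few lines: fit a linear readout from the network’s hidden states to each task’s true weight vector, and check whether it generalizes to held-out tasks. Everything below is a stand-in; in the real experiment the activations would come from the trained transformer rather than from random placeholders.

```python
# Linear probe: can the task's weight vector be read out of hidden states?
import numpy as np

rng = np.random.default_rng(2)
n_tasks, d_hidden, dim = 500, 64, 4

hidden_states = rng.normal(size=(n_tasks, d_hidden))  # placeholder activations
true_ws = rng.normal(size=(n_tasks, dim))             # each task's true w

# Least-squares readout fit on 400 tasks, tested on the remaining 100.
probe, *_ = np.linalg.lstsq(hidden_states[:400], true_ws[:400], rcond=None)
recovered = hidden_states[400:] @ probe
print("probe test error:", np.mean((recovered - true_ws[400:]) ** 2))
# With real activations, a low test error would mean the linear model's
# parameters really are written in the hidden states.
```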
The researchers believe that by adding two layers to the transformer neural network, they could make it capable of in-context learning. However, there are many technical details that need to be worked out before this becomes possible. If successful, this could allow engineers to create models that can learn new tasks without needing to be retrained with new data. In other words, these models could adapt and improve their performance as they encounter new situations, rather than having to start from scratch every time.
Akyürek intends to conduct further research on in-context learning by examining more complex functions beyond the linear models they used in this study. This research could be extended to large language models to investigate if their behaviors are also characterized by simple learning algorithms. Additionally, he plans to investigate the types of pretraining data that can promote in-context learning.
“With this work, people can now visualize how these models can learn from exemplars. So, my hope is that it changes some people’s views about in-context learning. These models are not as dumb as people think. They don’t just memorize these tasks. They can learn new tasks, and we have shown how that can be done,” says Akyürek.