Methods

Different approaches have been developed over the course of research to work with large language models and to perform a variety of important tasks, such as classification, sentiment analysis, creative writing, or writing code.

To understand these methods, it helps to imagine that the input must be shaped so that the initial input vector to the language model stimulates an appropriate response. By instructing the large language model, we condition the input vector toward a subspace that produces a sequence of vectors with high probabilities for the tokens that contain the answer. This mental model can guide us in resolving and debugging problems with our prompts, and it helps explain why few-shot prompts may outperform zero-shot prompts.

The ideas behind these methods, such as zero-shot and few-shot learning, were developed in areas of machine learning outside large language models. Traditionally, in the zero-shot setting, an input from a target data domain is provided to a model that has not necessarily been trained on that domain; in theory, the model should have been trained on data that exhibits the patterns found in the target domain. Few-shot learning extends this: the input is augmented with training examples from the target domain, which are used to run a few training iterations on the already trained model before performing inference on the input.
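The traditional few-shot procedure described above can be sketched with a toy model. Everything here is illustrative: a hypothetical one-dimensional logistic classifier stands in for the already trained model, and a handful of labelled pairs stand in for the target-domain examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(w, b, examples, lr=0.5, epochs=3):
    """Run a few gradient steps on (input, label) pairs from the target domain."""
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(w * x + b)
            # gradient of binary cross-entropy for a 1-D logistic model
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical "pretrained" 1-D logistic classifier (the weights are made up).
w, b = 1.0, 0.0

# A handful of labelled examples from the target domain: inputs well above 2.0
# are labelled 1, inputs well below are labelled 0.
shots = [(3.0, 1), (4.0, 1), (0.5, 0), (1.0, 0)]
w, b = fine_tune(w, b, shots)

# Inference on a new input from the same domain.
prediction = 1 if sigmoid(w * 3.5 + b) > 0.5 else 0
```

The few gradient steps nudge the pretrained decision boundary toward the target domain, which is exactly the "few iterations, then inference" loop described above.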

Although these techniques bear similarities, their effect and inner workings in large language models remain largely hypothetical. In zero-shot prompting, an instruction is given and then the input, whereas in few-shot prompting the prompt is filled with worked examples before the input is given.
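The structural difference between the two prompt styles can be sketched as plain string construction. The task, instruction text, and examples below are all invented for illustration; a real application would send the resulting string to a language model.

```python
def zero_shot_prompt(instruction, text):
    """Zero-shot: the instruction first, then the input."""
    return f"{instruction}\n\nInput: {text}\nSentiment:"

def few_shot_prompt(instruction, examples, text):
    """Few-shot: the same instruction, but worked examples precede the input."""
    shots = "\n".join(f"Input: {x}\nSentiment: {y}" for x, y in examples)
    return f"{instruction}\n\n{shots}\n\nInput: {text}\nSentiment:"

instruction = "Classify the sentiment of the input as positive or negative."
examples = [("I loved this film.", "positive"),
            ("The service was terrible.", "negative")]

print(zero_shot_prompt(instruction, "The plot dragged on forever."))
print(few_shot_prompt(instruction, examples, "The plot dragged on forever."))
```

Both prompts end at "Sentiment:" so that the model's next tokens are pushed toward completing the label, in line with the conditioning view described earlier.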

There are many other, more advanced methods; they mainly mix few-shot examples with instructions whose purpose is to better frame and perform the task.
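As a sketch of such a mix, the hypothetical prompt below frames the task with an instruction and an output constraint before the worked examples and the final input; the wording and examples are assumptions, not a prescribed format.

```python
# Framing instruction plus an explicit output constraint.
framing = (
    "You are a sentiment classifier.\n"
    "Answer with exactly one word: positive or negative.\n"
)
# A few worked examples to demonstrate the task.
examples = (
    "Review: A delightful surprise.\nAnswer: positive\n"
    "Review: Two hours I will never get back.\nAnswer: negative\n"
)
# The actual input, left open for the model to complete.
query = "Review: The acting was superb.\nAnswer:"

prompt = framing + "\n" + examples + "\n" + query
print(prompt)
```

The instruction frames the task while the examples demonstrate it, combining the two mechanisms discussed above in one prompt.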

Finally, another approach is to provide a much more detailed prompt instruction, with or without examples, that induces an input to the large language model leading to a more detailed resolution and, ultimately, a correct outcome. The chain-of-thought method details in the instruction how to solve a specific problem, with the hope of moving closer in the solution space to the appropriate outcome.
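A minimal chain-of-thought sketch, with an invented arithmetic example: the worked answer spells out its intermediate reasoning steps in the hope that the model follows the same structure for the new question.

```python
# One worked example whose answer shows the reasoning, not just the result.
cot_example = (
    "Q: A shop sells pens in packs of 4. How many pens are in 6 packs?\n"
    "A: Each pack has 4 pens. 6 packs contain 6 * 4 = 24 pens. "
    "The answer is 24.\n"
)
# The new question, with an opening that invites step-by-step reasoning.
question = "Q: A crate holds 12 bottles. How many bottles are in 5 crates?\n"

prompt = cot_example + "\n" + question + "A: Let's think step by step."
print(prompt)
```

Compared with a plain few-shot prompt, the example answer itself carries the intermediate steps, which is what distinguishes chain-of-thought prompting.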