Intelligent Agent
An intelligent agent is a system or entity that operates autonomously to achieve a specific goal within a given environment. The agent gathers information from the environment through sensors and interacts with it via actuators. It possesses reasoning and planning capabilities to devise a sequence of tasks necessary to achieve the goal, and can adjust this sequence based on feedback from the environment. Over time, the agent learns from its experiences and updates its internal memory to enhance task execution. Recent breakthroughs in Large Language Models have sparked increased interest in AI agents, as these models offer a new, general-purpose form of reasoning around which agents can be built.
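A minimal sketch of the sense-plan-act-learn loop described above. The Agent methods and the environment's observe/step interface are hypothetical placeholders, not a specific framework:

    # Sense-plan-act loop sketch; the environment's observe()/step() API is hypothetical.
    class SimpleAgent:
        def __init__(self):
            self.memory = []                      # internal memory updated from experience

        def plan(self, observation):
            # devise (or revise) a sequence of tasks toward the goal
            return ["inspect", "act"]

        def act(self, task, env):
            return env.step(task)                 # actuators: change the environment

        def learn(self, observation, feedback):
            self.memory.append((observation, feedback))

    def run(agent, env, goal_reached, max_steps=100):
        for _ in range(max_steps):
            obs = env.observe()                   # sensors: gather information
            for task in agent.plan(obs):
                feedback = agent.act(task, env)
                agent.learn(obs, feedback)        # adjust based on environment feedback
            if goal_reached(env):
                break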
Compound AI System
A Compound AI System is a computer system that performs a task using more than just one AI component. While a common component is a trained machine learning model, such as a Large Language Model, the number and nature of components vary widely depending on the task. For example, a compound system that solves coding challenges can be based on two main components: an LLM that generates variations of coding solutions, and a code execution module, which tests for accuracy and evaluates the performance of the proposed solutions.
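A minimal sketch of the generate-and-test pattern in the coding example above; llm_propose and run_unit_tests are hypothetical stand-ins for the two components (an LLM and a code-execution module):

    # Hypothetical two-component system: an LLM proposes solutions, an executor scores them.
    def solve(problem, llm_propose, run_unit_tests, n_candidates=5):
        candidates = [llm_propose(problem) for _ in range(n_candidates)]   # component 1: LLM
        scored = [(run_unit_tests(code), code) for code in candidates]     # component 2: executor
        best_score, best_code = max(scored, key=lambda pair: pair[0])
        return best_code if best_score > 0 else None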
Diffusion Model
A diffusion model is a generative model inspired by the natural phenomenon of diffusion -- i.e. the gradual dispersing of particles through a medium -- except run in reverse. In diffusion-based image generation, for example, the model starts from random Gaussian noise (a "TV static"-type image) and iteratively removes noise until the final image is obtained.
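A schematic of the iterative denoising loop, in the style of DDPM-like ancestral sampling. The noise-prediction network eps_model is a placeholder for a trained model, and the linear noise schedule is just one common choice:

    import numpy as np

    # Start from Gaussian noise and iteratively denoise.
    # eps_model(x, t) is a placeholder for a trained network predicting the noise in x at step t.
    def sample(eps_model, shape, n_steps=1000):
        betas = np.linspace(1e-4, 0.02, n_steps)          # noise schedule
        alphas = 1.0 - betas
        alpha_bars = np.cumprod(alphas)

        x = np.random.randn(*shape)                       # "TV static" starting point
        for t in reversed(range(n_steps)):
            eps = eps_model(x, t)                         # predicted noise at this step
            x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:
                x += np.sqrt(betas[t]) * np.random.randn(*shape)   # re-inject a little noise
        return x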
Direct Preference Optimization
Direct Preference Optimization (DPO) is an alternative to Reinforcement Learning from Human Feedback (RLHF) for fine-tuning Language Models to align their outputs with human preferences. DPO has two steps: (1) collect pairs of completions for each element in a dataset of prompts, and label one completion in each pair as preferred and the other as not preferred; (2) fine-tune the Language Model on this labeled dataset using a loss function, adapted from the RLHF loss, which implicitly rewards the model for following human preferences while not diverging too much from the original (pre-trained) model. DPO simplifies RLHF in two ways: (1) it eliminates the need to train a separate reward model (which imitates human preferences between two candidate outputs); (2) it eliminates the need to use Reinforcement Learning to fine-tune the Language Model.
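A sketch of the DPO loss for a batch of labeled pairs in PyTorch. The inputs are sequence-level log-likelihoods of the preferred ("chosen") and not-preferred ("rejected") completions under the model being tuned and under the frozen reference model; beta is the usual temperature-like hyperparameter:

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # Implicit reward of each completion: how much more likely it is under the
        # tuned model than under the frozen reference model.
        chosen_reward = policy_chosen_logp - ref_chosen_logp
        rejected_reward = policy_rejected_logp - ref_rejected_logp
        # Reward the model for preferring the chosen completion, without a separate
        # reward model and without an RL loop.
        return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()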
Reinforcement Learning from Human Feedback
RLHF is a technique used to fine-tune certain generative models based on a secondary model (a reward model) trained to estimate a measure of perceived quality of the generative model's output. Human raters provide labels to train the reward model, and its predictions are used as a guide to fine-tune the generative model within a Reinforcement Learning algorithm.
Low-Rank Adaptation
Low-Rank Adaptation (or LoRA) is an approach to model fine-tuning which reduces the number of weights or parameters that are updated. It is based on the empirical observation that the updates in any given matrix of weights (for example, one representing a fully-connected layer) actually have a low 'intrinsic rank' (the maximum number of linearly independent columns or rows). Thus, the weight update ∆W is represented via the product BA, where the number of columns of B and the number of rows of A are set to a target low rank r; the original weights are kept frozen and only the much smaller matrices B and A are trained.
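A minimal sketch of a LoRA-adapted linear layer in PyTorch. The alpha/r scaling and the zero-initialization of B follow common practice, but initialization details vary across implementations:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wraps a frozen linear layer with a trainable low-rank update: W x + (B A) x."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                                      # W stays frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d_in
            self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d_out x r, starts at 0
            self.scale = alpha / r

        def forward(self, x):
            # ∆W x is computed as (B A) x without ever forming the full-size matrix.
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)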
Task Contamination
Task Contamination is a type of data leakage that affects the perceived capabilities of Large Language Models. Namely: the fact that some LLMs perform well on some N-shot learning tasks may be due to the fact that their training data included examples of those tasks, in which case they are not in fact acting as N-shot learners.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a technique used in question-answering systems to augment the capabilities of Large Language Models (LLMs) with a domain-specific knowledge base. There are 2 pre-processing steps: (1) the knowledge base is segmented into appropriate chunks; (2) an embedding for each chunk is created and stored (along with the text chunk) into a vector database, which is indexed for faster retrieval. During deployment, there are 3 steps: (1) an embedding for the query is created; (2) relevant text is pulled from the vector database based on similarity with respect to the query's embedding; (3) the retrieved relevant text is appended to the original query as "context", and the "augmented query" is finally fed into the LLM. This is roughly the algorithm behind many "talk to your document(s)" applications made possible with the availability of LLMs (or Foundation Models for text).
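A schematic of the two phases described above; embed(), vector_db, and generate() are hypothetical stand-ins for an embedding model, an indexed vector store, and an LLM:

    # Pre-processing: segment the knowledge base, embed each chunk, store it in a vector DB.
    def index_documents(documents, embed, vector_db, chunk_size=500):
        for doc in documents:
            for i in range(0, len(doc), chunk_size):
                chunk = doc[i:i + chunk_size]
                vector_db.add(embedding=embed(chunk), text=chunk)

    # Deployment: embed the query, retrieve similar chunks, prepend them as context.
    def answer(query, embed, vector_db, generate, k=3):
        hits = vector_db.search(embed(query), top_k=k)
        context = "\n\n".join(hit.text for hit in hits)
        augmented_query = f"Context:\n{context}\n\nQuestion: {query}"
        return generate(augmented_query)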
Vector Database
A vector database is literally a database of vectors, where the vectors are data representations such as embeddings or features. Unlike traditional databases, where search is done by matching keywords, in vector databases the search is by proximity -- this is called 'semantic search', since nearby vectors correspond to data points of similar semantic content. Vector databases are growing in popularity due to their use in extending LLM (Large Language Model) capabilities without re-training or fine-tuning them. For example: given a set of documents, one can create a corresponding set of vector embeddings (say one vector per paragraph), so that when a prompt is made, the embedding of the prompt is used to query the database for the closest vector(s), and the corresponding returned document(s) are then appended to the prompt (as 'context') before being passed to the LLM.
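Brute-force search by proximity can be illustrated in a few lines of NumPy; real vector databases use approximate-nearest-neighbor indexes to scale, but the underlying idea is the same:

    import numpy as np

    def nearest(query_vec, stored_vecs, texts, k=3):
        # Cosine similarity: nearby vectors correspond to semantically similar text.
        q = query_vec / np.linalg.norm(query_vec)
        m = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
        sims = m @ q
        top = np.argsort(-sims)[:k]
        return [(texts[i], float(sims[i])) for i in top]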
Prompt Engineering
Prompt engineering is the design of textual prompts for generative AI models in a way that improves the perceived quality of the generated output. It is a bit of an art, due to the black box nature of the generative models to which it applies.
Emergent Abilities
Emergent abilities are behaviors or properties of ML models that "emerge" unpredictably, especially as models grow in size/capacity. These abilities are said to emerge because they are somewhat unrelated to the original task the models were designed for. For example: large language models trained to predict the next token in a sequence of tokens have shown various emergent abilities in the context of few-shot prompting (when a few input-output samples are given and the model is asked to generate the output for a previously unseen input).
Few-Shot Prompting
Few-shot prompting is the same as few-shot learning but in the context of natural language processing (in particular conversational AIs). Another term sometimes used is "in-context learning". For example: the prompt "What's the capital of France?" can be rephrased to use few-shot prompting as "England: London; Spain: Madrid; France:".
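The same idea, assembled programmatically (a sketch; the examples and separator format are arbitrary choices, not a standard):

    def few_shot_prompt(examples, query):
        # examples: list of (input, output) pairs shown to the model in-context
        shots = "; ".join(f"{x}: {y}" for x, y in examples)
        return f"{shots}; {query}:"

    # few_shot_prompt([("England", "London"), ("Spain", "Madrid")], "France")
    # -> "England: London; Spain: Madrid; France:"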
Chain-of-Thought Prompting
Chain of thought prompting is a prompt engineering technique whereby some reasoning steps are included when introducing an input-output example before asking the model to extrapolate to another given input. For example: "Q: What's a pah-toom-pah?
A: It's a sequence of a clap, a step, and a clap, because 'pah' corresponds to a clap, and 'toom' corresponds to a step. Q: What's a pah-pah-toom-pah-pah-toom?"
Double Descent
Double descent is the phenomenon in which the test error of a model (particularly deep neural networks) first decreases, then increases, then decreases again, as a function of its "size" (the number of parameters), or of the number of training epochs. This is surprising in light of the bias-variance trade-off associated with most "classical" ML models.
Foundation Model
A foundation model is a "large" ML model capable of serving as the foundation for many tasks, either after fine-tuning or in the context of n-shot learning ("prompting"). Large language models (LLMs) are typical foundation models, as they can be adapted for various natural language processing (NLP) tasks, such as summarization, question answering, text generation, and coding in various programming languages.
Machine Unlearning
Machine unlearning is the subfield of ML that deals with the problem of making a model "forget" the influence of a given subset of samples used for training -- the so-called "forget set". The need to unlearn arises, for example, when a data subject withdraws consent for their data to be used to train ML models. The natural solution is to retrain the model without the forget set, but that may be impractical for very large models.
Perplexity
Perplexity is a common metric used to evaluate language models. Presented with the test set, the model is set up to predict tokens based on prior tokens (context); the better such predictions are, the less 'perplexed' (surprised) the model is at seeing the test set, and therefore the better the language model is. There are different mathematical formulations for perplexity; a typical one is the exponential of the average negative log-likelihood computed over a sliding window on the test set.
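In formula form, for a test sequence of N tokens with model-assigned conditional probabilities p(x_i | x_<i), perplexity is exp(-(1/N) * sum_i log p(x_i | x_<i)). A minimal sketch, assuming the per-token log-probabilities have already been computed:

    import math

    def perplexity(token_logprobs):
        # token_logprobs: natural-log probabilities the model assigned to each test token
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # A model that assigns probability 0.1 to every token has perplexity 10:
    # perplexity([math.log(0.1)] * 5) -> 10.0 (up to floating point)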