What is Activation function?
A small mathematical function applied inside neural networks that decides how much signal passes from one layer to the next. ReLU, GELU, SiLU are common ones.
A small mathematical function applied inside neural networks that decides how much signal passes from one layer to the next. ReLU, GELU, SiLU are common ones.
A small trainable add-on to a frozen base model. Lets you specialise a model without retraining the whole thing. LoRA is the dominant adapter style.
A program that takes goals and breaks them into actions, calling tools and revising its plan based on results — instead of just answering one prompt at a time.
The work of making models do what humans actually want, not just what they were trained on. RLHF, constitutional AI, and DPO are alignment techniques.
The AWS region in Mumbai. When AI APIs land here, latency from Indian users drops sharply — typically 200-300ms to 30-50ms.
A token that authenticates your account with an AI provider. Treat it like a password — committed to GitHub once means rotate immediately.
A mechanism letting a model weigh different parts of its input when producing each output token. The "T" in transformer.
A model that generates output one token at a time, conditioned on everything it produced before. All modern LLMs are autoregressive.
The training algorithm that tells a neural network how to adjust its weights based on how wrong its output was.
How many examples a model sees at once during training or inference. Larger batches train faster but need more memory.
A standardised test used to compare models — MMLU, HumanEval, MMLU-Pro. Useful for direction; often gamed in detail.
An early transformer (2018) that reads sentences both directions. Mostly retired for new work but still common in production search.
When a model systematically favours certain outputs over others — a property of training data more than design. Hard to remove fully.
A technique where the provider remembers the prefix of a long prompt across calls, charging you less the second time. Cuts repeat costs by 50-90%.
Telling a model to "think step by step" before answering. Improves reasoning on multi-step problems at the cost of more tokens.
A model that puts an input in one of N labels. Cheaper and more reliable than free-form generation for routing or tagging.
Anthropic's family of large language models. Strong at coding, agents, and following instructions.
Anthropic's CLI for coding tasks. Gives Claude direct filesystem access, command execution, and a loop.
A sandboxed Python environment a model can call to actually run code instead of just describing it. Better for math, parsing, plotting.
When a self-hosted model has to load weights into VRAM before serving the first request. Adds latency. Avoidable with always-on workers.
The text a model produces in response to a prompt. The original term, before "chat" took over.
Anthropic's training method where the model critiques its own outputs against a set of principles, then trains on those critiques.
When a model's attention degrades as the conversation gets longer, losing earlier details. Real but underdiscussed.
The maximum number of tokens (roughly: words + punctuation) a model can read in one conversation. Bigger context = more documents fit in.
An AI code editor — forked VSCode with multi-file context and inline edits.
A short document describing what a dataset contains, how it was collected, and what biases it carries. Should be standard; often missing.
A Chinese AI lab that released competitive open-weight models (DeepSeek-V3, R1) much cheaper than Western incumbents.
The work of getting a trained model into a place where users can actually call it. Often harder than the training.
Training a smaller model to mimic a larger one. Used to make cheap models that match expensive models on specific tasks.
India's Digital Personal Data Protection Act. Affects how Indian companies handle personal data sent to AI models, especially overseas.
Direct Preference Optimization. A simpler alternative to RLHF — trains directly on preference pairs without a separate reward model.
A vector representation of text (or image, audio) that lets a computer compare meaning, not just words. The math underneath search and RAG.
A specialised model whose job is to produce embedding vectors for text. Different from the LLM you're probably calling.
The half of a transformer that reads an input and turns it into vectors. BERT is encoder-only. GPT is decoder-only.
A test that scores model output against expected behaviour. Production-grade AI needs them; vibe-checking does not scale.
Including a few worked examples in the prompt to teach the model the output format. No training needed.
Further training a pre-trained model on a smaller, specific dataset to specialise it for a task or voice.
A large pre-trained model that can be adapted to many tasks. GPT-4, Claude, Gemini, Llama are foundation models.
4-bit floating-point precision for model weights. ¼ the memory of FP16. Used to fit big models into small GPUs with minimal quality loss.
A structured way of getting a model to output a function name and arguments instead of free text. Reliable tool use.
Google DeepMind's family of multimodal models. Strong on long context and image/video understanding.
Umbrella term for AI that produces new content (text, images, audio, code). Most current AI hype is generative AI.
OpenAI's family of large language models. GPT-3, GPT-4, GPT-5 are major eras.
A chip originally for graphics, now the main workhorse for AI training and inference. NVIDIA dominates; AMD and others trail.
How much each weight in a model should change to reduce loss. The thing backpropagation computes.
Group Relative Policy Optimization. A newer alignment training method, faster and simpler than PPO. Used in DeepSeek-R1.
A check layered around a model that blocks unsafe inputs or outputs. The seatbelt on top of the safety training.
When a model produces output that sounds confident but is factually wrong or invented. The defining failure mode of 2024-26 LLMs.
A model you call via an API run by someone else. The opposite of self-hosted.
The dominant model hub for open-source AI. Hosts models, datasets, demos. Also the Transformers and PEFT libraries.
India's national AI policy initiative. Funds compute, datasets, applications, and skilling for Indian AI builders.
Natural-language processing tools and models specifically designed for Indian languages — handling Devanagari, transliteration, code-mixing.
Running a trained model on new inputs to get outputs. Most of what builders care about — training is upstream.
The per-token cost of running a model. Quoted as ₹/M-tokens or $/M-tokens, input and output separately.
Fine-tuning a model on input/output pairs framed as instructions. The bridge from raw pre-trained models to usable chatbots.
A prompt crafted to bypass a model's safety training and get it to generate restricted content.
A setting that constrains the model to output valid JSON. Useful for structured extraction and pipeline integration.
Training a small model to mimic a large one. Cheaper to run, surprisingly close in quality on many tasks.
A structured network of entities and relationships. Sometimes paired with LLMs for grounding and reasoning.
Time between sending a request and getting a response. For LLM APIs, includes time-to-first-token and tokens-per-second.
Meta's family of open-weight LLMs. Llama 4 released in 2025. The most-deployed open model in production.
A neural network trained on massive text data to predict the next token. Powers most modern AI assistants.
A community benchmark that ranks models by head-to-head human votes on real prompts.
Low-Rank Adaptation. A fine-tuning technique that trains small adapter layers instead of the full model.
A number that says how wrong a model's output was. Training tries to make it smaller.
Model Context Protocol. Anthropic's open protocol for exposing tools, resources, and data sources to LLMs in a standard way.
India's Ministry of Electronics and Information Technology. The main regulator for AI-related policy in India.
Some products let an assistant remember facts across sessions. Almost always implemented as RAG over a user profile, not real model memory.
A Paris-based AI lab. Known for permissively-licensed open-weight models — Mistral 7B, Mixtral 8x7B, Codestral.
A model architecture where only a subset of parameters fire per token. Large capacity, lower inference cost.
The kind of input or output — text, image, audio, video. A "multimodal" model handles more than one.
Mixture of Experts. A model architecture where each token activates only a small slice of the parameters — letting big models run cheaply.
Multi-Token Prediction. Generating several tokens per forward pass instead of one. Big speedup at inference time.
A model that handles more than one input type. GPT-4o, Gemini, Claude Sonnet are multimodal.
NVIDIA Inference Microservices. NVIDIA's container-packaged hosted models for enterprise deployment.
When a model's parameters are released publicly so anyone can run it locally. Different from open source (where the training code is also released).
The company behind GPT, ChatGPT, and Sora. Originally non-profit, now structurally for-profit.
A platform that exposes hundreds of hosted models behind a single OpenAI-compatible API. Useful for testing across providers.
The tokens the model generates in response. Almost always priced higher than input tokens — sometimes 3-5×.
A single number inside a neural network's weights. Large models have billions to trillions of them.
Both a metric (how surprised a model is by text) and a company (the AI search startup). Context tells you which.
The initial training phase on massive general text. Produces a base model that knows language but does not yet follow instructions.
The input you give a model. Includes instructions, examples, and the actual question. Shape determines quality.
The craft of writing prompts that get the model to do what you want. Less of a job title than it used to be.
When an attacker embeds instructions in user-supplied content that hijack the model's behaviour. The XSS of LLM apps.
Compressing model weights from 16-bit to 8/4/3 bits to save memory at inference. Trades small quality drop for big VRAM savings.
Alibaba's family of open-weight models. Strong multilingual coverage, especially Chinese and Japanese.
Retrieval-Augmented Generation. Fetching relevant documents from a database and giving them to the model alongside the user's question.
A model trained or prompted to think through problems with chain-of-thought before answering. o1, R1, Claude with extended thinking.
A training paradigm where the model learns from rewards over time. The "RL" in RLHF.
A platform that hosts open-source models behind an API. Pay-per-second usage. Popular for image/video models.
Reinforcement Learning from Human Feedback. Humans rate model outputs; those ratings train the model to prefer better responses.
How a model picks each next token from its probability distribution. Temperature, top-p, top-k are sampling parameters.
Empirical patterns showing how model performance improves with more parameters, data, and compute. Drives the "bigger is better" investment story.
A model you run on your own infrastructure instead of calling a provider. More control, more maintenance.
OpenAI's text-to-video model. Generates short clips from text prompts.
State-of-the-art. The current best published result on a benchmark or task. Used loosely.
Models that turn audio into text. Whisper is the open-source standard; commercial ASR (Deepgram, AssemblyAI) competes on Indian accents.
Returning model output token-by-token as it's generated. Feels faster, even if total time is the same.
Breaking words into smaller pieces — "tokenization" might become "token" + "ization". The default tokenizer behaviour.
When a model agrees with the user instead of pushing back, even when the user is wrong. Anthropic now measures and trains against this.
A persistent instruction that sets the model's role, tone, or constraints across an entire conversation.
A parameter controlling randomness in model output. Lower (0-0.3) for factual tasks, higher (0.7-1) for creative writing.
A multi-dimensional array — the data structure neural networks operate on. Vectors and matrices are 1D and 2D tensors.
Spending more compute at inference (longer chains of thought, multiple samples, search) instead of training. The shift from o1 onwards.
How many requests or tokens a system handles per second. Different from latency, which is per-request.
The unit a model processes — usually a word fragment. ~750 words = 1000 tokens. Pricing is per million tokens.
How fast a model generates output tokens after the first one. Higher is better; varies by model and infrastructure.
When a model is given access to external functions and decides when to call them. The substrate of agents.
An alternative to temperature — the model only considers tokens whose cumulative probability sums to p. Common default: 0.9.
The neural network architecture behind nearly every modern LLM. Built around the attention mechanism.
A database that stores and searches embeddings. Used in RAG to find documents semantically similar to a query.
The memory on a GPU. Model size + activations + batch size must fit, or you get out-of-memory errors.
The trained numbers inside a neural network. When people say "open weights", they mean the weights are downloadable.
OpenAI's open-source speech-to-text model. The default starting point for transcription.
Elon Musk's AI lab. Makes Grok. Distinct from "explainable AI", which is a different field.
Common configuration format for training and inference pipelines. Easier than JSON for humans; common in Hugging Face code.
Asking a model to do a task with just an instruction — no examples. Works for common tasks; fails on niche formats.