IBM: Granite 4.0 Micro
IBM Research · open · 128K context · ₹1/M tokens · Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family. These models are the latest in a series released by IBM. They are fine-tune
Every frontier and open-weight model that matters — with context window, INR pricing, language support, and India availability.
IBM Research · open · 128K context · ₹1/M tokens · Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family. These models are the latest in a series released by IBM. They are fine-tune
Meta · open · 16K context · ₹2/M tokens · Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated s
Mistral AI · open · 128K context · ₹2/M tokens · A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, Ger
Meta · open · 59K context · ₹2/M tokens · Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual
OpenAI · open · 128K context · ₹2/M tokens · gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B act
Liquid · open · 32K context · ₹2/M tokens · LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Ex
Alibaba (Qwen) · open-api · 128K context · ₹3/M tokens · Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.
Amazon · closed · 125K context · ₹3/M tokens · Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context len
Cohere · closed · 125K context · ₹3/M tokens · Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiri
OpenAI · open · 128K context · ₹3/M tokens · gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose pro
Meta · open · 8K context · ₹3/M tokens · Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecas
Google · open · 128K context · ₹3/M tokens · Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 language
Alibaba (Qwen) · open · 32K context · ₹3/M tokens · Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has grea
Sao10K · open · 8K context · ₹3/M tokens · Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with impr
NVIDIA · open · 128K context · ₹3/M tokens · NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning
Google · open · 128K context · ₹3/M tokens · Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 language
Alibaba (Qwen) · open · 256K context · ₹3/M tokens · Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-p
Arcee AI · open · 128K context · ₹4/M tokens · Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient rea
Alibaba (Qwen) · open · 40K context · ₹4/M tokens · Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seam
Mistral AI · open · 32K context · ₹4/M tokens · Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it featur
OpenAI · closed · 391K context · ₹4/M tokens · GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While
IBM Research · open · 128K context · ₹4/M tokens · Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and
NVIDIA · open · 256K context · ₹4/M tokens · NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems
Meta · open · 78K context · ₹4/M tokens · Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reas
Z.ai · open · 198K context · ₹5/M tokens · As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, str
Amazon · closed · 293K context · ₹5/M tokens · Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. A
Google · open · 32K context · ₹5/M tokens · Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—inc
Alibaba (Qwen) · open · 40K context · ₹5/M tokens · Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamle
Google · open · 256K context · ₹5/M tokens · Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token dur
Gryphe · open · 4K context · ₹5/M tokens · One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.
Microsoft · open · 16K context · ₹5/M tokens · [Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or w
Alibaba (Qwen) · open-api · 977K context · ₹5/M tokens · The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts
Tencent · open · 256K context · ₹6/M tokens · Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning lev
Baidu · open · 117K context · ₹6/M tokens · A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptional multimodal understa
Alibaba (Qwen) · open · 156K context · ₹6/M tokens · Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code genera
Baidu · open · 128K context · ₹6/M tokens · ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles,
Alibaba (Qwen) · open · 256K context · ₹6/M tokens · Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active param
ByteDance Seed · open-api · 256K context · ₹6/M tokens · Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context wind
Mistral AI · open · 125K context · ₹6/M tokens · Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved funct
OpenAI · open · 128K context · ₹6/M tokens · gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lowe
Google · closed · 1024K context · ₹6/M tokens · Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quali
Microsoft · open · 125K context · ₹7/M tokens · Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning den
Alibaba (Qwen) · open · 128K context · ₹7/M tokens · Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model
Alibaba (Qwen) · open · 128K context · ₹7/M tokens · Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, an
Alibaba (Qwen) · open · 40K context · ₹7/M tokens · Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seaml
Meta · open · 320K context · ₹7/M tokens · Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It sup
inclusionAI · open-api · 256K context · ₹7/M tokens · Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that requir
Google · open · 128K context · ₹7/M tokens · Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 language
Alibaba (Qwen) · open · 40K context · ₹8/M tokens · Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, mult
NVIDIA · open · 256K context · ₹8/M tokens · NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-a
Alibaba (Qwen) · open · 256K context · ₹8/M tokens · Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thin
Alibaba (Qwen) · open · 256K context · ₹8/M tokens · Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targe
Alibaba · open · 128K context · ₹8/M tokens · Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optim
Alibaba (Qwen) · open · 128K context · ₹8/M tokens · Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard
ByteDance Seed · open-api · 256K context · ₹8/M tokens · Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It deliver
Google · closed · 1024K context · ₹8/M tokens · Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved through
NVIDIA · open · 128K context · ₹8/M tokens · Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s
StepFun · open · 256K context · ₹8/M tokens · Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11
Meta · open · 128K context · ₹8/M tokens · The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instr
Mistral AI · open · 31K context · ₹8/M tokens · Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It e
Reka AI · open · 16K context · ₹8/M tokens · Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized
Xiaomi · open · 256K context · ₹8/M tokens · MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parame
ByteDance · open · 125K context · ₹8/M tokens · UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. B
Google · closed · 977K context · ₹8/M tokens · Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on
Mistral AI · open · 128K context · ₹8/M tokens · The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
Reka AI · open · 64K context · ₹8/M tokens · Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks
Mistral AI · open · 128K context · ₹8/M tokens · Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Fi
OpenAI · closed · 1023K context · ₹8/M tokens · For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size wit
Google · closed · 1024K context · ₹8/M tokens · Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved through
Z.ai · open-api · 125K context · ₹8/M tokens · GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, onlin
Alibaba (Qwen) · open · 128K context · ₹9/M tokens · Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video.
Mistral AI · open · 3K context · ₹9/M tokens · A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Alibaba (Qwen) · open · 256K context · ₹9/M tokens · Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total
Alibaba (Qwen) · open · 128K context · ₹10/M tokens · Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex s
Google · open · 256K context · ₹11/M tokens · Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, c
Z.ai · open · 128K context · ₹11/M tokens · GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixtu
Nous Research · open · 128K context · ₹11/M tokens · Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowin
Alibaba (Qwen) · open · 128K context · ₹11/M tokens · Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhanc
Alibaba (Qwen) · open · 128K context · ₹11/M tokens · Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimi
Nex AGI · open · 128K context · ₹11/M tokens · DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world producti
Alibaba (Qwen) · open-api · 128K context · ₹11/M tokens · Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high
Baidu · open · 29K context · ₹12/M tokens · A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understan
DeepSeek · open · 1024K context · ₹12/M tokens · DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-tok
Tencent · open · 128K context · ₹12/M tokens · Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoni
Alibaba (Qwen) · open · 256K context · ₹12/M tokens · The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixtur
Nous Research · open · 8K context · ₹12/M tokens · Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly
Alibaba (Qwen) · open · 128K context · ₹12/M tokens · Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B
Cohere · closed · 125K context · ₹13/M tokens · command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and
Alibaba (Qwen) · open · 256K context · ₹13/M tokens · Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybr
Mistral AI · open · 256K context · ₹13/M tokens · A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
DeepSeek · open · 32K context · ₹13/M tokens · DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extend
Essential AI · open · 32K context · ₹13/M tokens · Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific r
Mistral AI · open · 256K context · ₹13/M tokens · Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It com
OpenAI · closed · 125K context · ₹13/M tokens · GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
Meta · open · 1024K context · ₹13/M tokens · Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts a
OpenAI · closed · 125K context · ₹13/M tokens · GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced
AllenAI · open · 64K context · ₹13/M tokens · Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenario
Upstage · open-api · 125K context · ₹13/M tokens · Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers ex
MiniMax · open · 192K context · ₹13/M tokens · MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments,
OpenAI · closed · 125K context · ₹13/M tokens · GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced
Arcee AI · open · 128K context · ₹13/M tokens · Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters
TheDrummer · open · 32K context · ₹14/M tokens · Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices -
Meta · open · 160K context · ₹15/M tokens · Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used
Arcee AI · open-api · 128K context · ₹15/M tokens · Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 3
Alibaba (Qwen) · open-api · 977K context · ₹16/M tokens · Qwen3 Coder Flash is Alibaba's fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autono
Alibaba (Qwen) · open · 256K context · ₹16/M tokens · The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and
xAI · closed · 1953K context · ₹17/M tokens · Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window. Reasoning
MiniMax · open · 977K context · ₹17/M tokens · MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion paramet
xAI · closed · 1953K context · ₹17/M tokens · Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read
xAI · closed · 250K context · ₹17/M tokens · Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer G
OpenAI · closed · 391K context · ₹17/M tokens · GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and
Mistral AI · open · 256K context · ₹17/M tokens · The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counter
Prime Intellect · open · 128K context · ₹17/M tokens · INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-sc
DeepSeek · open · 160K context · ₹17/M tokens · DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [Deep
Alibaba (Qwen) · open · 256K context · ₹17/M tokens · Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instru
Mistral AI · open-api · 32K context · ₹17/M tokens · Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses
Alibaba (Qwen) · open · 256K context · ₹18/M tokens · Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as
Arcee AI · open · 256K context · ₹18/M tokens · Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, agentic workloads, and re
Meta · open · 128K context · ₹20/M tokens · Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as im
Google · closed · 1024K context · ₹21/M tokens · Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and
Inception · open-api · 125K context · ₹21/M tokens · Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and re
ByteDance Seed · open-api · 256K context · ₹21/M tokens · Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency
Google · closed · 1024K context · ₹21/M tokens · Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, a
Anthropic · closed · 195K context · ₹21/M tokens · Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcem
OpenAI · closed · 391K context · ₹21/M tokens · GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex
OpenAI · closed · 391K context · ₹21/M tokens · GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefi
Alibaba (Qwen) · open-api · 977K context · ₹21/M tokens · Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiere
Alibaba (Qwen) · open · 31K context · ₹21/M tokens · Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, g
ByteDance Seed · open-api · 256K context · ₹21/M tokens · Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context
DeepSeek · open · 128K context · ₹21/M tokens · DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduce
MiniMax · open · 192K context · ₹21/M tokens · MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 b
Alibaba (Qwen) · open-api · 977K context · ₹22/M tokens · Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.
Alibaba (Qwen) · open-api · 977K context · ₹22/M tokens · The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-expe
Alibaba (Qwen) · open-api · 977K context · ₹22/M tokens · Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.
Alibaba (Qwen) · open · 256K context · ₹22/M tokens · The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-exper
Alibaba (Qwen) · open · 128K context · ₹22/M tokens · Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is o
Alibaba (Qwen) · open-api · 977K context · ₹22/M tokens · Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.
DeepSeek · open · 160K context · ₹23/M tokens · DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces Deep
DeepSeek · open · 160K context · ₹23/M tokens · DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues re
Baidu · open · 120K context · ₹23/M tokens · ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters
DeepSeek · open · 160K context · ₹24/M tokens · DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attentio
DeepSeek · open · 32K context · ₹24/M tokens · DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek
MiniMax · open · 192K context · ₹24/M tokens · MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 b
MiniMax · open · 192K context · ₹25/M tokens · MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participat
TheDrummer · open · 128K context · ₹25/M tokens · Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
Amazon · closed · 977K context · ₹25/M tokens · Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrate
Kwaipilot · open-api · 250K context · ₹25/M tokens · KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integra
Z.ai · open · 128K context · ₹25/M tokens · GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It su
Mistral AI · open-api · 250K context · ₹25/M tokens · Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middl
inclusionAI · open-api · 256K context · ₹25/M tokens · Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast exec
Google · closed · 1024K context · ₹25/M tokens · Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It inclu
xAI · closed · 128K context · ₹25/M tokens · Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It’s id
xAI · closed · 128K context · ₹25/M tokens · A lightweight model that thinks before responding. Fast, smart, and great for logic-based tasks that do not require deep domain knowledge. The raw thinking trac
MiniMax · open-api · 64K context · ₹25/M tokens · MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed t
Google · closed · 32K context · ₹25/M tokens · Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is c
Nous Research · open · 128K context · ₹25/M tokens · Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic ca
DeepSeek · open · 160K context · ₹27/M tokens · DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on ne
Alibaba (Qwen) · open · 256K context · ₹27/M tokens · Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — a
Alibaba (Qwen) · open-api · 977K context · ₹27/M tokens · Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and h
Mistral AI · open · 125K context · ₹29/M tokens · Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provi
Alibaba (Qwen) · open · 32K context · ₹30/M tokens · Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has gre
Z Ai · open · 200K context · ₹33/M tokens · Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, en
Alibaba (Qwen) · open · 256K context · ₹33/M tokens · The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-o
Thedrummer · open · 32K context · ₹33/M tokens · UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Moonshot AI · open · 256K context · ₹33/M tokens · Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi
Xiaomi · open-api · 256K context · ₹33/M tokens · MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal
Mistral AI · open-api · 128K context · ₹33/M tokens · Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost
Moonshot AI · open · 256K context · ₹33/M tokens · Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI
Alibaba (Qwen) · open-api · 977K context · ₹33/M tokens · Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M t
Xiaomi · open · 1024K context · ₹33/M tokens · MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in
Meta · open · 128K context · ₹33/M tokens · Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usec
OpenAI · closed · 1023K context · ₹33/M tokens · GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context w
Mistral AI · open-api · 128K context · ₹33/M tokens · Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from
MiniMax · open-api · 977K context · ₹33/M tokens · MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (
Mistral AI · open · 256K context · ₹33/M tokens · Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256
Mistral AI · open-api · 128K context · ₹33/M tokens · Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level cap
Z Ai · open · 198K context · ₹33/M tokens · GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution.
Krutrim (Ola) · open-api · 128K context · ₹35/M tokens · Ola Krutrim's 22-language Indic foundation model. INR-priced API with developer-tier free credits. Wide Indic coverage.
Baidu · open · 120K context · ₹35/M tokens · ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token.
DeepSeek · open · 1024K context · ₹36/M tokens · DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context w
Undi95 · open · 6K context · ₹38/M tokens · A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
Alibaba (Qwen) · open · 128K context · ₹38/M tokens · Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching
Sarvam AI · open-api · 32K context · ₹40/M tokens · Indic-native foundation model from Sarvam AI (Bengaluru). INR billing, GST included, Mumbai latency under 100ms. 11 Indian languages with native-quality output.
Meta · open · 128K context · ₹40/M tokens · Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content
Arcee Ai · open-api · 32K context · ₹42/M tokens · Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed GitHub, CodeSearchNet and synthetic bug‑fi
Google · closed · 1024K context · ₹42/M tokens · Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro l
DeepSeek · open · 160K context · ₹42/M tokens · May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reaso
Google · closed · 64K context · ₹42/M tokens · Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual qual
Mistral AI · open-api · 256K context · ₹42/M tokens · Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and re
~Google · open-api · 1024K context · ₹42/M tokens · This model always redirects to the latest model in the Google Gemini Flash family.
OpenAI · closed · 16K context · ₹42/M tokens · GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. T
Meta · open · 8K context · ₹43/M tokens · Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue useca
Alibaba (Qwen) · open-api · 128K context · ₹43/M tokens · Qwen VL Max is a visual understanding model with 7500 tokens context length. It excels in delivering optimal performance for a broader spectrum of complex tasks
Thedrummer · open · 32K context · ₹46/M tokens · Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent stor
Moonshot AI · open · 128K context · ₹48/M tokens · Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active
Z Ai · open · 198K context · ₹50/M tokens · GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it de
OpenAI · closed · 125K context · ₹50/M tokens · A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. In
Z Ai · open · 64K context · ₹50/M tokens · GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B
Writer · open-api · 1016K context · ₹50/M tokens · Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and effic
Moonshot AI · open · 256K context · ₹50/M tokens · Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillio
Z Ai · open · 128K context · ₹50/M tokens · GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a
Microsoft · closed · 64K context · ₹52/M tokens · WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it con
Alibaba (Qwen) · open-api · 977K context · ₹54/M tokens · Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous progr
Sao10K · open · 128K context · ₹54/M tokens · Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3
Google · closed · 8K context · ₹54/M tokens · Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-s
Alibaba (Qwen) · open · 32K context · ₹55/M tokens · Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upo
Aion Labs · open · 128K context · ₹58/M tokens · Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, cod
DeepSeek · open · 62K context · ₹58/M tokens · DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with
DeepSeek · open · 128K context · ₹58/M tokens · DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [Dee
Moonshot AI · open · 32K context · ₹62/M tokens · Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It
~Moonshotai · open-api · 32K context · ₹62/M tokens · This model always redirects to the latest model in the MoonshotAI Kimi family.
Arcee Ai · open-api · 128K context · ₹63/M tokens · Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and enterprise QA. Unlike ma
~Openai · open-api · 391K context · ₹63/M tokens · This model always redirects to the latest model in the OpenAI GPT Mini family.
OpenAI · closed · 391K context · ₹63/M tokens · GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image input
Mancer · open-api · 8K context · ₹63/M tokens · An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
Alibaba (Qwen) · open-api · 256K context · ₹65/M tokens · Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail
Alibaba (Qwen) · open-api · 256K context · ₹65/M tokens · Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By sig
Anthropic · closed · 195K context · ₹67/M tokens · Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick
Aion Labs · open-api · 128K context · ₹67/M tokens · Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and confl
Alfredpros · open · 4K context · ₹67/M tokens · A fine-tuned 7-billion-parameter Code LLaMA - Instruct model that generates Solidity smart contracts, trained with 4-bit QLoRA fine-tuning from the PEFT library.
Morph · open-api · 80K context · ₹67/M tokens · Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the foll
Amazon · closed · 293K context · ₹67/M tokens · Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of D
Aion Labs · open-api · 32K context · ₹67/M tokens · Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, whe
Relace · open-api · 250K context · ₹71/M tokens · Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and
Sao10K · open · 128K context · ₹71/M tokens · Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao1
Switchpoint · open-api · 128K context · ₹71/M tokens · Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our route
Morph · open-api · 256K context · ₹75/M tokens · Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to
Arcee Ai · open-api · 128K context · ₹75/M tokens · Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL for step‑by‑step logic
Z Ai · open · 198K context · ₹82/M tokens · GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minu
Xiaomi · open-api · 1024K context · ₹84/M tokens · MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is hig
Nous Research · open · 128K context · ₹84/M tokens · Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can
Anthropic · closed · 195K context · ₹84/M tokens · Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude m
Nous Research · open · 128K context · ₹84/M tokens · Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi
Relace · open-api · 250K context · ₹84/M tokens · The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to R
~Anthropic · open-api · 195K context · ₹84/M tokens · This model always redirects to the latest model in the Anthropic Claude Haiku family.
Xiaomi · open · 1024K context · ₹84/M tokens · MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, w
Perplexity · open-api · 124K context · ₹84/M tokens · Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking t
OpenAI · closed · 4K context · ₹84/M tokens · GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. T
Alibaba (Qwen) · open-api · 32K context · ₹87/M tokens · Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE
Alibaba (Qwen) · open-api · 256K context · ₹87/M tokens · Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total par
OpenAI · closed · 195K context · ₹92/M tokens · OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. This model sup
OpenAI · closed · 195K context · ₹92/M tokens · OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabi
OpenAI · closed · 195K context · ₹92/M tokens · OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for
OpenAI · closed · 195K context · ₹92/M tokens · OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high. OpenAI o4-mini is a compact reasoning model in the o-seri
Z Ai · open-api · 198K context · ₹100/M tokens · GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply op
Z Ai · open-api · 198K context · ₹100/M tokens · GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, a
OpenAI · closed · 391K context · ₹104/M tokens · GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessi
OpenAI · closed · 391K context · ₹104/M tokens · GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural
OpenAI · closed · 391K context · ₹104/M tokens · GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that requi
Deepcogito · open-api · 125K context · ₹104/M tokens · Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using
xAI · closed · 977K context · ₹104/M tokens · Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and
OpenAI · closed · 391K context · ₹104/M tokens · GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions
Google · closed · 1024K context · ₹104/M tokens · Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilit
OpenAI · closed · 125K context · ₹104/M tokens · GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
xAI · closed · 1953K context · ₹104/M tokens · Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the mark
Google · closed · 1024K context · ₹104/M tokens · Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilit
OpenAI · closed · 125K context · ₹104/M tokens · GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses
OpenAI · closed · 391K context · ₹104/M tokens · GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version
Google · closed · 1024K context · ₹104/M tokens · Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilit
Sao10K · open · 8K context · ₹124/M tokens · Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awarenes
Mistral AI · open-api · 256K context · ₹125/M tokens · Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic
OpenAI · closed · 4K context · ₹125/M tokens · This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.
OpenAI · closed · 391K context · ₹146/M tokens · GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reas
OpenAI · closed · 125K context · ₹146/M tokens · GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accu
OpenAI · closed · 125K context · ₹146/M tokens · GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It use
OpenAI · closed · 391K context · ₹146/M tokens · GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development s
OpenAI · closed · 391K context · ₹146/M tokens · GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasonin
OpenAI · closed · 195K context · ₹167/M tokens · o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technica
Perplexity · open-api · 125K context · ₹167/M tokens · Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-re
Mistral AI · open · 64K context · ₹167/M tokens · Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparall
Mistral AI · open-api · 128K context · ₹167/M tokens · Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understa
Mistral AI · open-api · 128K context · ₹167/M tokens · This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSO
Perplexity · open-api · 125K context · ₹167/M tokens · Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, rea
Mistral AI · open-api · 128K context · ₹167/M tokens · Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411). It p
OpenAI · closed · 1023K context · ₹167/M tokens · GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It support
Google · closed · 1024K context · ₹167/M tokens · Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more effici
AI21 Labs · closed · 250K context · ₹167/M tokens · Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybr
OpenAI · closed · 195K context · ₹167/M tokens · o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step research tasks. Note: This model always us
Google · closed · 64K context · ₹167/M tokens · Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly impr
Google · closed · 1024K context · ₹167/M tokens · Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more
Mistral AI · open-api · 125K context · ₹167/M tokens · This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, J
xAI · closed · 1953K context · ₹167/M tokens · Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep res
~Google · open-api · 1024K context · ₹167/M tokens · This model always redirects to the latest model in the Google Gemini Pro family.
Amazon · closed · 977K context · ₹209/M tokens · Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Inflection · open-api · 8K context · ₹209/M tokens · Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It
OpenAI · closed · 125K context · ₹209/M tokens · The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add dep
Cohere · closed · 125K context · ₹209/M tokens · command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to
Cohere · closed · 250K context · ₹209/M tokens · Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding us
OpenAI · closed · 125K context · ₹209/M tokens · GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turb
OpenAI · closed · 1025K context · ₹209/M tokens · GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K outpu
OpenAI · closed · 125K context · ₹209/M tokens · The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readabili
OpenAI · closed · 391K context · ₹209/M tokens · GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for e
OpenAI · closed · 125K context · ₹209/M tokens · The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and mainta
Inflection · open-api · 8K context · ₹209/M tokens · Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent
OpenAI · closed · 125K context · ₹209/M tokens · The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [h
OpenAI · closed · 125K context · ₹209/M tokens · GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
xAI · closed · 250K context · ₹250/M tokens · Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note t
OpenAI · closed · 16K context · ₹250/M tokens · This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Tr
Perplexity · open-api · 195K context · ₹250/M tokens · Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reas
Perplexity · open-api · 195K context · ₹250/M tokens · Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-re
~Anthropic · open-api · 977K context · ₹250/M tokens · This model always redirects to the latest model in the Anthropic Claude Sonnet family.
xAI · closed · 128K context · ₹250/M tokens · Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possess
Anthropic · closed · 977K context · ₹250/M tokens · Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative de
xAI · closed · 128K context · ₹250/M tokens · Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possess
Anthropic · closed · 977K context · ₹250/M tokens · Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and
Sao10K · open · 16K context · ₹250/M tokens · This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).
Anthropic · closed · 977K context · ₹250/M tokens · Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performa
Anthracite Org · open · 16K context · ₹250/M tokens · This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet
Alpindale · open · 6K context · ₹313/M tokens · A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to - [@chargoddard](https://huggingface
Aion Labs · open-api · 128K context · ₹334/M tokens · Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with
Anthropic · closed · 977K context · ₹418/M tokens · Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than
~OpenAI · open-api · 1025K context · ₹418/M tokens · This model always redirects to the latest model in the OpenAI GPT family.
OpenAI · closed · 125K context · ₹418/M tokens · GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turb
OpenAI · closed · 1025K context · ₹418/M tokens · GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved to
OpenAI · closed · 391K context · ₹418/M tokens · GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves to the latest Instant chat model used in ChatGPT. As OpenAI rolls out new
~Anthropic · open-api · 977K context · ₹418/M tokens · This model always redirects to the latest model in the Claude Opus family.
Anthropic · closed · 977K context · ₹418/M tokens · Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.
Anthropic · closed · 195K context · ₹418/M tokens · Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers
OpenAI · closed · 266K context · ₹668/M tokens · [GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It
OpenAI · closed · 125K context · ₹835/M tokens · The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023.
OpenAI · closed · 391K context · ₹835/M tokens · [GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvement
OpenAI · closed · 195K context · ₹835/M tokens · o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: This model always uses the 'web_sea
OpenAI · closed · 125K context · ₹835/M tokens · The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
OpenAI · closed · 125K context · ₹835/M tokens · The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023
OpenAI · closed · 391K context · ₹1,252/M tokens · GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that r
Anthropic · closed · 195K context · ₹1,252/M tokens · Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on
Anthropic · closed · 195K context · ₹1,252/M tokens · Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workf
OpenAI · closed · 195K context · ₹1,252/M tokens · The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale
OpenAI · closed · 195K context · ₹1,670/M tokens · The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to
OpenAI · closed · 391K context · ₹1,754/M tokens · GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for comp
OpenAI · closed · 1025K context · ₹2,505/M tokens · GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context windo
OpenAI · closed · 8K context · ₹2,505/M tokens · OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due t
OpenAI · closed · 1025K context · ₹2,505/M tokens · GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It
OpenAI · closed · 8K context · ₹2,505/M tokens · GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021.
Anthropic · closed · 977K context · ₹2,505/M tokens · Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's
OpenAI · closed · 195K context · ₹12,525/M tokens · The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to
OpenRouter · open-api · 1024K context · Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in
OpenRouter · open-api · 195K context · The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smart
Google · closed · 1024K context · Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can g
Google · closed · 1024K context · 30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, yo
OpenRouter · open-api · 1953K context · The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) coding percentiles. Set
MeitY · Bhashini · open-api · 4K context · Government of India's national-mission translation service covering 22 Indian languages. Free tier for non-commercial use; commercial usage via Bhashadaan partner n
OpenRouter · open-api · 125K context · Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder w
OpenRouter · open-api · 1953K context · Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was