Action · guide · May 17, 2026

Five open-weight models worth pulling onto an Indian self-hosted GPU stack

NVIDIA Nemotron 3 Super, Nemotron 3 Nano 30B A3B, Nemotron Nano 9B V2, Llama 3.3 Nemotron Super 49B V1.5, and Mistral Nemo cover the size band an Indian team can realistically host on rented or owned accelerators.

Indian builders self-hosting LLMs on consumer GPUs · By The ShiftMaker Editorial Desk

Self-hosting is back on the menu for Indian teams in 2026. The reasons are blunt: hosted-model invoices arrive in dollars, GPU rental in INR has stabilised at a survivable rate, and the open-weight side of the model market has finally produced models that hold up against the closed leaders on real workloads. The honest question for an engineering lead is no longer 'should we self-host' but 'which weights do we pull, and what size class does the GPU budget actually clear.' This article maps five open-weight models against that decision, drawn from the publication's catalogue of explore_models rows.

The shortlist below is dominated by NVIDIA Nemotron variants. That is not advocacy; it is what the substrate looks like today. NVIDIA's open-weight push in late 2025 and early 2026 produced a tight family of models spanning roughly 9B to 120B total parameters, which happens to be the band Indian self-hosters can actually afford to rent or own. Mistral Nemo rounds out the list as the non-NVIDIA dense option at the smaller end of that band.

How we picked these

Picks were drawn from the explore_models table filtered to status=active and updated within the last sixty days. Keyword filter targeted open-weight, self-host, and gpu. Five rows survived the cut. The article covers all five rather than padding with closed-model rows that happen to share a tag — closed weights are not self-hostable, no matter how permissively the vendor talks about them in a launch post.
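The selection rule is simple enough to sketch. This is a minimal illustration, not the publication's pipeline. The `ModelRow` shape, its field names and the `shortlist` helper are hypothetical. The sketch also assumes a row qualifies if any one of the three keywords appears in its tags, because the article does not say whether all three were required.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical row shape; the real explore_models schema may differ.
@dataclass
class ModelRow:
    name: str
    status: str
    updated_on: date
    tags: tuple[str, ...]

KEYWORDS = ("open-weight", "self-host", "gpu")

def shortlist(rows: list[ModelRow], today: date, window_days: int = 60) -> list[ModelRow]:
    """Keep active rows updated within the window that carry any target keyword."""
    cutoff = today - timedelta(days=window_days)
    return [
        r for r in rows
        if r.status == "active"
        and r.updated_on >= cutoff
        and any(k in r.tags for k in KEYWORDS)  # assumption: any keyword qualifies
    ]

rows = [
    ModelRow("model-a", "active", date(2026, 5, 1), ("open-weight", "gpu")),
    ModelRow("model-b", "active", date(2026, 1, 2), ("open-weight",)),  # stale
    ModelRow("model-c", "deprecated", date(2026, 5, 10), ("gpu",)),    # inactive
    ModelRow("model-d", "active", date(2026, 5, 12), ("hosted-api",)),  # no keyword
]
print([r.name for r in shortlist(rows, today=date(2026, 5, 17))])
```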

The list

01 NVIDIA: Nemotron 3 Super

Nemotron 3 Super is the larger of NVIDIA's Nemotron 3 family entries in the catalogue. It is a 120B-parameter hybrid MoE that activates about 12B parameters per token, and it is the right choice when the GPU budget clears the upper size class and the workload needs reasoning headroom.

Why it makes the list — The catalogue lists indic_support as 'none' for this entry, which is the honest signal: this is a strong English-and-code workhorse, not a multilingual model. For Indian self-hosters whose product is in English or code, that is fine; for Indic-language workloads it is the wrong pick.

When to use
Reach for this when the team has the accelerators for the upper size class and the workload is reasoning-heavy English or code rather than Indic chat.

When NOT to use
Skip it for Indic-language production. The catalogue's indic_support flag is honest; do not paper over it with a custom evaluation set built to make the model look acceptable.

Pricing — Open weights, so zero per-token fee from the vendor. The cost line is the GPU it runs on.

Closest alternative — For a smaller variant in the same family with the same trade-offs, Nemotron 3 Nano 30B A3B drops the parameter count without changing the language posture.

02 NVIDIA: Nemotron 3 Nano 30B A3B

Nemotron 3 Nano 30B A3B sits in the catalogue as the mid-band entry from the Nemotron 3 family. It uses a sparse-activation (MoE) architecture with 30B parameters in total and, as the A3B suffix indicates, roughly 3B active per token.

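Why it makes the list — Catalogue lists indic_support as 'none'. The value proposition is the sparse activation. With only about 3B of the 30B parameters lit per token, each token costs a fraction of the compute of a dense 30B model, so a single rented card serves far more traffic. Memory is a different matter: all 30B weights still have to be resident, so VRAM is sized by the total, not the active count. Throughput per rented GPU hour is the constraint Indian rented-GPU users actually face, and that is where the sparse design pays off.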
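A back-of-envelope comparison shows what sparse activation does and does not buy. This sketch assumes about 3B active parameters (read off the A3B suffix) and BF16 weights at 2 bytes per parameter. It also uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, which ignores attention cost over long contexts. The helper names are illustrative.

```python
def weight_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Memory needed to hold every weight, in decimal GB (1e9 bytes)."""
    return total_params_b * bytes_per_param  # billions of params x bytes = GB

def gflops_per_token(active_params_b: float) -> float:
    """Rule of thumb: a forward pass costs about 2 FLOPs per active parameter."""
    return 2 * active_params_b  # billions of params -> GFLOPs

BF16 = 2.0  # bytes per parameter

# Approximate figures: 30B total, ~3B active per token for the MoE.
moe = {"total": 30, "active": 3}
dense = {"total": 30, "active": 30}

for name, m in (("MoE 30B-A3B", moe), ("dense 30B", dense)):
    print(f"{name}: {weight_gb(m['total'], BF16):.0f} GB weights, "
          f"{gflops_per_token(m['active']):.0f} GFLOPs/token")
```

Both configurations need the same memory for weights, while the MoE does roughly a tenth of the per-token arithmetic. In practice the saving shows up as more tokens per card-hour rather than as a smaller card.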

When to use
Pick this when the GPU rig is mid-band — a single mid-tier data-centre card — and the workload is English or code reasoning. The sparse design is the feature, not a curiosity.

When NOT to use
Avoid this for Indic-language production and for workloads that need a dense model's behavioural consistency under unusual prompts.

Pricing — Open weights, no per-token fee. The cost line is the GPU.

Closest alternative — For a smaller dense option in the same neighbourhood, Mistral Nemo is the non-NVIDIA pick.

03 NVIDIA: Nemotron Nano 9B V2

Nemotron Nano 9B V2 is the smallest of the Nemotron entries in the catalogue — the entry-level self-host option for teams running on a single consumer or workstation GPU.

Why it makes the list — Catalogue lists indic_support as 'none'. The model's value is reach: a 9B parameter model runs on hardware that an Indian engineering team can actually own outright, rather than rent.

When to use
Use this for prototypes, on-device inference, and edge deployments where owning the GPU outright matters more than top-of-leaderboard scores.

When NOT to use
Skip it when the workload needs the reasoning headroom of a larger model and the team can afford the upgrade in GPU spend.

Pricing — Open weights. The smallest member of the family has the lowest GPU floor — the budget conversation here is hardware, not licence.

Closest alternative — Mistral Nemo is the non-NVIDIA option in roughly the same size class with different training data and a different licence.

04 Mistral: Mistral Nemo

Mistral Nemo is the catalogue's non-NVIDIA dense option in the self-hostable size band. It is a 12B model with a 128k-token context, and it is the pick for teams that want a second vendor in the open-weight stack.

Why it makes the list — Catalogue lists indic_support as 'limited' rather than 'none', which puts it one honest tier above the Nemotron family for any team whose product has to handle some non-English input. Limited is not full, but it is more than none.

When to use
Pick this when vendor diversity matters, when limited Indic coverage is enough for the workload, or when Mistral's licence terms fit a deployment shape the NVIDIA licence does not.

When NOT to use
Skip it for workloads where full Indic-language fluency is the bar; the limited tag is honest.

Pricing — Open weights. Adding a second vendor carries no licence fee; the GPU is still the bill.

Closest alternative — For a smaller NVIDIA option in the same neighbourhood, Nemotron Nano 9B V2 is the cheapest to host.

05 NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama 3.3 Nemotron Super 49B V1.5 is the catalogue's Llama-derived NVIDIA tuning — the right pick when the team wants Llama lineage with Nemotron's reasoning tuning on top.

Why it makes the list — Catalogue lists indic_support as 'none'. The lineage matters here: teams that have already standardised on Llama prompt and tooling conventions can swap the underlying weights without rebuilding the prompts and tool integrations around them.

When to use
Adopt this when the engineering team is already shipping on Llama and wants better reasoning without breaking the prompt and tool conventions the codebase is built around.

When NOT to use
Skip it for new self-hosting projects with no existing Llama posture; in that case the plain Nemotron 3 Super is a cleaner starting point.

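Pricing — Open weights. At 49B parameters the weights alone take roughly 98 GB at BF16, more than a single 80 GB card holds. Serving therefore means either an 8-bit build on one high-end data-centre accelerator or splitting the model across two.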

Closest alternative — The standard Nemotron 3 Super is the non-Llama-lineage NVIDIA alternative one size band up, at 120B total parameters with about 12B active per token.

Side-by-side

The five split into three size bands. 9B for the consumer-or-workstation GPU team. 30B to 49B for the rented mid-band single-GPU team, with the 49B needing 8-bit weights to fit on one 80 GB card. The 120B Nemotron 3 Super for teams with top-tier, usually multi-GPU, accelerator access. Mistral Nemo, at 12B dense, is the non-NVIDIA hedge that runs comfortably on a mid-band card.

Model	indic_support	catalogue description
Mistral: Mistral Nemo	limited	A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...
NVIDIA: Nemotron 3 Super	none	NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
NVIDIA: Nemotron Nano 9B V2	none	NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
NVIDIA: Nemotron 3 Nano 30B A3B	none	NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5	none	Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
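The size bands come down mostly to weight memory. The sketch below estimates it for each model at three precisions. It reports the smallest of three representative single-card capacities that holds the weights plus headroom. The capacities (24, 48 and 80 GB) are an assumption standing in for consumer, workstation and data-centre tiers. The 20% headroom for KV cache and activations is also an assumption. Parameter counts are nominal, taken from the model names and catalogue descriptions. Real checkpoints and quantization formats carry some overhead, and the sketch says nothing about quality loss at lower precision.

```python
# Nominal total parameter counts, in billions.
MODELS = {
    "Nemotron Nano 9B V2": 9,
    "Mistral Nemo": 12,
    "Nemotron 3 Nano 30B A3B": 30,
    "Llama 3.3 Nemotron Super 49B V1.5": 49,
    "Nemotron 3 Super": 120,
}

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

# Assumption: reserve 20% on top of the weights for KV cache and activations.
HEADROOM = 1.2

def weights_gb(params_b: float, precision: str) -> float:
    """Weight memory in decimal GB at the given precision."""
    return params_b * BYTES_PER_PARAM[precision]

def smallest_card(params_b: float, precision: str, cards_gb=(24, 48, 80)):
    """Smallest single-card capacity (GB) that fits weights plus headroom, else None."""
    need = weights_gb(params_b, precision) * HEADROOM
    for cap in sorted(cards_gb):
        if need <= cap:
            return cap
    return None

for name, p in MODELS.items():
    row = ", ".join(
        f"{prec} {weights_gb(p, prec):.0f} GB -> {smallest_card(p, prec) or 'multi-GPU'}"
        for prec in BYTES_PER_PARAM
    )
    print(f"{name}: {row}")
```

On these assumptions the 49B fits a single 80 GB card only at 8-bit, and Nemotron 3 Super fits on one of these cards only at 4-bit. Treat the output as a starting point for sizing, not a substitute for measuring memory on the actual serving stack.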

India context

Four of the five entries are tagged indic_support='none' in the catalogue and one is tagged 'limited'. That matters for Indian teams: a self-hosting decision driven by INR cost still has to live with the model's language coverage, and no amount of GPU saving compensates for a model that mishandles Hindi, Tamil or Bengali production traffic. For English-only or code-only workloads the picks above are fine; for Indic-language workloads a different shortlist is needed.

How to decide

If the workload is English-or-code and the budget is the upper band, start with Nemotron 3 Super. If the budget is the mid-band single GPU, start with Nemotron 3 Nano 30B A3B or Mistral Nemo. If the team is running on owned consumer hardware, start with Nemotron Nano 9B V2. If the codebase is already on Llama, the Nemotron-tuned Llama 3.3 49B V1.5 is the drop-in upgrade.
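Written as a small function, these rules make their precedence explicit. This is a sketch, and `Budget`, `starting_picks` and their arguments are hypothetical names. Two assumptions go beyond the prose. First, the Llama drop-in rule is checked before the budget rules, but only when the budget can serve a 49B model (mid band or above). Second, a non-English workload returns no pick, reflecting the India context note that Indic traffic needs a different shortlist.

```python
from enum import Enum

class Budget(Enum):
    CONSUMER = "owned consumer or workstation GPU"
    MID = "rented mid-band single GPU"
    UPPER = "upper-band accelerators"

def starting_picks(budget: Budget, on_llama: bool,
                   english_or_code: bool = True) -> list[str]:
    """Return starting candidates per the 'How to decide' rules (a sketch)."""
    if not english_or_code:
        # Indic-language workloads need a different shortlist.
        return []
    if on_llama and budget is not Budget.CONSUMER:
        # Assumption: the 49B drop-in only applies when the budget can serve it.
        return ["Llama 3.3 Nemotron Super 49B V1.5"]
    if budget is Budget.UPPER:
        return ["Nemotron 3 Super"]
    if budget is Budget.MID:
        return ["Nemotron 3 Nano 30B A3B", "Mistral Nemo"]
    return ["Nemotron Nano 9B V2"]

print(starting_picks(Budget.MID, on_llama=False))
```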

Gotchas

Three patterns to watch. First, open weights do not mean an open licence: read the vendor's terms before shipping a commercial product, because NVIDIA and Mistral take different positions. Mistral Nemo ships under Apache 2.0, while NVIDIA publishes its Nemotron weights under its own licence terms. Second, the GPU bill is the real cost line, and for Indian teams renting GPUs that bill is in INR, which is the entire reason self-hosting is back on the menu. Third, indic_support is the catalogue's honest tag, not a marketing claim: a model marked 'none' should not be deployed to Indic-language production traffic in the hope of patching it there.

Pick one of these five, run it for thirty days, log per-rupee performance, then decide whether to keep self-hosting or move back to a hosted API. That is the only honest measurement for a self-hosting decision, and the five models above are where to start.
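One way to make per-rupee performance concrete over a thirty-day trial is blended cost per million tokens. Every billed GPU hour counts, whether or not it served traffic. In the sketch below the `DayLog` record and the helpers are hypothetical. Every number is a placeholder to replace with the team's own logs and quotes: the INR 200/hour rental rate, 40 million tokens a day, a USD 0.5 per million hosted price and an exchange rate of 85 INR/USD.

```python
from dataclasses import dataclass

@dataclass
class DayLog:
    gpu_hours: float         # rented hours billed that day
    inr_per_gpu_hour: float  # rental rate paid
    tokens_served: int       # prompt + completion tokens actually served

def inr_per_million_tokens(days: list[DayLog]) -> float:
    """Blended self-hosted cost over the trial window, idle hours included."""
    spend = sum(d.gpu_hours * d.inr_per_gpu_hour for d in days)
    tokens = sum(d.tokens_served for d in days)
    if tokens == 0:
        raise ValueError("no tokens served; cost per token is undefined")
    return spend / tokens * 1_000_000

def hosted_inr_per_million(usd_per_million: float, inr_per_usd: float) -> float:
    """Hosted-API price per million tokens, converted to INR for comparison."""
    return usd_per_million * inr_per_usd

# Placeholder 30-day trial: 24 billed hours a day at an assumed INR 200/hour.
trial = [DayLog(gpu_hours=24, inr_per_gpu_hour=200, tokens_served=40_000_000)
         for _ in range(30)]
self_hosted = inr_per_million_tokens(trial)
hosted = hosted_inr_per_million(usd_per_million=0.5, inr_per_usd=85)  # assumed
print(f"self-hosted: INR {self_hosted:.2f}/M tokens, hosted: INR {hosted:.2f}/M tokens")
```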