
MiniMax M3 Explained: Why This Open-Weight AI Model Is Making Headlines in 2026

Author: Chirag Pipaliya
Jun 2, 2026

The short version

  • MiniMax M3 launched on June 1, 2026. It is an open-weight AI model from MiniMax, a lab based in Shanghai.
  • It is pitched as the first open model to pack three things into one system: frontier coding, a 1-million-token context window, and native image and video input.
  • It scored 59.0% on SWE-Bench Pro. That edges out GPT-5.5, but sits below Claude Opus 4.8.
  • Standard pricing is $0.60 per million input tokens and $2.40 per million output tokens. That is a fraction of what closed models charge.
  • The benchmark scores come from MiniMax itself. Test it on your own work before you trust it in production.

MiniMax M3 is an open-weight AI model that delivers frontier-level coding, holds a million tokens of context, and reads images and video, all in one system. MiniMax released it on June 1, 2026, and the AI world started talking the same day. The reason is simple. Models that do one of those things well are common. A model you can download and host yourself that aims to do all three is not. If your team is weighing which model to build on, our AI development services team has been tracking releases like this closely, and M3 is worth a careful look.

This post breaks down what M3 is, what it can and cannot do, what it costs, and how it compares to GPT-5.5. We will also be honest about the parts that need more proof.

What Is MiniMax M3?

MiniMax M3 is a large language model with open weights, released by the Shanghai lab MiniMax. “Open weights” means the model files can be downloaded and run on your own hardware, instead of being locked behind a company’s API.

At launch, the weights were not out yet. MiniMax said it would publish them, plus a full technical report, on Hugging Face and GitHub within about ten days. So the model you read about today becomes something you can host yourself soon after. You can read The Decoder’s launch coverage for more on the rollout.

The pitch is that M3 is the first open model to fuse three frontier features in a single design. Most open models pick one strength. M3 wants all three.

MiniMax M3 Features

The MiniMax M3 features that matter most are its coding skill, its huge context window, its multimodal input, and the architecture that makes the whole thing affordable.


A 1-million-token context window

M3 can take in up to 1,000,000 tokens at once. That is enough to hold a full mid-sized codebase, a long legal file, or hours of chat logs in active memory. Output can run up to 512,000 tokens.
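To get a rough sense of whether your own repository fits in that window, you can estimate its token count before sending anything. The sketch below assumes about four characters per token, a common rule of thumb for English text and code with GPT-style tokenizers. M3's actual tokenizer may count differently, so treat the result as a ballpark, not a guarantee. The extension list, the prompt reserve, and the helper names `estimate_repo_tokens` and `fits_in_context` are hypothetical choices for illustration.

```python
import os

CHARS_PER_TOKEN = 4          # assumption: rough rule of thumb, not M3's real tokenizer
CONTEXT_WINDOW = 1_000_000   # M3's advertised input limit

# Hypothetical set of source-file extensions worth counting.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".java", ".md"}


def estimate_repo_tokens(root: str) -> int:
    """Walk a directory tree and estimate the tokens in its source files."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes; this is only an estimate.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve_for_prompt: int = 50_000) -> bool:
    """True if the estimate, plus room for instructions, fits the window."""
    return estimate_repo_tokens(root) + reserve_for_prompt <= CONTEXT_WINDOW
```

Calling `fits_in_context("path/to/repo")` gives a quick yes or no. If the answer is close to the limit, measure with the model's real tokenizer once the weights are public.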

Native multimodality

M3 reads text, images, and video as input. MiniMax also showed it operating a desktop computer on its own. That makes it a fit for agent-style tasks, not just chat.

MiniMax Sparse Attention (MSA)

The clever part is the architecture, called MiniMax Sparse Attention. Instead of paying attention to every token, MSA picks only the relevant blocks of context and processes those. A small index branch decides what matters, and the main attention layer handles just that slice.
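The selection idea can be illustrated with a toy, single-query version of block-sparse attention. This is a minimal sketch, not MiniMax's implementation. The real index branch is a learned component whose details await the technical report. Here we assume it scores each block by the dot product between the query and the block's mean key, then keeps the top `k` blocks. The names `block_sparse_attention`, `block_size`, and `top_k` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean, used here as a cheap summary of a block's keys."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query over the given keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


def block_sparse_attention(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    block_size: int,
    top_k: int,
) -> Vector:
    """Attend only over the top_k blocks picked by a cheap per-block score."""
    # 1. Split token positions into contiguous blocks.
    blocks = [
        range(start, min(start + block_size, len(keys)))
        for start in range(0, len(keys), block_size)
    ]
    # 2. Stand-in "index branch" (an assumption): one dot product per block
    #    against the block's mean key. A real system would cache these
    #    summaries instead of recomputing them for every query.
    block_scores = [dot(query, mean_vector([keys[i] for i in b])) for b in blocks]
    chosen = sorted(range(len(blocks)), key=lambda j: block_scores[j], reverse=True)[:top_k]
    # 3. Main attention runs only on the tokens inside the chosen blocks.
    selected = sorted(i for j in chosen for i in blocks[j])
    return softmax_attend(query, [keys[i] for i in selected], [values[i] for i in selected])
```

Setting `top_k` to the total number of blocks reproduces ordinary dense attention exactly, which makes a handy sanity check. Smaller values trade some fidelity for less work.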

The payoff is speed and cost. At a million tokens, MSA uses about one-twentieth of the per-token compute of the prior MiniMax generation. MiniMax reports prefill (reading the prompt) runs about 9.7 times faster, and decoding (writing the answer) about 15.6 times faster. Those figures are from MiniMax and have not been checked by outside labs yet.
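A back-of-the-envelope count shows why skipping blocks matters most at long context lengths. The functions below count only query-key comparisons per generated token. They ignore the rest of the network, such as feed-forward layers and projections, so they will not reproduce MiniMax's whole-model one-twentieth figure. The settings `block_size=128` and `top_k=64` are hypothetical, and block summaries are assumed to be precomputed.

```python
def dense_comparisons(context_len: int) -> int:
    """Dense attention compares the new token's query with every key."""
    return context_len


def sparse_comparisons(context_len: int, block_size: int, top_k: int) -> int:
    """One index comparison per block summary, then full attention on kept blocks."""
    num_blocks = -(-context_len // block_size)  # ceiling division
    kept_tokens = min(top_k, num_blocks) * block_size
    return num_blocks + min(kept_tokens, context_len)


for n in (8_000, 128_000, 1_000_000):
    dense = dense_comparisons(n)
    sparse = sparse_comparisons(n, block_size=128, top_k=64)
    print(f"{n:>9,} tokens: dense {dense:,} vs sparse {sparse:,} ({dense / sparse:.1f}x)")
```

Under these made-up settings, the ratio is slightly below 1 at 8,000 tokens, because the index pass adds overhead. It is about 14x at 128,000 tokens and about 62x at a million. That matches the direction of MiniMax's claims, but not any specific number.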

MiniMax M3 Benchmark

On its own tests, the headline MiniMax M3 benchmark is 59.0% on SWE-Bench Pro, a hard, real-world coding test. Here is the fuller picture MiniMax shared:

Benchmark | What it measures | M3 score (MiniMax-reported)
--- | --- | ---
SWE-Bench Pro | Real-world software fixes | 59.0%
Terminal-Bench 2.1 | Command-line agent tasks | 66.0%
SWE-fficiency | Efficient code changes | 34.8%
KernelBench Hard | Low-level GPU kernels | 28.8%
BrowseComp | Autonomous web browsing | 83.5%

Now the caveats, and they are real. These scores come from MiniMax, run on its own setup. Independent results from third parties were not out at launch. MiniMax also compared M3 against Claude Opus 4.7, even though Anthropic had already shipped Opus 4.8 a week earlier. Against the newer Opus 4.8, the coding gap is wider. Tech Times covered this gap in detail.

Plain advice: treat launch benchmarks as a hint, not a verdict. Build a small test set from your own real tasks and run M3 against it before you commit.
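A private test set does not need heavy tooling. The sketch below shows one way to structure it, assuming you wrap whichever model you are testing in a plain `prompt -> answer` function. `EvalCase`, `run_eval`, and the sample cases are hypothetical, and no real MiniMax or OpenAI client call is shown. Each case pairs a prompt from your own work with a check function that decides pass or fail.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: wraps whatever hosted API or local model you are testing.
ModelFn = Callable[[str], str]


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the answer is acceptable


def run_eval(model: ModelFn, cases: List[EvalCase]) -> float:
    """Run every case through the model and return the pass rate."""
    passed = 0
    for case in cases:
        try:
            answer = model(case.prompt)
            ok = case.check(answer)
        except Exception:  # a crash or timeout in the wrapper counts as a failure
            ok = False
        print(f"{'PASS' if ok else 'FAIL'}  {case.name}")
        passed += ok
    return passed / len(cases)


# Illustrative cases; replace them with real tasks from your own backlog.
cases = [
    EvalCase("sql-fix", "Fix this query: SELEC * FROM users", lambda a: "SELECT" in a),
    EvalCase(
        "summary-length",
        "Summarize our refund policy in one sentence.",
        lambda a: a.count(".") <= 1,
    ),
]
```

Run the same `cases` against each candidate, for example `run_eval(m3_client, cases)` and `run_eval(gpt55_client, cases)`, where those clients are your own wrappers. Then compare pass rates on identical prompts.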

MiniMax M3 vs GPT-5.5

On the question of MiniMax M3 vs GPT-5.5, the answer depends on what you value. M3 wins on price, on context size, and on the fact that you can host it yourself. GPT-5.5 is a closed, polished product with broad, proven results.


On SWE-Bench Pro, M3’s 59.0% edges past GPT-5.5’s reported 58.6%. That is close, and it is one test. M3’s bigger advantages are the 1-million-token window and a much lower price. The catch is maturity. GPT-5.5 has a long track record and wide tooling support. M3 is days old, and some of its claims still need outside proof.
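How close is close? A quick sanity check treats each score as a pass rate over n tasks and asks how large a gap sampling noise alone could produce. This sketch assumes the two runs are independent. It uses task counts of 700 and 2,000 purely as hypothetical examples, since the exact evaluated set behind each vendor's number is not stated here. A paired comparison on identical tasks would give a tighter answer.

```python
import math


def diff_margin(p1: float, p2: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for the difference of two pass rates.

    Assumes two independent samples of n tasks each (normal approximation).
    """
    variance = p1 * (1 - p1) / n + p2 * (1 - p2) / n
    return z * math.sqrt(variance)


m3, gpt55 = 0.590, 0.586  # reported SWE-Bench Pro scores
gap = m3 - gpt55
for n in (700, 2_000):  # hypothetical task counts
    margin = diff_margin(m3, gpt55, n)
    verdict = "within noise" if abs(gap) < margin else "outside noise"
    print(f"n={n}: gap {gap:.3f} vs margin +/-{margin:.3f} ({verdict})")
```

With either assumed size, the 0.4-point gap is far smaller than the margin (roughly 0.05 at 700 tasks and 0.03 at 2,000). On this test alone, the two results cannot be told apart.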

If your workload is heavy on coding or long documents and you watch costs, M3 deserves a trial. If you need a settled, well-supported model for mixed work, GPT-5.5 is the safer default for now. A short discovery session, like the ones our innovation consulting team runs, can help you score both against your own goals before you pick.

MiniMax M3 Pricing

The MiniMax M3 pricing is the part that turned heads. Standard rates are $0.60 per million input tokens and $2.40 per million output tokens.

For the first week after launch, MiniMax cut that in half: $0.30 input and $1.20 output per million tokens. That puts M3 at roughly one-fifteenth to one-twenty-fifth the operating cost of closed frontier models. VentureBeat reported M3 reaching that performance tier at 5 to 10 percent of the usual cost.

There are also monthly token plans, around $20, $50, and $120. And once the open weights ship, self-hosting removes the per-token bill entirely, though you then pay for your own hardware.

Here is why the price matters in practice. A feature that costs over a cent per call on a premium model can cost a fraction of that on M3. Low-margin features that were not worth shipping at premium prices suddenly pencil out.
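The per-call arithmetic is easy to check yourself. The sketch below uses the standard and launch-week rates quoted above. The request size of 2,000 input and 500 output tokens is a hypothetical example. Plug in your own premium-model rates for a side-by-side comparison.

```python
PER_MILLION = 1_000_000

# MiniMax M3 rates in USD per million tokens, as quoted in this article.
M3_STANDARD = {"input": 0.60, "output": 2.40}
M3_LAUNCH_WEEK = {"input": 0.30, "output": 1.20}


def cost_per_call(input_tokens: int, output_tokens: int, rates: dict) -> float:
    """Dollar cost of one request at the given per-million-token rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / PER_MILLION


# Hypothetical request: a 2,000-token prompt producing a 500-token answer.
standard = cost_per_call(2_000, 500, M3_STANDARD)
launch = cost_per_call(2_000, 500, M3_LAUNCH_WEEK)
print(f"standard: ${standard:.4f}  launch week: ${launch:.4f}")
print(f"calls per dollar at standard rates: {1 / standard:,.0f}")
```

For that request size, the cost is about a quarter of a cent per call at standard rates, or roughly 400 calls per dollar. It is half that during the launch-week discount.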

AI Compliance and What Open Weights Mean for Your Data

Open weights change the data story. When you self-host M3, your prompts and your customers’ data stay on your own infrastructure. Nothing has to travel to a third-party API. For teams handling regulated data, that is a big deal.

This helps with rules like GDPR, where data residency and control matter. Healthcare and finance teams who must keep records in-house gain a real option. Building that kind of setup safely is its own project, and it is the sort of work our AI development company handles end to end.

Two cautions, though. First, the hosted MiniMax API is run by a company based in China. China’s 2017 National Intelligence Law can require local firms to cooperate with the government. For sensitive data on the hosted API, that belongs in your risk review. Self-hosting the open weights sidesteps most of that concern.

Second, check the license before you build a commercial product on M3. An earlier MiniMax model restricted commercial use without written permission. At launch, M3’s final license terms were not confirmed. Read the actual license when the weights drop, not the headline.

Frequently asked questions

Is MiniMax M3 free?
The API is paid, starting at $0.60 per million input tokens. The weights are set to be released for download within about ten days of launch. Even then, "downloadable" does not always mean "free for commercial use," so read the license.

Is MiniMax M3 better than GPT-5.5?
On MiniMax's own SWE-Bench Pro test, M3 (59.0%) narrowly beat GPT-5.5 (58.6%). M3 also wins on price and context size. But the scores are vendor-run, and GPT-5.5 is more mature. For most teams, the honest answer is "test both on your own tasks."

Can I self-host MiniMax M3?
That is the plan. MiniMax committed to publishing the open weights on Hugging Face and GitHub. Once they are out, you can run M3 on your own hardware and keep your data in-house.

When will the MiniMax M3 weights be released?
MiniMax said within roughly ten days of the June 1, 2026 launch. Check Hugging Face and GitHub for the official drop and the technical report.

Why is MiniMax M3 so cheap?
The main reason is its MiniMax Sparse Attention design. By processing only the relevant blocks of a long context, it cuts per-token compute to about one-twentieth of the previous generation. Less compute means a lower price.

Is MiniMax M3 safe for sensitive data?
It can be, if you self-host the open weights so data never leaves your systems. The hosted API needs more scrutiny because of where the company operates. Run a compliance review either way.