// news · open-source · model · 2026-05-19 · source: z.ai

Z.ai GLM-5.1 ships at $0.18 per million input tokens with 1M context

Z.ai released GLM-5.1 with a 1 million token context window and inference pricing of $0.18 per million input tokens. That is slightly more than DeepSeek V4 Flash ($0.14), which it matches on SWE-bench Verified at 76.4%.
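To put the per-million-token prices in per-request terms, here is a minimal sketch that converts them into the input cost of a single full-context prompt. The two prices are the ones quoted above. The function name `input_cost_usd` and the `PRICES` table are illustrative, not part of any Z.ai or DeepSeek API. Output-token pricing is not given here, so it is left out.

```python
def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


# Input prices quoted in the article, in USD per million input tokens.
PRICES = {"GLM-5.1": 0.18, "DeepSeek V4 Flash": 0.14}

# A prompt that fills the whole 1M-token context window.
full_context = 1_000_000

costs = {name: input_cost_usd(full_context, price) for name, price in PRICES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per {full_context:,}-token prompt")
```

At these rates a prompt that fills the entire window costs under 20 cents in input tokens on either model. The absolute gap between the two is four cents per million tokens.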

GLM-5.1 uses a sparse mixture-of-experts architecture similar to DeepSeek V3, with ~600B total parameters and ~38B active per token. The weights are on Hugging Face under a license that explicitly permits commercial use including derivatives.

Z.ai reports no sharp quality cliff at depth: its own internal "needle in haystack" evaluations show 95%+ retrieval up to 800K tokens, falling to 87% near the 1M ceiling. That is competitive with Claude 4.5 (the previous best) on the same evaluation, at roughly 1/15th the inference cost.
