China’s Domestic Large Model API Calls Surpass the United States: Silicon Valley Developers Are “Genuinely Impressed”

The latest data from OpenRouter shows that Chinese open-weight models now account for roughly 60% of usage among the global developers on the platform. Models such as DeepSeek, Qwen, and GLM are rewriting the rules of global AI competition, propelled by three forces working together: a cost-performance revolution, engineering innovation, and an open-source ecosystem.

A Set of Data That Stunned the AI World

Figures from OpenRouter, the world’s largest AI model API aggregation platform, stunned the AI industry in early 2026.

During the week of February 16 to 22, weekly API call volume for Chinese models reached 5.16 trillion Tokens, a 61% global share, surpassing American models (2.7 trillion Tokens that week) for the first time. Of the top five models by call volume, four were Chinese: MiniMax M2.5, Moonshot AI’s Kimi K2.5, DeepSeek V3.2, and Zhipu GLM-5.

The trend has continued. According to Daily Economic News, which has kept tracking OpenRouter data, Chinese large models had held the top spot in global weekly call volume for nine consecutive weeks as of late June. In its report “The State of AI Economy 2026,” Exponential View noted that the share of Token requests going to American models had fallen from 72% a year earlier to 33%.

The user composition is even more telling. Some 47.17% of OpenRouter’s users are based in the United States, while Chinese developers account for only 6.01%. In other words, Chinese models reached the top not through domestic “self-congratulation” but through adoption by overseas developers in Silicon Valley and Europe, and what drew those developers is a cost-performance revolution.

The Price Gap: The Cost-Performance Revolution of Chinese Domestic Models

The core competitive advantage of Chinese models overseas is their extreme cost-performance ratio. On API output pricing, MiniMax M2.5 and Zhipu GLM-5 charge USD 0.3 to 0.5 per million Tokens, while Anthropic’s Claude Opus 4.6 charges USD 5, roughly 10 to 16.7 times as much. OpenAI’s GPT-5.4 charges approximately USD 14 to 15 per million output Tokens, roughly 28 to 50 times the price of these Chinese models.

DeepSeek V4 Flash’s output price is as low as USD 0.28 per million Tokens, while OpenAI’s GPT-5.5 charges as much as USD 30 per million Tokens, more than 107 times as much. Alibaba’s Qwen 3.5, released at the end of February, has cut the price to RMB 0.8 (approximately USD 0.11) per million Tokens, one-eighteenth of the price of Google Gemini 3.5 Flash.
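The price multiples above follow directly from the quoted per-million-Token output prices. The short Python sketch below recomputes them using only the article’s figures; the `PRICES` table and the `ratio_range` helper are illustrative names, and each range pairs the low end of one price band with the high end of the other.

```python
# Output prices in USD per million Tokens, as quoted in the article.
# Each entry is a (low, high) band; single prices repeat the same value.
PRICES = {
    "MiniMax M2.5 / Zhipu GLM-5": (0.3, 0.5),
    "Claude Opus 4.6": (5.0, 5.0),
    "GPT-5.4": (14.0, 15.0),
    "DeepSeek V4 Flash": (0.28, 0.28),
    "GPT-5.5": (30.0, 30.0),
}


def ratio_range(expensive: str, cheap: str) -> tuple[float, float]:
    """Return the (smallest, largest) price multiple of `expensive` over `cheap`."""
    e_lo, e_hi = PRICES[expensive]
    c_lo, c_hi = PRICES[cheap]
    # Smallest gap: cheapest end of the pricey band vs. priciest end of the cheap band.
    # Largest gap: the opposite pairing.
    return e_lo / c_hi, e_hi / c_lo


if __name__ == "__main__":
    for pricey in ("Claude Opus 4.6", "GPT-5.4"):
        lo, hi = ratio_range(pricey, "MiniMax M2.5 / Zhipu GLM-5")
        print(f"{pricey}: {lo:.1f}x to {hi:.1f}x")
    lo, _ = ratio_range("GPT-5.5", "DeepSeek V4 Flash")
    print(f"GPT-5.5 vs DeepSeek V4 Flash: {lo:.1f}x")
```

Run as a script, it prints the Claude Opus 4.6 and GPT-5.4 multiples relative to the USD 0.3 to 0.5 band, plus the single GPT-5.5 versus DeepSeek V4 Flash multiple.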

But cheap does not mean weak. On SWE-bench Verified, a widely cited coding benchmark, several mainstream Chinese models, including DeepSeek V4 Pro Max (80.6%), MiniMax M3 (80.5%), Qwen3.7 Max (80.4%), and Kimi K2.6 (80.2%), have all crossed the 80% mark. That is essentially on par with Google Gemini 3.1 Pro (80.6%), at only one-fifth to one-tenth of its API price.

Data from the “State of AI” report, jointly released by venture capital firm a16z and model aggregation platform OpenRouter, shows that the market share of open-source models developed in China surged from 1.2% at the end of 2024 to a peak of nearly 30% in mid-2025, averaging 13.0% over the year, nearly on par with the 13.7% share held by open-source models from the rest of the world. This is a structural shift, not a temporary fluctuation in the rankings.

Three Layers of Cost Barriers: Electricity, Engineering, and the Open-Source Ecosystem

This price advantage did not appear out of thin air. The first layer is electricity, which accounts for 60% to 70% of total operating costs at intelligent computing centers. Green-energy contract prices in China’s central and western regions are as low as RMB 0.13 to 0.3 per kilowatt-hour (approximately USD 0.018 to 0.042), only one-third to one-fifth of European and American electricity prices. Combined with China’s large industrial power base, which lets model training make full use of off-peak electricity, this forms a physical cost moat for Chinese AI companies.
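To see why the power price matters so much, consider a deliberately simplified cost model, which is an assumption rather than something the article states: a data center uses the same amount of energy, its non-electricity costs stay fixed, and only the electricity tariff changes. The hypothetical helper `relative_opex` below returns operating cost relative to a baseline of 1.0 under that model.

```python
def relative_opex(electricity_share: float, price_ratio: float) -> float:
    """Operating cost relative to a baseline of 1.0 when only the power tariff changes.

    electricity_share: fraction of baseline opex spent on electricity (e.g. 0.65).
    price_ratio: new electricity price divided by baseline price (e.g. 1/3).
    Assumes energy use and all non-electricity costs are unchanged (a simplification).
    """
    if not 0.0 <= electricity_share <= 1.0:
        raise ValueError("electricity_share must be between 0 and 1")
    other_costs = 1.0 - electricity_share          # unaffected by the tariff
    power_costs = electricity_share * price_ratio  # rescaled by the cheaper tariff
    return other_costs + power_costs


if __name__ == "__main__":
    # Article's ranges: electricity is 60-70% of opex; power costs 1/3 to 1/5 as much.
    for share in (0.6, 0.7):
        for ratio in (1 / 3, 1 / 5):
            saving = 1.0 - relative_opex(share, ratio)
            print(f"electricity {share:.0%} of opex, price x{ratio:.2f}: opex {saving:.0%} lower")
```

Across the article’s ranges, this toy model puts total operating cost roughly 40% to 56% below the baseline; real differences would also depend on hardware efficiency, cooling, and financing costs, which it ignores.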

The second layer is engineering capability forged by necessity. Since April 2024, Chinese AI companies have operated with restricted access to advanced chips; unable to obtain the best accelerators, they have pushed the hardware they do have to its limits. Techniques such as the Mixture of Experts (MoE) architecture route each token through only a small subset of “expert networks” instead of the full model, significantly reducing inference costs. DeepSeek V4 Pro has 1.6 trillion total parameters but only 49 billion activated parameters, about 3% of the total; Xiaomi’s MiMo V2.5, which has topped OpenRouter’s call-volume rankings for several consecutive weeks, likewise relies on MoE for efficient inference.
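To make the routing idea concrete, here is a deliberately tiny top-k MoE layer in plain Python. It is a sketch under simplifying assumptions (random rather than learned router weights, small matrices standing in for expert feed-forward blocks, no load balancing or shared experts) and is not DeepSeek’s or Xiaomi’s architecture; `ToyMoE`, `k`, and the sizes are illustrative. It shows the key property: every token is scored against all experts, but only the top `k` actually run.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoE:
    """Toy top-k Mixture-of-Experts layer on plain Python lists (illustration only)."""

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Router: one score vector per expert; random here, learned in a real model.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert is a dim x dim matrix standing in for a feed-forward block.
        self.experts = [
            [[rng.gauss(0, 1) / math.sqrt(dim) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        self.expert_calls = [0] * num_experts  # how many times each expert actually ran

    def forward(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        # Keep only the k highest-scoring experts for this token.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        gates = softmax([scores[i] for i in top])  # renormalize over the chosen experts
        out = [0.0] * len(x)
        for gate, i in zip(gates, top):
            self.expert_calls[i] += 1
            y = [sum(w * xj for w, xj in zip(row, x)) for row in self.experts[i]]
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    rng = random.Random(1)
    moe = ToyMoE(dim=8, num_experts=16, k=2)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
    outputs = [moe.forward(t) for t in tokens]
    # Each token ran 2 of 16 experts, i.e. 1/8 of the expert compute per token.
    print(sum(moe.expert_calls), "expert runs for", len(tokens), "tokens")
    # Activated share quoted for DeepSeek V4 Pro: 49B of 1.6T parameters.
    print(f"{49e9 / 1.6e12:.1%} of parameters active per token")
```

With 2 of 16 experts per token, expert compute per token is one-eighth of a dense layer holding the same total parameters (plus a small routing cost). Scaled up, the same principle is how a model can hold 1.6 trillion parameters while activating about 3% of them for each token.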

The third layer of barriers is the virtuous cycle of the open-source ecosystem. According to Stanford research, from August 2024 to August 2025 Chinese developers accounted for 17.1% of total downloads on Hugging Face, slightly ahead of the United States’ 15.8%. Open-sourcing lowers the barrier to entry for developers worldwide while letting Chinese models iterate rapidly on continuous technical feedback. As Silicon Valley investor Aditya Agarwal put it: “More than 50% of large model API calls are completed through inexpensive open-source models — Chinese models are in fact supporting the majority of AI applications, and American counterparts simply cannot replace them.”

Coding and Agents: The New Decisive Battlefield

OpenRouter’s data also reveals a key trend: the explosion in model call volume is highly correlated with two major application scenarios — coding and AI agents. From early 2025 to early 2026, the share of coding-related tasks in OpenRouter’s total Token volume surged from 11% to more than 50%.

This means that the competitive focus of large models has already shifted from “chatbots” to “productivity tools.” When Tokens shift from being a “cost of conversation” to being “execution fuel,” the real battlefield is not in benchmark rankings but in the IDE of every developer around the world.

Chinese models happen to sit squarely on both of these breakout points. The domestic models ranking highest by call volume, including MiniMax M2.5, Kimi K2.5, and Zhipu GLM-5, all focus on stronger coding capability and automated agent tasks. Moonshot AI’s Kimi K2.5 runs up to 100 “Agent avatars” in parallel, improving efficiency on complex tasks by three to ten times. Zhipu GLM-5.2 offers a lossless context window of 1 million Tokens and ranked first among available models in Code Arena’s global blind test involving one million users.
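Much of the speed-up from running many agents at once comes from overlapping independent, I/O-bound subtasks such as model calls. The asyncio sketch below shows a generic bounded fan-out pattern; it is not Moonshot AI’s implementation, `run_subagent` is a hypothetical stub that only sleeps, and the limit of 100 simply mirrors the figure above. Planning the subtasks and merging the results, which a real agent system would also need, are left out.

```python
import asyncio

MAX_PARALLEL_AGENTS = 100  # cap mirrors the article's "up to 100" figure


async def run_subagent(task_id: int, subtask: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would call a model API."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work such as waiting on a model call
    return f"[{task_id}] done: {subtask}"


async def fan_out(subtasks, limit=MAX_PARALLEL_AGENTS):
    sem = asyncio.Semaphore(limit)

    async def guarded(i, s):
        async with sem:  # at most `limit` sub-agents run at the same time
            return await run_subagent(i, s)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(i, s) for i, s in enumerate(subtasks)))


if __name__ == "__main__":
    subtasks = [f"analyze file {n}" for n in range(250)]
    results = asyncio.run(fan_out(subtasks))
    print(len(results), results[0])
```

In this simulation, 250 subtasks with a cap of 100 run in three waves, so wall-clock time is about three sleep intervals rather than 250; in practice the gain depends on how independent the subtasks really are.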

OpenRouter co-founder and COO Chris Clark publicly stated in February 2026 that Chinese open-source models account for a “disproportionately high” share in Agent workflows run by American enterprises.

Token Export: From “Selling Apps” to “Selling Computing Power”

China’s overseas AI expansion is undergoing a structural upgrade from “application export” to “computing power export.” In the early period, the mainstream approach was application export: packaging AI capabilities into apps that reach overseas users directly. ByteDance’s Gauthmath captured a 47% share of the U.S. photo-search market, while MiniMax’s emotional-companion app Talkie reaches users in more than 200 countries. Consumer-facing products of this kind account for more than 70% of MiniMax’s total revenue. Meanwhile, daily average Token consumption of MiniMax’s M2-series text models in February 2026 was more than six times that of December 2025.

The latest trend is “computing power export” — delivering computing power through API pipelines, turning it into a utility like water and electricity. Overseas developers call Chinese model APIs through aggregation platforms such as OpenRouter, with inference completed in Chinese domestic data centers, billed per Token. Computing power does not leave China’s borders. Electricity does not leave China’s borders. Only value is delivered cross-border through Tokens.
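Per-Token billing reduces to simple metering arithmetic: each request reports how many input and output Tokens it consumed, and the charge is those counts multiplied by per-million-Token prices. In the sketch below, the `Usage` fields are modeled on the `prompt_tokens`/`completion_tokens` usage fields that OpenAI-compatible APIs commonly return, and the prices are placeholders, not any provider’s actual rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int      # input Tokens sent to the model
    completion_tokens: int  # output Tokens generated by the model


def bill_usd(usage: Usage, input_price: float, output_price: float) -> float:
    """Charge for one request; prices are USD per million Tokens."""
    return (usage.prompt_tokens * input_price
            + usage.completion_tokens * output_price) / 1_000_000


def bill_batch(usages, input_price: float, output_price: float) -> float:
    """Sum the charges for many requests, as a monthly invoice would."""
    return sum(bill_usd(u, input_price, output_price) for u in usages)


if __name__ == "__main__":
    # Placeholder prices (USD per million Tokens), for illustration only.
    INPUT_PRICE, OUTPUT_PRICE = 0.10, 0.30
    one_call = Usage(prompt_tokens=20_000, completion_tokens=5_000)
    print(f"one call: ${bill_usd(one_call, INPUT_PRICE, OUTPUT_PRICE):.4f}")
    month = [one_call] * 100_000  # e.g. an agent pipeline making 100k such calls
    print(f"100k calls: ${bill_batch(month, INPUT_PRICE, OUTPUT_PRICE):,.2f}")
```

Because the bill is linear in Token counts, a provider with lower per-Token prices undercuts competitors by the same ratio at any volume, which is why the price gaps discussed earlier translate directly into developer spending.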

The team at Moonshot AI responsible for API services has recently undergone rapid expansion and now reports directly to President Zhang Yutong as an independent business division — a clear sign that the strategic importance of the overseas API business is rising rapidly within the organizational structure. JPMorgan predicts that China’s Token consumption compound annual growth rate from 2025 to 2030 will reach 330%, with a 370-fold increase over the next five years.
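The two JPMorgan figures can be checked against the standard compound-growth relation, total multiple = (1 + CAGR)^years. The sketch below is a worked check on the numbers as quoted, not a reconstruction of JPMorgan’s model; the helper names are illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth multiple after `years` at compound annual rate `cagr` (3.3 means 330%)."""
    return (1.0 + cagr) ** years


def implied_cagr(multiple: float, years: int) -> float:
    """Compound annual growth rate that turns 1 into `multiple` over `years` years."""
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 5  # 2025 to 2030
    print(f"330% CAGR over {years} years: {growth_multiple(3.3, years):,.0f}-fold")
    print(f"CAGR implied by a 370-fold rise: {implied_cagr(370, years):.0%}")
    print(f"Annual multiple implied by 370-fold: {370 ** (1 / years):.2f}x")
```

By this relation, a literal 330% annual growth rate compounds to roughly 1,470-fold over five years, while a 370-fold increase corresponds to a CAGR of about 226%, that is, volume multiplying by about 3.26 each year. The two figures are roughly consistent only if “330%” is read as an annual multiple of about 3.3x rather than as a growth rate.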

The continued dominance of China’s large models in call volume marks a new phase for China’s AI industry in global competition. The competitive dimension has shifted from laboratory benchmark scores to real-world usage retention and how well models match actual workloads. The competitors have shifted from a handful of leading models to whole ecosystems, as developers invoke portfolios of multiple models. The competitive geography has shifted from the domestic market to a systematic contest for the global developer ecosystem.

When 60% of call volume becomes the norm rather than the peak, and when Token exports begin to drive upgrades across the entire electricity, computing power, and chip supply chain, China’s large models are defining new standards of cost and efficiency for global AI infrastructure. This is a structural inflection point, not a momentary flash of excitement.

[Disclaimer]: The above content reflects analysis of publicly available information, expert insights, and BCC research. It does not constitute investment advice. BCC is not responsible for any losses resulting from reliance on the views expressed herein. Investors should exercise caution.
