AI Model Catalog
357 models, including GPT-4o, Claude, Gemini, DeepSeek, and Llama
| Model | Modality | Input ($/M tokens) | Output ($/M tokens) | Context | Released | |
|---|---|---|---|---|---|---|
Tongyi-MAI/Z-Image-Turbo | text→image | — | $0.0055/image | — | — | |
thenlper/gte-base | text→embeddings | $0.01 | Free | — | — | |
intfloat/e5-base-v2 | text→embeddings | $0.01 | Free | — | — | |
sentence-transformers/all-minilm-l6-v2 | text→embeddings | $0.01 | Free | — | — | |
sentence-transformers/paraphrase-minilm-l6-v2 | text→embeddings | $0.01 | Free | — | — | |
sentence-transformers/all-minilm-l12-v2 | text→embeddings | $0.01 | Free | — | — | |
sentence-transformers/multi-qa-mpnet-base-dot-v1 | text→embeddings | $0.01 | Free | — | — | |
baai/bge-base-en-v1.5 | text→embeddings | $0.01 | Free | — | — | |
sentence-transformers/all-mpnet-base-v2 | text→embeddings | $0.01 | Free | — | — | |
thenlper/gte-large | text→embeddings | $0.01 | Free | — | — | |
intfloat/e5-large-v2 | text→embeddings | $0.01 | Free | — | — | |
intfloat/multilingual-e5-large | text→embeddings | $0.01 | Free | — | — | |
baai/bge-large-en-v1.5 | text→embeddings | $0.01 | Free | — | — | |
baai/bge-m3 | text→embeddings | $0.01 | Free | — | — | |
qwen/qwen3-embedding-8b | text→embeddings | $0.01 | Free | — | — | |
liquid/lfm-2.2-6b | text→text | $0.01 | $0.02 | — | — | |
liquid/lfm2-8b-a1b | text→text | $0.01 | $0.02 | — | — | |
ibm-granite/granite-4.0-h-micro | text→text | $0.02 | $0.11 | — | — | |
openai/text-embedding-3-small text-embedding-3-small is OpenAI's improved, more performant version of the ada embedding model. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks. | text→embeddings | $0.02 | Free | 8K | Oct 2025 | |
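The relatedness measurement described above is a single API call plus a similarity computation. A minimal sketch with text-embedding-3-small, assuming the official `openai` Python SDK and an `OPENAI_API_KEY` environment variable; the two sample strings are illustrative:

```python
# Minimal sketch: measure relatedness of two texts with text-embedding-3-small.
# Assumes the official `openai` Python SDK and OPENAI_API_KEY set in the environment.
import math
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["How do I reset my password?", "Steps to recover account access"],
)
a, b = (item.embedding for item in resp.data)

# Cosine similarity: closer to 1.0 means the two texts are more related.
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
print(f"cosine similarity: {dot / norm:.3f}")
```

The same pattern drives the search and clustering use cases the description mentions: embed every document once, then rank by cosine similarity against the embedded query.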
qwen/qwen3-embedding-4b | text→embeddings | $0.02 | Free | — | — | |
meta-llama/llama-3.1-8b-instruct | text→text | $0.02 | $0.05 | — | — | |
meta-llama/llama-3.2-3b-instruct | text→text | $0.02 | $0.02 | — | — | |
meta-llama/llama-guard-3-8b | text→text | $0.02 | $0.06 | — | — | |
mistralai/mistral-nemo | text→text | $0.02 | $0.04 | — | — | |
google/gemma-3n-e4b-it Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements.
This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. [Read more in the blog post](https://developers.googleblog.com/en/introducing-gemma-3n/) | text→text | $0.02 | $0.04 | 33K | May 2025 | |
meta-llama/llama-3.2-1b-instruct | text→text | $0.03 | $0.20 | — | — | |
perplexity/pplx-embed-v1-4b pplx-embed-v1-4B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval with the 4B parameter model maximizing retrieval quality. | text→embeddings | $0.03 | Free | 32K | Mar 2026 | |
google/gemma-2-9b-it | text→text | $0.03 | $0.09 | — | — | |
meta-llama/llama-3-8b-instruct | text→text | $0.03 | $0.04 | — | — | |
openai/gpt-oss-20b | text→text | $0.03 | $0.14 | — | — | |
qwen/qwen2.5-coder-7b-instruct | text→text | $0.03 | $0.09 | — | — | |
liquid/lfm-2-24b-a2b LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability. | text→text | $0.03 | $0.12 | 33K | Feb 2026 | |
amazon/nova-micro-v1 | text→text | $0.04 | $0.14 | — | — | |
cohere/command-r7b-12-2024 | text→text | $0.04 | $0.15 | — | — | |
openai/gpt-oss-120b:exacto | text→text | $0.04 | $0.19 | — | — | |
openai/gpt-oss-120b | text→text | $0.04 | $0.19 | — | — | |
google/gemma-3-12b-it | text→text | $0.04 | $0.13 | — | — | |
google/gemma-3-27b-it | text→text | $0.04 | $0.15 | — | — | |
google/gemma-3-4b-it | text→text | $0.04 | $0.08 | — | — | |
nvidia/nemotron-nano-9b-v2 | text→text | $0.04 | $0.16 | — | — | |
qwen/qwen-2.5-7b-instruct | text→text | $0.04 | $0.10 | — | — | |
sao10k/l3-lunaris-8b | text→text | $0.04 | $0.05 | — | — | |
arcee-ai/trinity-mini | text→text | $0.04 | $0.15 | — | — | |
meta-llama/llama-3.2-11b-vision-instruct | text→text | $0.05 | $0.05 | — | — | |
mistralai/mistral-small-24b-instruct-2501 | text→text | $0.05 | $0.08 | — | — | |
nvidia/nemotron-3-nano-30b-a3b | text→text | $0.05 | $0.20 | — | — | |
qwen/qwen-turbo | text→text | $0.05 | $0.20 | — | — | |
qwen/qwen3-8b | text→text | $0.05 | $0.40 | — | — | |
openai/gpt-5-nano GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications. | text+image+file→text | $0.05 | $0.40 | 400K | Aug 2025 | |
qwen/qwen3-30b-a3b-thinking-2507 | text→text | $0.05 | $0.34 | — | — | |
mistralai/mistral-small-3.2-24b-instruct | text→text | $0.06 | $0.18 | — | — | |
amazon/nova-lite-v1 | text→text | $0.06 | $0.24 | — | — | |
gryphe/mythomax-l2-13b | text→text | $0.06 | $0.06 | — | — | |
qwen/qwen3-14b | text→text | $0.06 | $0.24 | — | — | |
z-ai/glm-4.7-flash | text→text | $0.06 | $0.40 | — | — | |
microsoft/phi-4 [Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.
At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.
For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)
| text→text | $0.06 | $0.14 | 16K | Jan 2025 | |
qwen/qwen3-coder-30b-a3b-instruct | text→text | $0.07 | $0.27 | — | — | |
baidu/ernie-4.5-21b-a3b | text→text | $0.07 | $0.28 | — | — | |
baidu/ernie-4.5-21b-a3b-thinking | text→text | $0.07 | $0.28 | — | — | |
nvidia/nemotron-nano-12b-v2-vl | text→text | $0.07 | $0.20 | — | — | |
qwen/qwen3-235b-a22b-2507 | text→text | $0.07 | $0.10 | — | — | |
google/gemini-2.0-flash-lite-001 | text→text | $0.07 | $0.30 | — | — | |
bytedance-seed/seed-1.6-flash | text→text | $0.07 | $0.30 | — | — | |
openai/gpt-oss-safeguard-20b | text→text | $0.07 | $0.30 | — | — | |
meta-llama/llama-4-scout | text→text | $0.08 | $0.30 | — | — | |
qwen/qwen3-30b-a3b | text→text | $0.08 | $0.28 | — | — | |
qwen/qwen3-32b | text→text | $0.08 | $0.24 | — | — | |
qwen/qwen3-vl-8b-instruct | text→text | $0.08 | $0.50 | — | — | |
alibaba/tongyi-deepresearch-30b-a3b | text→text | $0.09 | $0.45 | — | — | |
neversleep/llama-3.1-lumimaid-8b | text→text | $0.09 | $0.60 | — | — | |
qwen/qwen3-30b-a3b-instruct-2507 | text→text | $0.09 | $0.30 | — | — | |
qwen/qwen3-next-80b-a3b-instruct | text→text | $0.09 | $1.10 | — | — | |
xiaomi/mimo-v2-flash | text→text | $0.09 | $0.29 | — | — | |
allenai/olmo-3-7b-instruct | text→text | $0.10 | $0.20 | — | — | |
bytedance/ui-tars-1.5-7b | text→text | $0.10 | $0.20 | — | — | |
openai/text-embedding-ada-002 text-embedding-ada-002 is OpenAI's legacy text embedding model. | text→embeddings | $0.10 | Free | 8K | Oct 2025 | |
qwen/qwen3.5-flash-02-23 The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance. | text+image+video→text | $0.10 | $0.40 | 1M | Feb 2026 | |
mistralai/voxtral-small-24b-2507 | text→text | $0.10 | $0.30 | — | — | |
nvidia/nemotron-3-super-120b-a12b NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models.
The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified.
Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud. | text→text | $0.10 | $0.50 | 262K | Mar 2026 | |
mistralai/mistral-small-creative | text→text | $0.10 | $0.30 | — | — | |
nvidia/llama-3.3-nemotron-super-49b-v1.5 | text→text | $0.10 | $0.40 | — | — | |
mistralai/mistral-embed-2312 | text→embeddings | $0.10 | Free | — | — | |
stepfun/step-3.5-flash | text→text | $0.10 | $0.30 | — | — | |
z-ai/glm-4-32b | text→text | $0.10 | $0.10 | — | — | |
google/gemini-2.0-flash-001 | text→text | $0.10 | $0.40 | — | — | |
google/gemini-2.5-flash-lite-preview-09-2025 | text→text | $0.10 | $0.40 | — | — | |
google/gemini-2.5-flash-lite | text→text | $0.10 | $0.40 | — | — | |
meta-llama/llama-3.3-70b-instruct | text→text | $0.10 | $0.32 | — | — | |
mistralai/ministral-3b-2512 | text→text | $0.10 | $0.10 | — | — | |
reka/reka-edge Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use. | image+text+video→text | $0.10 | $0.10 | 16K | Mar 2026 | |
openai/gpt-4.1-nano For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion. | image+text+file→text | $0.10 | $0.40 | 1M | Apr 2025 | |
bytedance-seed/seed-2.0-mini Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority. | text+image+video→text | $0.10 | $0.40 | 262K | Feb 2026 | |
rekaai/reka-flash-3 Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a 32K context length and optimized through reinforcement learning (RLOO), it provides competitive performance comparable to proprietary models within a smaller parameter footprint. Ideal for low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to 11GB at 4-bit precision), and employs explicit reasoning tags ("<reasoning>") to indicate its internal thought process.
Reka Flash 3 is primarily an English model with limited multilingual understanding capabilities. The model weights are released under the Apache 2.0 license. | text→text | $0.10 | $0.20 | 66K | Mar 2025 | |
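Because Reka Flash 3 marks its internal thought process with explicit "<reasoning>" tags, callers typically split that block from the final answer before display. A minimal sketch of that post-processing; the raw string below is illustrative, standing in for an actual model response:

```python
# Minimal sketch: separate Reka Flash 3's explicit <reasoning> block from the answer.
# The `raw` string is illustrative; in practice it comes from the model response.
import re

raw = "<reasoning>The user wants a sum; 2 + 2 = 4.</reasoning>The answer is 4."

match = re.match(r"<reasoning>(.*?)</reasoning>\s*(.*)", raw, re.DOTALL)
if match:
    thoughts, answer = match.group(1).strip(), match.group(2).strip()
    print("thoughts:", thoughts)
    print("answer:  ", answer)
else:
    # No reasoning block emitted; treat the whole string as the answer.
    print("answer:  ", raw.strip())
```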
qwen/qwen3.5-9b Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context. | text+image+video→text | $0.10 | $0.15 | 262K | Mar 2026 | |
rekaai/reka-edge Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use. | image+text+video→text | $0.10 | $0.10 | 16K | — | |
mistralai/devstral-small Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats.
Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.
| text→text | $0.10 | $0.30 | 131K | Jul 2025 | |
qwen/qwen3-vl-32b-instruct | text→text | $0.10 | $0.42 | — | — | |
mistralai/mistral-7b-instruct-v0.1 | text→text | $0.11 | $0.19 | — | — | |
qwen/qwen3-vl-8b-thinking | text→text | $0.12 | $1.36 | — | — | |
allenai/olmo-3-7b-think | text→text | $0.12 | $0.20 | — | — | |
qwen/qwen-2.5-72b-instruct | text→text | $0.12 | $0.39 | — | — | |
qwen/qwen3-coder-next | text→text | $0.12 | $0.75 | — | — | |
openai/text-embedding-3-large text-embedding-3-large is OpenAI's most capable embedding model for both English and non-English tasks. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks. | text→embeddings | $0.13 | Free | 8K | Oct 2025 | |
qwen/qwen3-vl-30b-a3b-instruct | text→text | $0.13 | $0.52 | — | — | |
nousresearch/hermes-4-70b | text→text | $0.13 | $0.40 | — | — | |
qwen/qwen3-vl-30b-a3b-thinking | text→text | $0.13 | $1.56 | — | — | |
z-ai/glm-4.5-air | text→text | $0.13 | $0.85 | — | — | |
google/gemma-4-26b-a4b-it Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0. | image+text+video→text | $0.13 | $0.40 | 262K | — | |
baidu/ernie-4.5-vl-28b-a3b | text→text | $0.14 | $0.56 | — | — | |
nousresearch/hermes-2-pro-llama-3-8b | text→text | $0.14 | $0.14 | — | — | |
tencent/hunyuan-a13b-instruct | text→text | $0.14 | $0.57 | — | — | |
google/gemma-4-31b-it Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license. | image+text+video→text | $0.14 | $0.40 | 262K | — | |
qwen/qwen3-235b-a22b-thinking-2507 | text→text | $0.15 | $1.50 | — | — | |
allenai/olmo-3.1-32b-think | text→text | $0.15 | $0.50 | — | — | |
upstage/solar-pro-3 Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support. | text→text | $0.15 | $0.60 | 128K | Jan 2026 | |
openai/gpt-4o-mini-2024-07-18 | text→text | $0.15 | $0.60 | — | — | |
openai/gpt-4o-mini | text→text | $0.15 | $0.60 | — | — | |
openai/gpt-4o-mini-search-preview | text→text | $0.15 | $0.60 | — | — | |
google/gemini-embedding-001 | text→embeddings | $0.15 | Free | — | — | |
mistralai/codestral-embed-2505 | text→embeddings | $0.15 | Free | — | — | |
essentialai/rnj-1-instruct | text→text | $0.15 | $0.15 | — | — | |
allenai/olmo-3-32b-think | text→text | $0.15 | $0.50 | — | — | |
meta-llama/llama-4-maverick | text→text | $0.15 | $0.60 | — | — | |
mistralai/ministral-8b-2512 | text→text | $0.15 | $0.15 | — | — | |
cohere/command-r-08-2024 | text→text | $0.15 | $0.60 | — | — | |
deepseek/deepseek-chat-v3.1 | text→text | $0.15 | $0.75 | — | — | |
qwen/qwen3-next-80b-a3b-thinking | text→text | $0.15 | $1.20 | — | — | |
qwen/qwq-32b | text→text | $0.15 | $0.40 | — | — | |
mistralai/mistral-small-2603 Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow. | text+image→text | $0.15 | $0.60 | 262K | Mar 2026 | |
qwen/qwen3.5-35b-a3b The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B. | text+image+video→text | $0.16 | $1.30 | 262K | Feb 2026 | |
thedrummer/rocinante-12b | text→text | $0.17 | $0.43 | — | — | |
meta-llama/llama-guard-4-12b | text→text | $0.18 | $0.18 | — | — | |
deepseek/deepseek-chat-v3-0324 | text→text | $0.19 | $0.87 | — | — | |
qwen/qwen3.5-27b The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B. | text+image+video→text | $0.20 | $1.56 | 262K | Feb 2026 | |
mistralai/mistral-7b-instruct-v0.2 | text→text | $0.20 | $0.20 | — | — | |
openai/gpt-5.4-nano GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency use cases such as classification, data extraction, ranking, and sub-agent execution.
The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for pipelines that require fast, reliable outputs at scale. GPT-5.4 nano is well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is essential. | file+image+text→text | $0.20 | $1.25 | 400K | Mar 2026 | |
meituan/longcat-flash-chat | text→text | $0.20 | $0.80 | — | — | |
allenai/molmo-2-8b | text→text | $0.20 | $0.20 | — | — | |
allenai/olmo-3.1-32b-instruct | text→text | $0.20 | $0.60 | — | — | |
meta-llama/llama-guard-2-8b | text→text | $0.20 | $0.20 | — | — | |
minimax/minimax-01 | text→text | $0.20 | $1.10 | — | — | |
mistralai/ministral-14b-2512 | text→text | $0.20 | $0.20 | — | — | |
mistralai/mistral-7b-instruct | text→text | $0.20 | $0.20 | — | — | |
mistralai/mistral-7b-instruct-v0.3 | text→text | $0.20 | $0.20 | — | — | |
mistralai/mistral-saba | text→text | $0.20 | $0.60 | — | — | |
prime-intellect/intellect-3 | text→text | $0.20 | $1.10 | — | — | |
qwen/qwen-2.5-vl-7b-instruct | text→text | $0.20 | $0.20 | — | — | |
qwen/qwen-2.5-coder-32b-instruct | text→text | $0.20 | $0.20 | — | — | |
qwen/qwen2.5-vl-32b-instruct | text→text | $0.20 | $0.60 | — | — | |
qwen/qwen3-vl-235b-a22b-instruct | text→text | $0.20 | $0.88 | — | — | |
x-ai/grok-4-fast | text→text | $0.20 | $0.50 | — | — | |
x-ai/grok-4.1-fast | text→text | $0.20 | $0.50 | — | — | |
x-ai/grok-code-fast-1 | text→text | $0.20 | $1.50 | — | — | |
kwaipilot/kat-coder-pro | text→text | $0.21 | $0.83 | — | — | |
deepseek/deepseek-v3.1-terminus:exacto | text→text | $0.21 | $0.79 | — | — | |
deepseek/deepseek-v3.1-terminus | text→text | $0.21 | $0.79 | — | — | |
qwen/qwen-vl-plus | text→text | $0.21 | $0.63 | — | — | |
qwen/qwen3-coder | text→text | $0.22 | $1.00 | — | — | |
qwen/qwen3-coder:exacto | text→text | $0.22 | $1.80 | — | — | |
arcee-ai/trinity-large-thinking Trinity Large Thinking is a powerful open-source reasoning model from the team at Arcee AI. It shows strong performance on PinchBench, agentic workloads, and reasoning tasks. It is free in OpenClaw for the first five days. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7 | text→text | $0.22 | $0.85 | 262K | — | |
google/gemini-3.1-flash-lite-preview Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash. | text+image+video+file+audio→text | $0.25 | $1.50 | 1M | Mar 2026 | |
openai/gpt-5.1-codex-mini GPT-5.1-Codex-Mini is a smaller and faster version of GPT-5.1-Codex. | image+text→text | $0.25 | $2.00 | 400K | Nov 2025 | |
openai/gpt-5-mini GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model. | text+image+file→text | $0.25 | $2.00 | 400K | Aug 2025 | |
inception/mercury | text→text | $0.25 | $1.00 | — | — | |
inception/mercury-coder | text→text | $0.25 | $1.00 | — | — | |
bytedance-seed/seed-1.6 | text→text | $0.25 | $2.00 | — | — | |
anthropic/claude-3-haiku | text→text | $0.25 | $1.25 | — | — | |
inception/mercury-2 Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM).
Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost.
Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2). | text→text | $0.25 | $0.75 | 128K | Mar 2026 | |
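Since Mercury 2's selling point is latency, streaming is the natural way to consume it. A minimal sketch over an OpenAI-compatible endpoint, here assumed to be OpenRouter (the model slug matches the row above; the base URL, `OPENROUTER_API_KEY` variable, and prompt are assumptions for illustration):

```python
# Minimal sketch: stream tokens from a latency-focused model as they arrive.
# Assumes the `openai` Python SDK pointed at OpenRouter's OpenAI-compatible API
# and OPENROUTER_API_KEY set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="inception/mercury-2",
    messages=[{"role": "user", "content": "Write a haiku about parallel decoding."}],
    stream=True,  # with a >1,000 tok/s model, most remaining latency is network overhead
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```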
bytedance-seed/seed-2.0-lite Seed-2.0-Lite is a balanced model designed for high-frequency enterprise workloads, optimizing for both capability and cost. Its overall performance surpasses the previous-generation Seed-1.8. It is well-suited for production tasks such as unstructured information processing, text content creation, search and recommendation, and data analysis. The model supports long-context processing, multi-source information fusion, multi-step instruction execution, and high-fidelity structured outputs—delivering stable quality while significantly reducing cost. | text+image+video→text | $0.25 | $2.00 | 262K | Mar 2026 | |
minimax/minimax-m2 | text→text | $0.26 | $1.00 | — | — | |
qwen/qwen3.5-122b-a10b The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B. | text+image+video→text | $0.26 | $2.08 | 262K | Feb 2026 | |
qwen/qwen3-vl-235b-a22b-thinking | text→text | $0.26 | $2.60 | — | — | |
deepseek/deepseek-v3.2 | text→text | $0.26 | $0.38 | — | — | |
deepseek/deepseek-v3.2-exp | text→text | $0.27 | $0.41 | — | — | |
minimax/minimax-m2.1 | text→text | $0.27 | $0.95 | — | — | |
nex-agi/deepseek-v3.1-nex-n1 | text→text | $0.27 | $1.00 | — | — | |
baidu/ernie-4.5-300b-a47b | text→text | $0.28 | $1.10 | — | — | |
deepseek/deepseek-r1-distill-qwen-32b | text→text | $0.29 | $0.29 | — | — | |
minimax/minimax-m2.7 MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent collaboration, enabling it to plan, execute, and refine complex tasks across dynamic environments.
Trained for production-grade performance, M2.7 handles workflows such as live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint. It delivers strong results on benchmarks including 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, while achieving a 1495 ELO on GDPval-AA, setting a new standard for multi-agent systems operating in real-world digital workflows. | text→text | $0.30 | $1.20 | 205K | Mar 2026 | |
minimax/minimax-m2.5 | text→text | $0.30 | $1.10 | — | — | |
thedrummer/cydonia-24b-v4.1 | text→text | $0.30 | $0.50 | — | — | |
x-ai/grok-3-mini-beta | text→text | $0.30 | $0.50 | — | — | |
x-ai/grok-3-mini | text→text | $0.30 | $0.50 | — | — | |
kwaipilot/kat-coder-pro-v2 KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions, with a focus on large-scale production environments, multi-system coordination, and seamless integration across modern software stacks, while also supporting web aesthetics generation to produce production-grade landing pages and presentation decks. | text→text | $0.30 | $1.20 | 256K | Mar 2026 | |
google/gemini-2.5-flash | text→text | $0.30 | $2.50 | — | — | |
minimax/minimax-m2-her | text→text | $0.30 | $1.20 | — | — | |
mistralai/codestral-2508 | text→text | $0.30 | $0.90 | — | — | |
amazon/nova-2-lite-v1 | text→text | $0.30 | $2.50 | — | — | |
nousresearch/hermes-3-llama-3.1-70b | text→text | $0.30 | $0.30 | — | — | |
z-ai/glm-4.6v | text→text | $0.30 | $0.90 | — | — | |
qwen/qwen3-coder-flash | text→text | $0.30 | $1.50 | — | — | |
deepseek/deepseek-chat | text→text | $0.32 | $0.89 | — | — | |
qwen/qwen3.6-plus Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers... | text+image+video→text | $0.33 | $1.95 | 1M | — | |
mistralai/mistral-small-3.1-24b-instruct | text→text | $0.35 | $0.56 | — | — | |
z-ai/glm-4.6 | text→text | $0.35 | $1.71 | — | — | |
z-ai/glm-4.7 | text→text | $0.38 | $1.70 | — | — | |
xiaomi/mimo-v2-omni MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window. | text+audio+image+video→text | $0.40 | $2.00 | 262K | Mar 2026 | |
moonshotai/kimi-k2-0905 | text→text | $0.40 | $2.00 | — | — | |
qwen/qwen-plus-2025-07-28 | text→text | $0.40 | $1.20 | — | — | |
qwen/qwen3.5-plus-02-15 | text→text | $0.40 | $2.40 | — | — | |
deepseek/deepseek-r1-0528 | text→text | $0.40 | $1.75 | — | — | |
deepseek/deepseek-v3.2-speciale | text→text | $0.40 | $1.20 | — | — | |
meta-llama/llama-3.1-70b-instruct | text→text | $0.40 | $0.40 | — | — | |
minimax/minimax-m1 | text→text | $0.40 | $2.20 | — | — | |
mistralai/devstral-2512 | text→text | $0.40 | $2.00 | — | — | |
mistralai/devstral-medium | text→text | $0.40 | $2.00 | — | — | |
mistralai/mistral-medium-3 | text→text | $0.40 | $2.00 | — | — | |
qwen/qwen-plus | text→text | $0.40 | $1.20 | — | — | |
qwen/qwen-plus-2025-07-28:thinking | text→text | $0.40 | $1.20 | — | — | |
thedrummer/unslopnemo-12b | text→text | $0.40 | $0.40 | — | — | |
mistralai/mistral-medium-3.1 Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases.
The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments. | text+image→text | $0.40 | $2.00 | 131K | Aug 2025 | |
openai/gpt-4.1-mini GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints. | image+text+file→text | $0.40 | $1.60 | 1M | Apr 2025 | |
baidu/ernie-4.5-vl-424b-a47b | text→text | $0.42 | $1.25 | — | — | |
z-ai/glm-4.6:exacto | text→text | $0.44 | $1.76 | — | — | |
moonshotai/kimi-k2.5 | text→text | $0.45 | $2.20 | — | — | |
undi95/remm-slerp-l2-13b | text→text | $0.45 | $0.65 | — | — | |
qwen/qwen3-235b-a22b | text→text | $0.46 | $1.82 | — | — | |
moonshotai/kimi-k2-thinking | text→text | $0.47 | $2.00 | — | — | |
moonshotai/kimi-k2 | text→text | $0.50 | $2.40 | — | — | |
google/gemini-3-flash-preview | text→text | $0.50 | $3.00 | — | — | |
mistralai/mistral-large-2512 | text→text | $0.50 | $1.50 | — | — | |
openai/gpt-3.5-turbo | text→text | $0.50 | $1.50 | — | — | |
meta-llama/llama-3-70b-instruct | text→text | $0.51 | $0.74 | — | — | |
mistralai/mixtral-8x7b-instruct | text→text | $0.54 | $0.54 | — | — | |
qwen/qwen3.5-397b-a17b | text→text | $0.55 | $3.50 | — | — | |
thedrummer/skyfall-36b-v2 | text→text | $0.55 | $0.80 | — | — | |
z-ai/glm-4.5 | text→text | $0.55 | $2.00 | — | — | |
moonshotai/kimi-k2-0905:exacto | text→text | $0.60 | $2.50 | — | — | |
nvidia/llama-3.1-nemotron-ultra-253b-v1 | text→text | $0.60 | $1.80 | — | — | |
writer/palmyra-x5 | text→text | $0.60 | $6.00 | — | — | |
z-ai/glm-4.5v | text→text | $0.60 | $1.80 | — | — | |
microsoft/wizardlm-2-8x22b | text→text | $0.62 | $0.62 | — | — | |
google/gemma-2-27b-it | text→text | $0.65 | $0.65 | — | — | |
sao10k/l3.3-euryale-70b | text→text | $0.65 | $0.75 | — | — | |
sao10k/l3.1-euryale-70b | text→text | $0.65 | $0.75 | — | — | |
deepseek/deepseek-r1 | text→text | $0.70 | $2.50 | — | — | |
deepseek/deepseek-r1-distill-llama-70b | text→text | $0.70 | $0.80 | — | — | |
aion-labs/aion-1.0-mini Aion-1.0-Mini is a 32B-parameter model distilled from DeepSeek-R1, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification. | text→text | $0.70 | $1.40 | 131K | Feb 2025 | |
openai/gpt-5.4-mini GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments.
The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency. | file+image+text→text | $0.75 | $4.50 | 400K | Mar 2026 | |
mancer/weaver | text→text | $0.75 | $1.00 | — | — | |
morph/morph-v3-fast | text→text | $0.80 | $1.20 | — | — | |
qwen/qwen2.5-vl-72b-instruct | text→text | $0.80 | $0.80 | — | — | |
eleutherai/llemma_7b | text→text | $0.80 | $1.20 | — | — | |
aion-labs/aion-rp-llama-3.1-8b Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing. | text→text | $0.80 | $1.60 | 33K | Feb 2025 | |
alfredpros/codellama-7b-instruct-solidity | text→text | $0.80 | $1.20 | — | — | |
amazon/nova-pro-v1 | text→text | $0.80 | $3.20 | — | — | |
anthropic/claude-3.5-haiku | text→text | $0.80 | $4.00 | — | — | |
qwen/qwen-vl-max | text→text | $0.80 | $3.20 | — | — | |
aion-labs/aion-2.0 Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth. | text→text | $0.80 | $1.60 | 131K | Feb 2026 | |
switchpoint/router | text→text | $0.85 | $3.40 | — | — | |
morph/morph-v3-large | text→text | $0.90 | $1.90 | — | — | |
z-ai/glm-5 | text→text | $0.95 | $2.55 | — | — | |
z-ai/glm-5-turbo GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows involving long execution chains, with improved complex instruction decomposition, tool use, scheduled and persistent execution, and overall stability across extended tasks. | text→text | $0.96 | $3.20 | 203K | Mar 2026 | |
neversleep/noromaid-20b | text→text | $1.00 | $1.75 | — | — | |
xiaomi/mimo-v2-pro MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably. | text→text | $1.00 | $3.00 | 1M | Mar 2026 | |
anthropic/claude-haiku-4.5 | text→text | $1.00 | $5.00 | — | — | |
nousresearch/hermes-3-llama-3.1-405b | text→text | $1.00 | $1.00 | — | — | |
nousresearch/hermes-4-405b | text→text | $1.00 | $3.00 | — | — | |
openai/gpt-3.5-turbo-0613 | text→text | $1.00 | $2.00 | — | — | |
perplexity/sonar | text→text | $1.00 | $1.00 | — | — | |
qwen/qwen3-coder-plus | text→text | $1.00 | $5.00 | — | — | |
relace/relace-search | text→text | $1.00 | $3.00 | — | — | |
openai/o3-mini-high OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high.
o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. | text+file→text | $1.10 | $4.40 | 200K | Feb 2025 | |
openai/o3-mini OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high".
The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost. | text+file→text | $1.10 | $4.40 | 200K | Jan 2025 | |
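The `reasoning_effort` parameter described above is a plain field on the Chat Completions request. A minimal sketch with the official `openai` Python SDK and an `OPENAI_API_KEY` environment variable; the prompt is illustrative:

```python
# Minimal sketch: dial o3-mini's thinking time up via reasoning_effort.
# Assumes the official `openai` Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" | "medium" (default) | "high"
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(resp.choices[0].message.content)
```

As the row above notes, requesting `openai/o3-mini-high` is equivalent to sending `openai/o3-mini` with the effort pinned to "high".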
openai/o4-mini OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. | image+text+file→text | $1.10 | $4.40 | 200K | Apr 2025 | |
openai/o4-mini-high OpenAI o4-mini-high is the same model as [o4-mini](/openai/o4-mini) with reasoning_effort set to high.
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute. | image+text+file→text | $1.10 | $4.40 | 200K | Apr 2025 | |
nvidia/llama-3.1-nemotron-70b-instruct | text→text | $1.20 | $1.20 | — | — | |
qwen/qwen3-max | text→text | $1.20 | $6.00 | — | — | |
qwen/qwen3-max-thinking | text→text | $1.20 | $6.00 | — | — | |
z-ai/glm-5v-turbo GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding, and task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”. | image+text+video→text | $1.20 | $4.00 | 203K | — | |
openai/gpt-5.3-codex GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work.
Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction. | text+image→text | $1.22 | $9.80 | 400K | Feb 2026 | |
google/gemini-2.5-pro-preview-05-06 | text→text | $1.25 | $10.00 | — | — | |
openai/gpt-5 GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination and sycophancy, and better performance in coding, writing, and health-related tasks. | text+image+file→text | $1.25 | $10.00 | 400K | Aug 2025 | |
openai/gpt-5-chat GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications. | file+image+text→text | $1.25 | $10.00 | 128K | Aug 2025 | |
openai/gpt-5.1 GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems.
Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5. | image+text+file→text | $1.25 | $10.00 | 400K | Nov 2025 | |
openai/gpt-5.1-chat GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. | file+image+text→text | $1.25 | $10.00 | 128K | Nov 2025 | |
openai/gpt-5-codex GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level)
Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. | text+image→text | $1.25 | $10.00 | 400K | Sep 2025 | |
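Through OpenRouter, the `reasoning.effort` knob referenced above is a nested request field rather than a top-level SDK argument, so it travels via `extra_body`. A minimal sketch, assuming the reasoning schema from the linked OpenRouter docs and an `OPENROUTER_API_KEY` environment variable; the prompt is illustrative:

```python
# Minimal sketch: request higher reasoning effort from GPT-5-Codex via OpenRouter.
# Assumes the `openai` Python SDK pointed at OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="openai/gpt-5-codex",
    messages=[{"role": "user", "content": "Write a function that reverses a linked list in Python."}],
    extra_body={"reasoning": {"effort": "high"}},  # trades latency for deeper planning
)
print(resp.choices[0].message.content)
```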
openai/gpt-5.1-codex GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level)
Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. | text+image→text | $1.25 | $10.00 | 400K | Nov 2025 | |
google/gemini-2.5-pro | text→text | $1.25 | $10.00 | — | — | |
google/gemini-2.5-pro-preview | text→text | $1.25 | $10.00 | — | — | |
deepcogito/cogito-v2.1-671b | text→text | $1.25 | $1.25 | — | — | |
openai/gpt-5.1-codex-max GPT-5.1-Codex-Max is OpenAI’s latest agentic coding model, designed for long-running, high-context software development tasks. It is based on an updated version of the 5.1 reasoning stack and trained on agentic workflows spanning software engineering, mathematics, and research.
GPT-5.1-Codex-Max delivers faster performance, improved reasoning, and higher token efficiency across the development lifecycle. | text+image→text | $1.25 | $10.00 | 400K | Dec 2025 | |
z-ai/glm-5.1 GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on... | text→text | $1.26 | $3.96 | 203K | — | |
sao10k/l3-euryale-70b | text→text | $1.48 | $1.48 | — | — | |
openai/gpt-3.5-turbo-instruct | text→text | $1.50 | $2.00 | — | — | |
qwen/qwen-max | text→text | $1.60 | $6.40 | — | — | |
openai/gpt-5.2-codex GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level)
Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications. | text+image→text | $1.75 | $14.00 | 400K | Jan 2026 | |
openai/gpt-5.2-chat GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation. | file+image+text→text | $1.75 | $14.00 | 128K | Dec 2025 | |
openai/gpt-5.2 GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks.
Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability. | file+image+text→text | $1.75 | $14.00 | 400K | Dec 2025 | |
openai/gpt-5.3-chat GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow. | text+image+file→text | $1.75 | $14.00 | 128K | Mar 2026 | |
google/gemini-3.1-pro-preview | text→text | $2.00 | $12.00 | — | — | |
ai21/jamba-large-1.7 Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions. | text→text | $2.00 | $8.00 | 256K | Aug 2025 | |
google/gemini-3.1-pro-preview-customtools Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows.
It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance. | text+audio+image+video+file→text | $2.00 | $12.00 | 1M | Feb 2026 | |
openai/gpt-4.1 GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval. | image+text+file→text | $2.00 | $8.00 | 1M | Apr 2025 | |
mistralai/mixtral-8x22b-instruct Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish
See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/).
#moe | text→text | $2.00 | $6.00 | 66K | Apr 2024 | |
mistralai/pixtral-large-2411 Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images.
The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
| text+image→text | $2.00 | $6.00 | 131K | Nov 2024 | |
perplexity/sonar-deep-research | text→text | $2.00 | $8.00 | — | — | |
google/gemini-3-pro-preview | text→text | $2.00 | $12.00 | — | — | |
mistralai/mistral-large | text→text | $2.00 | $6.00 | — | — | |
mistralai/mistral-large-2407 | text→text | $2.00 | $6.00 | — | — | |
mistralai/mistral-large-2411 | text→text | $2.00 | $6.00 | — | — | |
perplexity/sonar-reasoning-pro | text→text | $2.00 | $8.00 | — | — | |
openai/o3 o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images. | image+text+file→text | $2.00 | $8.00 | 200K | Apr 2025 | |
openai/o4-mini-deep-research o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step research tasks.
Note: This model always uses the 'web_search' tool which adds additional cost. | file+image+text→text | $2.00 | $8.00 | 200K | Oct 2025 | |
x-ai/grok-4.20 Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses.
Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) | text+image→text | $2.00 | $6.00 | 2M | Mar 2026 | |
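The `reasoning` `enabled` toggle mentioned above travels in the request body; through the OpenAI-compatible SDK it has to go via `extra_body`. A minimal sketch, assuming the reasoning schema from the linked OpenRouter docs, an `OPENROUTER_API_KEY` environment variable, and an illustrative prompt:

```python
# Minimal sketch: turn reasoning off for a fast, cheap pass on Grok 4.20.
# Assumes the `openai` Python SDK pointed at OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="x-ai/grok-4.20",
    messages=[{"role": "user", "content": "List three uses of text embeddings."}],
    extra_body={"reasoning": {"enabled": False}},  # set True (or omit) to enable reasoning
)
print(resp.choices[0].message.content)
```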
x-ai/grok-4.20-multi-agent Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks.
Reasoning effort behavior (see the request sketch after this entry):
- low / medium: 4 agents
- high / xhigh: 16 agents | text+image+file→text | $2.00 | $6.00 | 2M | Mar 2026 | |
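A sketch of the effort-to-agent mapping, assuming the `reasoning.effort` field from the same OpenRouter reasoning docs; the effort names come straight from the entry above, and the prompt is a placeholder.

```python
# Sketch: requesting the 16-agent configuration by raising reasoning
# effort. Per the entry: low/medium -> 4 agents, high/xhigh -> 16 agents.
import os

import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "x-ai/grok-4.20-multi-agent",
        "messages": [{"role": "user", "content": "Survey recent work on KV-cache compression."}],
        "reasoning": {"effort": "high"},  # or "xhigh"; both fan out to 16 agents
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```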
x-ai/grok-4.20-beta Grok 4.20 Beta is xAI's newest flagship model with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses.
Reasoning can be enabled or disabled via the `enabled` field of the `reasoning` parameter in the API, as in the sketch under the non-beta entry above. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens) | text+image→text | $2.00 | $6.00 | 2M | Mar 2026 | |
x-ai/grok-4.20-multi-agent-beta Grok 4.20 Multi-Agent Beta is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks.
Reasoning effort behavior:
- low / medium: 4 agents
- high / xhigh: 16 agents | text+image→text | $2.00 | $6.00 | 2M | Mar 2026 | |
openai/gpt-4o-search-preview | text→text | $2.50 | $10.00 | — | — | |
openai/gpt-4o | text→text | $2.50 | $10.00 | — | — | |
inflection/inflection-3-productivity | text→text | $2.50 | $10.00 | — | — | |
inflection/inflection-3-pi | text→text | $2.50 | $10.00 | — | — | |
amazon/nova-premier-v1 | text→text | $2.50 | $12.50 | — | — | |
cohere/command-a | text→text | $2.50 | $10.00 | — | — | |
cohere/command-r-plus-08-2024 | text→text | $2.50 | $10.00 | — | — | |
openai/gpt-4o-2024-11-20 | text→text | $2.50 | $10.00 | — | — | |
openai/gpt-4o-2024-08-06 | text→text | $2.50 | $10.00 | — | — | |
openai/gpt-5.4 GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow.
The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency. | text+image+file→text | $2.50 | $15.00 | 1M | Mar 2026 | |
anthropic/claude-sonnet-4.6 Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation. | text+image→text | $3.00 | $15.00 | 1M | Feb 2026 | |
anthracite-org/magnum-v4-72b | text→text | $3.00 | $5.00 | — | — | |
anthropic/claude-3.7-sonnet | text→text | $3.00 | $15.00 | — | — | |
anthropic/claude-sonnet-4.5 | text→text | $3.00 | $15.00 | — | — | |
anthropic/claude-sonnet-4 | text→text | $3.00 | $15.00 | — | — | |
openai/gpt-3.5-turbo-16k | text→text | $3.00 | $4.00 | — | — | |
perplexity/sonar-pro-search | text→text | $3.00 | $15.00 | — | — | |
perplexity/sonar-pro | text→text | $3.00 | $15.00 | — | — | |
sao10k/l3.1-70b-hanami-x1 | text→text | $3.00 | $3.00 | — | — | |
x-ai/grok-3 | text→text | $3.00 | $15.00 | — | — | |
x-ai/grok-3-beta | text→text | $3.00 | $15.00 | — | — | |
x-ai/grok-4 | text→text | $3.00 | $15.00 | — | — | |
anthropic/claude-3.7-sonnet:thinking Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Read more in the [blog post](https://www.anthropic.com/news/claude-3-7-sonnet). | text+image+file→text | $3.00 | $15.00 | 200K | Feb 2025 | |
alpindale/goliath-120b | text→text | $3.75 | $7.50 | — | — | |
meta-llama/llama-3.1-405b-instruct | text→text | $4.00 | $4.00 | — | — | |
meta-llama/llama-3.1-405b | text→text | $4.00 | $4.00 | — | — | |
aion-labs/aion-1.0 Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model. | text→text | $4.00 | $8.00 | 131K | Feb 2025 | |
raifle/sorcererlm-8x22b | text→text | $4.50 | $4.50 | — | — | |
anthropic/claude-opus-4.6 | text→text | $5.00 | $25.00 | — | — | |
openai/gpt-4o-2024-05-13 | text→text | $5.00 | $15.00 | — | — | |
anthropic/claude-opus-4.5 | text→text | $5.00 | $25.00 | — | — | |
anthropic/claude-opus-4.7 Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on... | text+image→text | $5.00 | $25.00 | 1M | — | |
anthropic/claude-3.5-sonnet | text→text | $6.00 | $30.00 | — | — | |
openai/gpt-4o:extended | text→text | $6.00 | $18.00 | — | — | |
openai/gpt-4-turbo | text→text | $10.00 | $30.00 | — | — | |
openai/gpt-4-turbo-preview | text→text | $10.00 | $30.00 | — | — | |
openai/gpt-4-1106-preview The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.
Training data: up to April 2023. | text→text | $10.00 | $30.00 | 128K | Nov 2023 | |
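A sketch of the combination this entry describes, vision input plus JSON mode, using the standard OpenAI-compatible content-part message shape. The image URL is a placeholder, and note that the modality column above lists this snapshot as text→text, so a vision-capable variant may be what actually serves image parts.

```python
# Sketch: a vision request that also uses JSON mode, per the entry above.
# The content-part message shape is the standard OpenAI-compatible form;
# the image URL is a placeholder assumption.
import os

import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4-1106-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this chart as JSON with keys 'title' and 'series'."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }],
        "response_format": {"type": "json_object"},  # JSON mode
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```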
openai/o3-deep-research o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks.
Note: This model always uses the 'web_search' tool, which adds additional cost. | image+text+file→text | $10.00 | $40.00 | 200K | Oct 2025 | |
anthropic/claude-opus-4 | text→text | $15.00 | $75.00 | — | — | |
openai/gpt-5-pro GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reduced hallucination and sycophancy, and better performance in coding, writing, and health-related tasks. | image+text+file→text | $15.00 | $120.00 | 400K | Oct 2025 | |
openai/o1 The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
| text+image+file→text | $15.00 | $60.00 | 200K | Dec 2024 | |
anthropic/claude-opus-4.1 | text→text | $15.00 | $75.00 | — | — | |
openai/o3-pro The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently better answers.
Note that BYOK is required for this model. Set up here: https://openrouter.ai/settings/integrations | text+file+image→text | $20.00 | $80.00 | 200K | Jun 2025 | |
openai/gpt-5.2-pro GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long-context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reduced hallucination and sycophancy, and better performance in coding, writing, and health-related tasks. | image+text+file→text | $21.00 | $168.00 | 400K | Dec 2025 | |
openai/gpt-4-0314 | text→text | $30.00 | $60.00 | — | — | |
openai/gpt-4 | text→text | $30.00 | $60.00 | — | — | |
anthropic/claude-opus-4.6-fast Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6): identical capabilities with higher output speed, priced at 6x the standard Opus 4.6 rate.
Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode | text+image→text | $30.00 | $150.00 | 1M | — | |
openai/gpt-5.4-pro GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving. | text+image+file→text | $30.00 | $180.00 | 1M | Mar 2026 | |
openai/o1-pro The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. | text+image+file→text | $150.00 | $600.00 | 200K | Mar 2025