Updated February 2026

AI Model Comparison

Compare the leading AI models side by side: capabilities, pricing, benchmarks, and best use cases.


| Model | Released | Provider | Context | Input Price | Output Price | Capabilities | MMLU | HumanEval | Best For |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | May 2024 | OpenAI | 128K | $2.50/1M | $10.00/1M | 🎤 | 88.7% | 90.2% | General use, Multimodal tasks |
| GPT-4 Turbo | Apr 2024 | OpenAI | 128K | $10.00/1M | $30.00/1M | | 86.4% | 87.1% | Complex reasoning, Long documents |
| Claude 3.5 Sonnet | Jun 2024 | Anthropic | 200K | $3.00/1M | $15.00/1M | | 88.3% | 92% | Coding, Analysis |
| Claude 3.5 Opus | Oct 2024 | Anthropic | 200K | $15.00/1M | $75.00/1M | | 91.2% | 94.5% | Research, Complex reasoning |
| Gemini 1.5 Pro | Feb 2024 | Google | 2M | $1.25/1M | $5.00/1M | 🎤🎬 | 85.9% | 84.1% | Ultra-long context, Video analysis |
| Gemini Ultra | Dec 2023 | Google | 128K | Enterprise | Enterprise | 🎤🎬 | 90% | 74.4% | Enterprise, Complex tasks |
| Grok-2 | Aug 2024 | xAI | 128K | $2.00/1M | $10.00/1M | | 87.5% | 88.4% | Real-time info, X integration |
| GPT-4o mini | Jul 2024 | OpenAI | 128K | $0.15/1M | $0.60/1M | | 82% | 87% | Cost-effective, High volume |
| Claude 3 Haiku | Mar 2024 | Anthropic | 200K | $0.25/1M | $1.25/1M | | 75.2% | 75.9% | Speed, Simple tasks |
| Gemini 1.5 Flash | May 2024 | Google | 1M | $0.075/1M | $0.30/1M | 🎤🎬 | 78.9% | 74.3% | Cost-effective, Long context |
| Mistral Large 2 | Jul 2024 | Mistral | 128K | $2.00/1M | $6.00/1M | | 84% | 92.1% | Coding, European compliance |
| Command R+ | Apr 2024 | Cohere | 128K | $2.50/1M | $10.00/1M | | 75.7% | 72% | RAG, Enterprise |
| Llama 3.1 405B | Jul 2024 | Meta | 128K | Free / $5.00/1M* | Free / $15.00/1M* | | 88.6% | 89% | Self-hosting, Fine-tuning |
| Llama 3.1 70B | Jul 2024 | Meta | 128K | Free / $0.90/1M* | Free / $0.90/1M* | | 83.6% | 80.5% | Balanced, Self-hosting |
| Mixtral 8x22B | Apr 2024 | Mistral | 64K | Free / $1.20/1M* | Free / $1.20/1M* | | 77.8% | 75% | MoE efficiency, Multilingual |
| Qwen2 72B | Jun 2024 | Alibaba | 128K | Free | Free | | 84.2% | 86% | Chinese/English, Math |
| DeepSeek V2.5 | Sep 2024 | DeepSeek | 128K | $0.14/1M | $0.28/1M | | 80.4% | 89.4% | Coding, Ultra-cheap |

* Prices for open-source models reflect API hosting costs (e.g., Together AI, Fireworks). Self-hosting carries no per-token API fee, though you pay for your own hardware.
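
To turn the per-1M-token prices above into a budget figure, multiply each token count by its rate and divide by one million. The sketch below is a rough illustration, not an official calculator: the `PRICES_PER_1M` dictionary and `estimate_cost` function are hypothetical names, the prices are copied from a few rows of the table, and the workload (10,000 requests at 1,500 input and 500 output tokens each) is an assumed example.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
# Only a few rows are included; extend the dictionary as needed.
PRICES_PER_1M = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 10,000 requests, each with 1,500 input and 500 output tokens.
    requests = 10_000
    total_input = requests * 1_500
    total_output = requests * 500
    for model in PRICES_PER_1M:
        cost = estimate_cost(model, total_input, total_output)
        print(f"{model}: ${cost:,.2f}")
```

Under that assumed workload, GPT-4o mini comes to about $5.25 versus $87.50 for GPT-4o, which shows how quickly output-token pricing dominates at volume.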

Quick Recommendations

Best Value

GPT-4o mini

82% MMLU and 87% HumanEval at $0.15/1M input tokens and $0.60/1M output tokens. Well suited to high-volume applications.

Best for Coding

Claude 3.5 Sonnet

92% HumanEval score with a 200K context window. The developer's choice for complex coding tasks.

Best for Long Context

Gemini 1.5 Pro

2 million token context window. Analyze entire codebases or books in one prompt.
