Supported Models
HIVE Protocol integrates with the leading AI model providers, giving you access to the most capable language models available. This guide covers all supported models, their capabilities, pricing, and recommendations for different use cases.
Provider Overview
┌─────────────────────────────────────────────────────────────────┐
│                     Supported AI Providers                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │   OpenAI    │   │  Anthropic  │   │  Google AI  │           │
│   │   ───────   │   │  ─────────  │   │  ─────────  │           │
│   │ GPT-4o      │   │ Claude 4    │   │ Gemini 1.5  │           │
│   │ GPT-4o Mini │   │ Claude 3.5  │   │ Pro/Flash   │           │
│   │ GPT-4 Turbo │   │ Claude 3    │   │             │           │
│   │ GPT-3.5     │   │ Opus/Haiku  │   │             │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
│                                                                 │
│   ┌─────────────┐   ┌─────────────┐                             │
│   │   Ollama    │   │   Custom    │                             │
│   │   ───────   │   │   ───────   │                             │
│   │ Llama 3     │   │ Any OpenAI  │                             │
│   │ Mistral     │   │ compatible  │                             │
│   │ Qwen        │   │ endpoint    │                             │
│   └─────────────┘   └─────────────┘                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

OpenAI Models
OpenAI provides the GPT family of models, known for their versatility and strong performance across diverse tasks.
GPT-4o (Recommended)
The flagship OpenAI model with multimodal capabilities including vision and audio.
| Property | Value |
|---|---|
| Model ID | gpt-4o |
| Context Window | 128,000 tokens |
| Max Output | 16,384 tokens |
| Input Cost | $2.50 / 1M tokens |
| Output Cost | $10.00 / 1M tokens |
| Knowledge Cutoff | October 2023 |
| Multimodal | Yes (text, vision, audio) |
Strengths:
- Excellent reasoning and analysis
- Strong code generation
- Vision capabilities for image analysis
- Fast response times for its capability level
Best For:
- Complex multi-step reasoning
- Code generation and review
- Document analysis with images
- Production applications requiring high quality
Example Use Cases:
Agent: Technical Architect
Model: gpt-4o
Tasks:
- System design discussions
- Code review with context
- Technical documentation
- Architecture diagram analysis

GPT-4o Mini
A smaller, faster, and more affordable version of GPT-4o.
| Property | Value |
|---|---|
| Model ID | gpt-4o-mini |
| Context Window | 128,000 tokens |
| Max Output | 16,384 tokens |
| Input Cost | $0.15 / 1M tokens |
| Output Cost | $0.60 / 1M tokens |
| Knowledge Cutoff | October 2023 |
| Multimodal | Yes (text, vision) |
Strengths:
- ~94% cost reduction vs GPT-4o
- Very fast responses
- Still maintains strong reasoning
- Large context window
Best For:
- High-volume tasks
- Quick responses where top-tier quality isn't critical
- Cost-sensitive applications
- Preprocessing and filtering
Example Use Cases:
Agent: Quick Responder
Model: gpt-4o-mini
Tasks:
- Initial customer queries
- Data extraction
- Simple transformations
- Classification tasks

GPT-4 Turbo
High-performance model with vision capabilities.
| Property | Value |
|---|---|
| Model ID | gpt-4-turbo |
| Context Window | 128,000 tokens |
| Max Output | 4,096 tokens |
| Input Cost | $10.00 / 1M tokens |
| Output Cost | $30.00 / 1M tokens |
| Knowledge Cutoff | December 2023 |
| Multimodal | Yes (text, vision) |
Note: GPT-4o is generally recommended over GPT-4 Turbo for most use cases due to better price-performance ratio.
GPT-3.5 Turbo
Fast and economical for simpler tasks.
| Property | Value |
|---|---|
| Model ID | gpt-3.5-turbo |
| Context Window | 16,385 tokens |
| Max Output | 4,096 tokens |
| Input Cost | $0.50 / 1M tokens |
| Output Cost | $1.50 / 1M tokens |
| Knowledge Cutoff | September 2021 |
| Multimodal | No |
Best For:
- Simple queries and chat
- Legacy applications
- Very high volume, cost-sensitive tasks
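All of the OpenAI model IDs above are used with the same chat-completions request shape, which is also what the "Custom" provider box in the overview relies on: any OpenAI-compatible endpoint accepts the same payload. A minimal sketch (the helper function is illustrative, not part of HIVE):

```python
def build_chat_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build a minimal OpenAI-style /chat/completions payload.

    The same dict works against api.openai.com or any OpenAI-compatible
    endpoint (e.g. a local Ollama server); only the model ID changes.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Swapping models is a one-field change:
fast = build_chat_request("gpt-4o-mini", "Classify this support ticket: ...")
best = build_chat_request("gpt-4o", "Review this system architecture: ...")
```

Because the payload is identical across tiers, an agent can be re-pointed at a cheaper or stronger model without any other code changes.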
Anthropic Models
Anthropic's Claude models are known for their strong reasoning, safety features, and excellent writing quality.
Claude Sonnet 4 (claude-sonnet-4-20250514)
The latest and most capable Sonnet model with exceptional reasoning.
| Property | Value |
|---|---|
| Model ID | claude-sonnet-4-20250514 |
| Context Window | 200,000 tokens |
| Max Output | 64,000 tokens |
| Input Cost | $3.00 / 1M tokens |
| Output Cost | $15.00 / 1M tokens |
| Knowledge Cutoff | March 2025 |
| Multimodal | Yes (text, vision) |
Strengths:
- Exceptional reasoning and analysis
- Excellent writing quality
- Strong at following complex instructions
- Best-in-class for nuanced tasks
Best For:
- Complex analysis and research
- High-quality content creation
- Tasks requiring careful reasoning
- Multi-step problem solving
Example Use Cases:
Agent: Research Analyst
Model: claude-sonnet-4-20250514
Tasks:
- Market research synthesis
- Competitive analysis
- Strategic recommendations
- Report writing

Claude 3.5 Sonnet
Balanced performance with strong coding capabilities.
| Property | Value |
|---|---|
| Model ID | claude-3-5-sonnet-20241022 |
| Context Window | 200,000 tokens |
| Max Output | 8,192 tokens |
| Input Cost | $3.00 / 1M tokens |
| Output Cost | $15.00 / 1M tokens |
| Knowledge Cutoff | April 2024 |
| Multimodal | Yes (text, vision) |
Strengths:
- Excellent at coding tasks
- Strong general-purpose capabilities
- Good balance of speed and quality
- Reliable instruction following
Best For:
- Code generation and review
- General-purpose tasks
- Technical documentation
- Data analysis
Claude 3 Opus
Highest capability model in the Claude 3 family.
| Property | Value |
|---|---|
| Model ID | claude-3-opus-20240229 |
| Context Window | 200,000 tokens |
| Max Output | 4,096 tokens |
| Input Cost | $15.00 / 1M tokens |
| Output Cost | $75.00 / 1M tokens |
| Knowledge Cutoff | August 2023 |
| Multimodal | Yes (text, vision) |
Strengths:
- Highest quality outputs
- Exceptional at complex reasoning
- Best for creative tasks
- Strong at nuanced understanding
Best For:
- Tasks requiring absolute best quality
- Complex creative projects
- High-stakes content
- Research and analysis
Note: Consider Claude Sonnet 4 as a more cost-effective alternative with comparable quality.
Claude 3 Haiku
Fast and efficient for simple tasks.
| Property | Value |
|---|---|
| Model ID | claude-3-haiku-20240307 |
| Context Window | 200,000 tokens |
| Max Output | 4,096 tokens |
| Input Cost | $0.25 / 1M tokens |
| Output Cost | $1.25 / 1M tokens |
| Knowledge Cutoff | August 2023 |
| Multimodal | Yes (text, vision) |
Strengths:
- Very fast responses
- Low cost
- Large context window
- Still capable for many tasks
Best For:
- Quick responses
- High-volume processing
- Classification tasks
- Simple queries
Example Use Cases:
Agent: Triage Bot
Model: claude-3-haiku-20240307
Tasks:
- Initial message classification
- Quick fact lookup
- Simple data extraction
- Routing decisions

Google AI Models
Google's Gemini models offer massive context windows, making them ideal for processing long documents.
Gemini 1.5 Pro
Advanced model with the largest context window available.
| Property | Value |
|---|---|
| Model ID | gemini-1.5-pro |
| Context Window | 1,000,000 tokens |
| Max Output | 8,192 tokens |
| Input Cost | $1.25 / 1M tokens |
| Output Cost | $5.00 / 1M tokens |
| Multimodal | Yes (text, vision, audio, video) |
Strengths:
- Massive 1M token context
- Can process entire codebases
- Video and audio analysis
- Strong long-form reasoning
Best For:
- Processing entire books or codebases
- Long document analysis
- Video/audio content analysis
- Tasks requiring extensive context
Example Use Cases:
Agent: Document Analyzer
Model: gemini-1.5-pro
Tasks:
- Analyze entire repository
- Process 500-page documents
- Video transcript analysis
- Cross-document synthesis

Gemini 1.5 Flash
Fast and efficient Gemini model.
| Property | Value |
|---|---|
| Model ID | gemini-1.5-flash |
| Context Window | 1,000,000 tokens |
| Max Output | 8,192 tokens |
| Input Cost | $0.075 / 1M tokens |
| Output Cost | $0.30 / 1M tokens |
| Multimodal | Yes (text, vision, audio, video) |
Strengths:
- Extremely cost-effective
- Large context window
- Fast responses
- Multimodal capabilities
Best For:
- Cost-sensitive high-volume tasks
- Long document processing on a budget
- Quick multimodal analysis
- Background processing tasks
Comprehensive Cost Comparison
Cost Per 1M Tokens (USD)
| Model | Input | Output | Blended* |
|---|---|---|---|
| Gemini 1.5 Flash | $0.075 | $0.30 | $0.19 |
| GPT-4o Mini | $0.15 | $0.60 | $0.38 |
| Claude 3 Haiku | $0.25 | $1.25 | $0.75 |
| GPT-3.5 Turbo | $0.50 | $1.50 | $1.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 | $3.13 |
| GPT-4o | $2.50 | $10.00 | $6.25 |
| Claude Sonnet 4 | $3.00 | $15.00 | $9.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $9.00 |
| GPT-4 Turbo | $10.00 | $30.00 | $20.00 |
| Claude 3 Opus | $15.00 | $75.00 | $45.00 |
*Blended assumes 50/50 input/output ratio
Monthly Cost Estimates
Based on typical usage patterns:
| Usage Level | Gemini Flash | GPT-4o Mini | Claude Haiku | GPT-4o | Claude Sonnet |
|---|---|---|---|---|---|
| Light (1M tokens) | $0.19 | $0.38 | $0.75 | $6.25 | $9.00 |
| Medium (10M tokens) | $1.88 | $3.75 | $7.50 | $62.50 | $90.00 |
| Heavy (100M tokens) | $18.75 | $37.50 | $75.00 | $625 | $900 |
| Enterprise (1B tokens) | $188 | $375 | $750 | $6,250 | $9,000 |
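The blended and monthly figures above follow directly from the per-token rates. A small helper makes the arithmetic explicit (rates are hard-coded from the tables; the 50/50 input/output split is the same assumption as the footnote):

```python
def blended_rate(input_per_m: float, output_per_m: float,
                 input_share: float = 0.5) -> float:
    """Blended $ per 1M tokens, given an input/output token split."""
    return input_per_m * input_share + output_per_m * (1 - input_share)

def monthly_cost(total_tokens_m: float, input_per_m: float,
                 output_per_m: float) -> float:
    """Estimated monthly spend for a volume given in millions of tokens."""
    return total_tokens_m * blended_rate(input_per_m, output_per_m)

# GPT-4o at the table's 50/50 split:
blended_rate(2.50, 10.00)        # 6.25
# Heavy usage (100M tokens/month) on Gemini 1.5 Flash:
monthly_cost(100, 0.075, 0.30)   # 18.75
```

If your workload is input-heavy (e.g. long documents in, short answers out), pass a higher `input_share`; the blended cost drops well below the 50/50 figures.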
Model Comparison Matrix
By Capability
| Capability | Best Models | Notes |
|---|---|---|
| Complex Reasoning | Claude Sonnet 4, GPT-4o | Similar performance |
| Code Generation | Claude 3.5 Sonnet, GPT-4o | Claude slightly better for complex code |
| Creative Writing | Claude Sonnet 4, GPT-4o | Claude often preferred |
| Long Context | Gemini 1.5 Pro | 5–8× the context of the Claude/GPT models |
| Speed | Gemini Flash, Claude Haiku | Sub-second responses |
| Cost Efficiency | Gemini Flash, GPT-4o Mini | Best value per token |
| Vision | GPT-4o, Claude Sonnet 4 | Both excellent |
| Instruction Following | Claude Sonnet 4, GPT-4o | Claude slightly more precise |
By Use Case
| Use Case | Primary Choice | Budget Alternative |
|---|---|---|
| Customer Support | GPT-4o Mini | Claude Haiku |
| Code Review | Claude 3.5 Sonnet | GPT-4o |
| Content Creation | Claude Sonnet 4 | GPT-4o |
| Data Analysis | GPT-4o | Gemini 1.5 Pro |
| Document Processing | Gemini 1.5 Pro | Gemini 1.5 Flash |
| Research | Claude Sonnet 4 | Claude 3.5 Sonnet |
| Quick Q&A | Claude Haiku | GPT-4o Mini |
| Image Analysis | GPT-4o | Claude Sonnet 4 |
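The use-case table above can be encoded directly as a lookup, which is how you might wire a default model per agent type. A sketch (the dictionary simply transcribes the table; the function name is ours):

```python
# (primary choice, budget alternative), straight from the table above.
MODEL_BY_USE_CASE = {
    "customer support":    ("GPT-4o Mini", "Claude Haiku"),
    "code review":         ("Claude 3.5 Sonnet", "GPT-4o"),
    "content creation":    ("Claude Sonnet 4", "GPT-4o"),
    "data analysis":       ("GPT-4o", "Gemini 1.5 Pro"),
    "document processing": ("Gemini 1.5 Pro", "Gemini 1.5 Flash"),
    "research":            ("Claude Sonnet 4", "Claude 3.5 Sonnet"),
    "quick q&a":           ("Claude Haiku", "GPT-4o Mini"),
    "image analysis":      ("GPT-4o", "Claude Sonnet 4"),
}

def pick_model(use_case: str, budget: bool = False) -> str:
    """Return the table's primary choice, or the budget alternative."""
    primary, cheaper = MODEL_BY_USE_CASE[use_case.lower()]
    return cheaper if budget else primary
```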
Model Selection Guide
Decision Flowchart
START
  │
  ▼
┌───────────────────────┐
│ Processing long       │
│ documents (>100K      │──Yes──▶ Gemini 1.5 Pro
│ tokens)?              │
└───────────────────────┘
  │ No
  ▼
┌───────────────────────┐
│ Cost is primary       │
│ concern?              │──Yes──▶ Gemini Flash / GPT-4o Mini
└───────────────────────┘
  │ No
  ▼
┌───────────────────────┐
│ Need fastest          │
│ possible response?    │──Yes──▶ Claude Haiku / Gemini Flash
└───────────────────────┘
  │ No
  ▼
┌───────────────────────┐
│ Complex reasoning     │
│ or writing?           │──Yes──▶ Claude Sonnet 4 / GPT-4o
└───────────────────────┘
  │ No
  ▼
┌───────────────────────┐
│ Code generation       │
│ or review?            │──Yes──▶ Claude 3.5 Sonnet / GPT-4o
└───────────────────────┘
  │ No
  ▼
GPT-4o Mini
(Good general default)

Recommended Model by Agent Role
| Agent Role | Recommended Model | Reasoning |
|---|---|---|
| Research Analyst | Claude Sonnet 4 | Best for synthesis and analysis |
| Code Developer | Claude 3.5 Sonnet | Excellent code generation |
| Content Writer | Claude Sonnet 4 | Superior writing quality |
| Data Processor | GPT-4o Mini | Cost-effective for volume |
| Customer Support | GPT-4o Mini | Fast, accurate responses |
| Creative Director | Claude Sonnet 4 | Best creative output |
| Technical Lead | GPT-4o | Strong reasoning + vision |
| Document Analyst | Gemini 1.5 Pro | Handles long documents |
| Quick Responder | Claude Haiku | Fastest response times |
| Quality Reviewer | Claude Sonnet 4 | Catches nuanced issues |
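The decision flowchart above reduces to a short routing function: walk the questions top to bottom and return on the first match. A sketch (thresholds and model IDs come from the chart; only the primary choice from each branch is returned, the comments note the alternatives):

```python
def choose_model(context_tokens: int = 0,
                 cost_sensitive: bool = False,
                 latency_critical: bool = False,
                 complex_reasoning: bool = False,
                 coding: bool = False) -> str:
    """Walk the model-selection flowchart; first matching branch wins."""
    if context_tokens > 100_000:
        return "gemini-1.5-pro"              # long documents
    if cost_sensitive:
        return "gpt-4o-mini"                 # or gemini-1.5-flash
    if latency_critical:
        return "claude-3-haiku-20240307"     # or gemini-1.5-flash
    if complex_reasoning:
        return "claude-sonnet-4-20250514"    # or gpt-4o
    if coding:
        return "claude-3-5-sonnet-20241022"  # or gpt-4o
    return "gpt-4o-mini"                     # good general default
```

Note the ordering matters: a cost-sensitive coding task still routes to GPT-4o Mini, which matches the chart's top-down priority.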
Performance Benchmarks
Response Time (Average)
| Model | Time to First Token | Full Response (500 tokens) |
|---|---|---|
| Claude Haiku | ~150ms | ~1.2s |
| Gemini Flash | ~200ms | ~1.5s |
| GPT-4o Mini | ~250ms | ~2.0s |
| GPT-4o | ~300ms | ~2.5s |
| Claude Sonnet 4 | ~350ms | ~3.0s |
| Gemini 1.5 Pro | ~400ms | ~3.5s |
| Claude 3 Opus | ~500ms | ~5.0s |
Tokens Per Second (Output)
| Model | Tokens/Second |
|---|---|
| Claude Haiku | ~400 |
| Gemini Flash | ~350 |
| GPT-4o Mini | ~250 |
| GPT-4o | ~200 |
| Claude Sonnet 4 | ~170 |
| Gemini 1.5 Pro | ~150 |
| Claude 3 Opus | ~100 |
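Time-to-first-token and throughput combine into a rough end-to-end estimate: latency ≈ TTFT + output_tokens / tokens_per_second. A sketch using the approximate figures above (real numbers vary with load and prompt size, so expect the estimate and the measured table to differ slightly):

```python
def estimated_latency(ttft_s: float, tokens_per_s: float,
                      output_tokens: int) -> float:
    """Rough end-to-end response time in seconds."""
    return ttft_s + output_tokens / tokens_per_s

# Claude Haiku, 500-token response, from the table's rough figures:
# 0.15 s TTFT + 500 / 400 tok/s streaming = ~1.4 s
estimated_latency(0.15, 400, 500)
```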
Best Practices
1. Use the Right Model for the Job
Don't use expensive models for simple tasks:
Swarm: Content Pipeline
Agents:
  - name: Classifier
    model: claude-3-haiku-20240307    # Simple classification
  - name: Writer
    model: claude-sonnet-4-20250514   # Quality matters here
  - name: Formatter
    model: gpt-4o-mini                # Simple formatting

2. Consider Context Requirements
Match context window to your needs:
Task: Analyze 10-page document
└─ Any model works fine

Task: Analyze 500-page document
└─ Use Gemini 1.5 Pro (only option)

Task: Analyze codebase with 50+ files
├─ Use Gemini 1.5 Pro for full context
└─ Or use Claude/GPT with chunking

3. Optimize for Cost
Use tiered approach based on task complexity:
Tier 1 (Simple): Claude Haiku / Gemini Flash
- Classification
- Routing
- Simple extraction
Tier 2 (Standard): GPT-4o Mini
- General queries
- Data transformation
- Standard responses
Tier 3 (Complex): GPT-4o / Claude Sonnet 4
- Complex reasoning
- High-quality output
- Multi-step tasks

Related Documentation
- [API Keys](/docs/models/api-keys): Configure API keys for each provider
- [Local Models](/docs/models/local-models): Set up Ollama and local models
- [Model Configuration](/docs/models/model-configuration): Fine-tune model parameters
- [Usage & Costs](/docs/models/usage-tracking): Monitor your usage and spending