Agent Configuration
Agent Settings
Agent settings provide fine-grained control over how your AI agents generate responses. Understanding and properly configuring these settings is crucial for achieving consistent, high-quality outputs tailored to your specific use cases. This guide covers everything from basic model selection to advanced parameters that most users never touch.
Whether you're optimizing for cost, quality, speed, or a balance of all three, mastering these settings will help you create more effective agents.
Model Selection: Choosing the Right Brain
The model you select determines your agent's fundamental capabilities, cost, and response characteristics. Each provider offers models optimized for different use cases.
Understanding Model Trade-offs
Model Selection Matrix (quality vs. speed/cost):
- Highest quality, slower and most expensive: GPT-4o, Claude Opus
- Near-top quality with a better speed/cost balance: Claude Sonnet, Gemini Pro
- Good quality, fast and inexpensive: GPT-4o-mini, Gemini Flash
- Basic quality, fastest and cheapest: GPT-3.5, Claude Haiku
OpenAI Models
OpenAI offers the most widely used models, with excellent documentation and consistent behavior.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| gpt-4o | 128K | Medium | $$$$ | Strongest reasoning, multimodal | Complex analysis, nuanced tasks |
| gpt-4o-mini | 128K | Fast | $$ | Great balance of quality/cost | Most general tasks |
| gpt-4-turbo | 128K | Medium | $$$ | Long context, good for docs | Document processing |
| gpt-3.5-turbo | 16K | Very Fast | $ | Quick, cost-effective | Simple tasks, high volume |
When to Choose OpenAI:
- You need reliable, well-documented behavior
- Code generation is a primary use case
- You want the largest ecosystem of examples and tutorials
- Consistent performance is more important than peak capability
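Per-token prices translate directly into request cost. A minimal estimator is sketched below; the per-million-token rates are the approximate figures quoted in this guide and change frequently, so treat the table as a placeholder, not authoritative pricing.

```python
# Approximate USD prices per 1M tokens (input, output), as quoted in this
# guide. Real prices change often -- verify against the provider's pricing page.
PRICES_PER_1M = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single request."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 10,000 input + 1,000 output tokens on gpt-4o-mini:
# 10_000 * 0.15/1e6 + 1_000 * 0.60/1e6 = 0.0015 + 0.0006 = 0.0021 USD
```

At these rates, the same workload costs roughly 25x more on gpt-4o than on gpt-4o-mini, which is why the cheaper model is the usual default for high-volume tasks.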
Pricing Insight (approximate):
GPT-4o: $5.00 / 1M input tokens, $15.00 / 1M output tokens
GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
GPT-3.5-turbo: $0.50 / 1M input tokens, $1.50 / 1M output tokens
Anthropic Models
Anthropic's Claude models excel at nuanced understanding, long-form content, and following complex instructions.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| claude-sonnet-4-20250514 | 200K | Medium | $$$ | Latest, best balanced | Most tasks |
| claude-3-5-sonnet | 200K | Medium | $$$ | Excellent writing | Content, analysis |
| claude-3-opus | 200K | Slow | $$$$ | Highest quality | Complex reasoning |
| claude-3-haiku | 200K | Very Fast | $ | Quick responses | Simple queries, high volume |
When to Choose Anthropic:
- Long-form writing and content creation
- Nuanced, conversational interactions
- Tasks requiring careful instruction following
- You need large context windows (200K tokens)
Unique Claude Features:
- Larger standard context window than OpenAI
- Excellent at maintaining consistent persona
- Strong at admitting uncertainty
- Good at following complex, multi-part instructions
Google AI Models
Google's Gemini models offer exceptional context windows and multimodal capabilities.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| gemini-1.5-pro | 1M | Medium | $$$ | Massive context | Large documents, codebases |
| gemini-1.5-flash | 1M | Very Fast | $$ | Speed + context | Quick analysis of large inputs |
When to Choose Google AI:
- Processing very large documents (legal contracts, codebases)
- Analyzing long conversation histories
- Multimodal tasks (text + images)
- Cost-effective processing of large inputs
Context Window Advantage:
OpenAI GPT-4o: 128,000 tokens (~96,000 words)
Anthropic Claude: 200,000 tokens (~150,000 words)
Google Gemini: 1,000,000 tokens (~750,000 words)
Model Selection Decision Tree
START
│
├─► Need to process very large documents (100K+ tokens)?
│ └─► YES: Use Gemini 1.5 Pro
│
├─► Primary use case is creative writing or nuanced content?
│ └─► YES: Use Claude Sonnet or Opus
│
├─► Need maximum reasoning capability for complex analysis?
│ └─► YES: Use GPT-4o or Claude Opus
│
├─► High-volume, simple tasks where cost matters most?
│ └─► YES: Use GPT-3.5-turbo or Claude Haiku
│
├─► General-purpose with good quality/cost balance?
│ └─► YES: Use GPT-4o-mini or Claude Sonnet
│
└─► Default recommendation: GPT-4o-mini (best all-around value)
Temperature: Controlling Randomness
Temperature is the most important setting for controlling response characteristics. It determines how "creative" or "deterministic" the model's outputs will be.
How Temperature Works
At a technical level, temperature affects the probability distribution when the model selects the next token:
Low Temperature (0.0 - 0.3):
• Model strongly favors highest-probability tokens
• Outputs are predictable and consistent
• Same input → nearly identical output each time
High Temperature (0.8 - 1.5):
• Model considers lower-probability tokens more often
• Outputs are varied and creative
• Same input → different outputs each time
Temperature Visualization
Temperature Scale and Effects:
0.0 ──────────────────────────────────────────────────────► 2.0
DETERMINISTIC (recommended 0.0 - 0.2):
• Factual queries
• Code generation
• Data extraction
• Math problems
BALANCED (recommended 0.3 - 0.5):
• General chat
• Explanations
• Q&A
• Most tasks
CREATIVE (recommended 0.6 - 1.0):
• Brainstorming
• Creative writing
• Storytelling
• Poetry
• Ideation
CHAOTIC (1.0+, use with caution):
• Experimental
• May be incoherent
• Unpredictable
Temperature by Task Type
| Task Type | Recommended Temperature | Why |
|---|---|---|
| Code generation | 0.0 - 0.2 | Code must be syntactically correct |
| Data extraction | 0.0 - 0.1 | Accuracy is critical |
| Technical documentation | 0.2 - 0.3 | Consistency matters, slight variety OK |
| Customer support | 0.3 - 0.4 | Helpful but consistent responses |
| General Q&A | 0.4 - 0.6 | Balance of accuracy and natural flow |
| Content writing | 0.6 - 0.8 | Creative but coherent |
| Brainstorming | 0.8 - 1.0 | Maximum idea variety |
| Poetry/creative | 0.9 - 1.2 | Unique, unexpected outputs |
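The mechanism described above can be sketched in a few lines: temperature divides the model's raw next-token scores (logits) before they are converted to probabilities, so low values sharpen the distribution and high values flatten it. The logit values here are hypothetical, chosen only to illustrate the effect.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token scores into probabilities, scaled by temperature.
    Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical candidate tokens with raw scores 2.0, 1.0, 0.5:
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # top token dominates (>99%)
print(softmax_with_temperature(logits, 1.0))  # balanced mix
print(softmax_with_temperature(logits, 2.0))  # probabilities nearly even
```

This is why temperature 0 gives near-identical outputs: the top token's probability approaches 1, so sampling almost always picks it.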
Temperature Examples
Temperature 0.1 - Factual Query:
Prompt: "What is the capital of France?"
Response: "Paris is the capital of France."
(Response will be nearly identical every time)
Temperature 0.7 - General Explanation:
Prompt: "Explain why the sky is blue"
Response 1: "The sky appears blue due to a phenomenon called
Rayleigh scattering. When sunlight enters Earth's atmosphere..."
Response 2: "When sunlight travels through our atmosphere, it
collides with gas molecules. Blue light, having a shorter
wavelength, scatters more than other colors..."
(Responses vary in structure and wording but remain accurate)
Temperature 1.0 - Creative Writing:
Prompt: "Write the opening line of a mystery novel"
Response 1: "The letter arrived on a Tuesday, postmarked from
a town that hadn't existed in thirty years."
Response 2: "Detective Mills had seen a lot of strange things
in her career, but nothing prepared her for the empty coffin."
Response 3: "Rain drummed against the window as Sarah realized
the photograph couldn't possibly be real."
(Each response is unique and creative)
Token Limits: Managing Response Length
Tokens are the currency of AI models—they determine how much content can be processed and generated.
Understanding Tokens
Token Basics:
• 1 token ≈ 4 characters in English
• 1 token ≈ 0.75 words
• 100 tokens ≈ 75 words
• 1,000 tokens ≈ 750 words
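The rough ratios above can be turned into a quick estimator. This is only a heuristic; when you need exact counts, use your provider's tokenizer (e.g. tiktoken for OpenAI models).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.
    An approximation only -- use the provider's tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))                      # 1
print(estimate_tokens("Hello, how are you today?"))  # 6
```

The heuristic works reasonably for English prose but undercounts for code, non-English text, and unusual words, which tokenize less efficiently.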
Examples:
"Hello" = 1 token
"Hello, how are you today?" = 6 tokens
"Supercalifragilisticexpialidocious" = 9 tokens (unusual words = more tokens)
Max Tokens Setting
The max_tokens parameter limits how long the agent's response can be:
| Setting | Approx. Words | Use Case |
|---|---|---|
| 150 | ~110 words | Tweet-length responses |
| 256 | ~190 words | Brief answers |
| 512 | ~380 words | Short paragraphs |
| 1024 | ~750 words | Standard responses |
| 2048 | ~1,500 words | Detailed explanations |
| 4096 | ~3,000 words | Long-form content |
| 8192 | ~6,000 words | Articles, documentation |
| 16384 | ~12,000 words | Comprehensive reports |
Important: Setting max_tokens too low can cause responses to be cut off mid-sentence. Set it higher than your expected response length.
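You can detect a cut-off response programmatically. In the OpenAI-style chat completion format, each choice carries a finish_reason of "stop" (natural end) or "length" (the max_tokens limit was hit); a sketch over response-shaped dicts, assuming that format:

```python
def was_truncated(response: dict) -> bool:
    """Detect a response cut off by max_tokens. Assumes the OpenAI-style
    chat completion shape, where each choice reports a finish_reason of
    "stop" (natural end) or "length" (max_tokens limit reached)."""
    return any(c.get("finish_reason") == "length"
               for c in response.get("choices", []))

# Simulated response objects:
print(was_truncated({"choices": [{"finish_reason": "length"}]}))  # True
print(was_truncated({"choices": [{"finish_reason": "stop"}]}))    # False
```

When this returns True, either raise max_tokens or re-prompt for a continuation.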
Context Window Management
The context window includes everything the model "sees":
┌─────────────────────────────────────────────────────────────┐
│ CONTEXT WINDOW BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ System Prompt [████████░░░░░░░░░░░░] ~2,000 tokens │
│ Context Blocks [████░░░░░░░░░░░░░░░░] ~1,000 tokens │
│ Conversation History [████████████████░░░░] ~8,000 tokens │
│ Reserved for Response [████████░░░░░░░░░░░░] ~4,000 tokens │
│ ───────────────────── │
│ TOTAL USED: ~15,000 tokens │
│ │
│ Model: GPT-4o (128K context) │
│ Available for more history: ~113,000 tokens │
│ │
└─────────────────────────────────────────────────────────────┘
Best Practices:
- Monitor token usage in long conversations
- Summarize or truncate old messages when approaching limits
- Keep system prompts concise but effective
- Consider model context size when planning workflows
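One way to apply the truncation advice above is to drop the oldest turns until the history fits a token budget, always preserving the system prompt. The sketch below uses the rough 4-characters-per-token heuristic; swap in a real tokenizer for production use.

```python
def trim_history(messages, budget_tokens):
    """Drop the oldest non-system messages until the estimated token total
    fits within budget_tokens. The system prompt is always preserved.
    Token counts use the rough ~4 chars/token heuristic."""
    est = lambda m: len(m["content"]) // 4 + 4  # +4 for per-message overhead
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(est, system + rest)) > budget_tokens:
        rest.pop(0)  # drop the oldest turn first
    return system + rest
```

A more sophisticated variant summarizes the dropped turns into a single message instead of discarding them outright, which keeps long-range context at a fraction of the token cost.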
Advanced Parameters
Presence Penalty
Presence penalty reduces the likelihood of the model repeating topics it has already mentioned. It applies equally to all tokens that have appeared at least once.
Presence Penalty Range: 0.0 to 2.0
0.0 → No penalty (default)
Model may return to same topics naturally
0.5 → Mild penalty
Slight encouragement to explore new topics
1.0 → Moderate penalty
Noticeable shift toward new topics
2.0 → Strong penalty
Aggressively avoids returning to mentioned topics
Warning: May make responses feel disconnected
When to Increase Presence Penalty:
- Brainstorming sessions where you want diverse ideas
- Content generation needing broad topic coverage
- Conversations that feel stuck on the same points
When to Keep Low (0.0):
- Technical explanations that need to reference key terms
- Customer support requiring consistent terminology
- Code generation
Frequency Penalty
Frequency penalty reduces repetition of specific words proportional to how many times they've appeared.
Frequency Penalty Range: 0.0 to 2.0
0.0 → No penalty (default)
Natural word repetition allowed
0.5 → Mild penalty
Encourages vocabulary variety
1.0 → Moderate penalty
Noticeably more varied word choice
2.0 → Strong penalty
Strongly avoids repeating words
Warning: May produce awkward phrasing
Frequency vs. Presence:
Presence Penalty: "Have you mentioned this topic at all?"
Binary: mentioned or not
Frequency Penalty: "How many times have you used this word?"
Proportional: penalty increases with repetition
When to Increase Frequency Penalty:
- Creative writing where varied vocabulary matters
- Marketing copy that needs fresh language
- Content that feels repetitive
When to Keep Low (0.0):
- Technical writing with necessary terminology
- Code generation with repeated patterns
- Instructional content with key terms
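OpenAI documents both penalties as subtractions applied to token logits before sampling: presence removes a flat amount from any token seen at least once, while frequency removes an amount proportional to its count. A sketch of that adjustment, with hypothetical logits and counts:

```python
def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Adjust per-token logits the way OpenAI documents its penalties:
    presence subtracts a flat amount from any token that has appeared at
    least once; frequency subtracts proportionally to its count."""
    adjusted = {}
    for token, logit in logits.items():
        c = counts.get(token, 0)
        adjusted[token] = (logit
                           - frequency_penalty * c
                           - presence_penalty * (1 if c > 0 else 0))
    return adjusted

# "the" has appeared 5 times, "cat" once, "dog" never:
logits = {"the": 2.0, "cat": 2.0, "dog": 2.0}
counts = {"the": 5, "cat": 1}
print(apply_penalties(logits, counts, presence_penalty=0.5, frequency_penalty=0.1))
# "the" loses 0.5 + 0.1*5 = 1.0; "cat" loses 0.6; "dog" is untouched
```

The example makes the distinction concrete: frequency hits "the" five times harder than "cat", while presence penalizes both equally.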
Top P (Nucleus Sampling)
Top P provides an alternative to temperature for controlling randomness. It limits token selection to a probability mass.
Top P Range: 0.0 to 1.0
1.0 → Consider all tokens (default)
0.9 → Consider tokens in top 90% probability mass
0.5 → Consider only most likely tokens
0.1 → Very restrictive, almost deterministic
Temperature vs. Top P:
- Temperature scales the entire probability distribution
- Top P truncates it, removing low-probability options
Recommendation: Use either temperature OR top P, not both. Temperature is more intuitive for most users.
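The truncation top P performs can be sketched directly: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize. The probabilities below are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    then renormalize -- the truncation that nucleus (top P) sampling applies."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_p_filter(probs, 0.9))  # keeps a, b, c; d is dropped
print(top_p_filter(probs, 0.5))  # keeps only a
```

Note how this differs from temperature: the surviving tokens keep their relative odds, and low-probability options are removed entirely rather than merely discouraged.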
Configuration Profiles
By Use Case
Customer Support Agent
Model: gpt-4o-mini
Temperature: 0.3
Max Tokens: 1024
Presence Penalty: 0.0
Frequency Penalty: 0.0
Rationale:
- Consistent, helpful responses
- Not too creative (could lead to incorrect info)
- Long enough for detailed answers
- Natural language, don't need forced variety
Technical Documentation Writer
Model: claude-sonnet-4-20250514
Temperature: 0.2
Max Tokens: 4096
Presence Penalty: 0.0
Frequency Penalty: 0.1
Rationale:
- Accurate, consistent technical content
- Long enough for complete documentation sections
- Slight variety to avoid robotic repetition
- Claude excels at structured writing
Creative Writing Assistant
Model: claude-sonnet-4-20250514
Temperature: 0.9
Max Tokens: 4096
Presence Penalty: 0.6
Frequency Penalty: 0.4
Rationale:
- High creativity for unique content
- Explores diverse topics and ideas
- Varied vocabulary for engaging prose
- Claude's strength in creative work
Code Review Agent
Model: gpt-4o
Temperature: 0.1
Max Tokens: 8192
Presence Penalty: 0.0
Frequency Penalty: 0.0
Rationale:
- Deterministic, consistent analysis
- Long enough for detailed code reviews
- Needs to reference same concepts/code repeatedly
- GPT-4o strong at code analysis
Research Synthesizer
Model: gemini-1.5-pro
Temperature: 0.4
Max Tokens: 16384
Presence Penalty: 0.3
Frequency Penalty: 0.0
Rationale:
- Process large amounts of research
- Balanced creativity for synthesis
- Encourages covering diverse aspects
- Massive context for source material
Brainstorming Partner
Model: gpt-4o
Temperature: 1.0
Max Tokens: 2048
Presence Penalty: 0.8
Frequency Penalty: 0.5
Rationale:
- Maximum idea variety
- Strongly explores new directions
- Fresh language and perspectives
- Not too long—rapid ideation
Settings in the Dashboard
Accessing Agent Settings
- Navigate to Agents in the sidebar
- Click on the agent you want to configure
- Select the Settings tab
- Modify parameters as needed
- Click Save Changes
Settings Interface
The settings panel displays:
- Model selector dropdown
- Temperature slider (0.0 - 2.0)
- Max tokens input field
- Advanced settings (expandable)
- Presence penalty
- Frequency penalty
- Top P (optional)
Real-Time Testing
After adjusting settings, test your changes:
- Open a swarm with the agent
- Send test messages
- Evaluate response quality
- Iterate on settings as needed
Troubleshooting Settings Issues
Responses Are Cut Off
Cause: Max tokens set too low
Solution: Increase max tokens to accommodate expected response length
Responses Are Too Similar
Cause: Temperature too low
Solution: Increase temperature to 0.5-0.7 for more variety
Responses Are Incoherent
Cause: Temperature too high
Solution: Reduce temperature to 0.7 or below
Responses Are Repetitive
Cause: Penalties not configured
Solution: Increase presence penalty (0.3-0.5) and/or frequency penalty (0.2-0.4)
Responses Feel Unnatural
Cause: Penalties too high
Solution: Reduce presence and frequency penalties closer to 0
Context Errors / Truncation
Cause: Exceeding model context window
Solution: Reduce system prompt length, summarize conversation history, or switch to higher-context model
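The troubleshooting rules above can be folded into a quick sanity check run before saving a configuration. The function name and thresholds here are illustrative, taken from this guide's rules of thumb rather than hard limits.

```python
def lint_settings(temperature, max_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Flag setting combinations this guide warns about. Thresholds mirror
    the troubleshooting advice above and are rules of thumb, not hard limits."""
    warnings = []
    if temperature > 1.0:
        warnings.append("temperature above 1.0 risks incoherent output")
    if max_tokens < 256:
        warnings.append("low max_tokens may cut responses off mid-sentence")
    if presence_penalty > 1.0 or frequency_penalty > 1.0:
        warnings.append("high penalties can make phrasing feel unnatural")
    return warnings

print(lint_settings(temperature=1.3, max_tokens=150))
# flags both the high temperature and the low max_tokens
```

A well-balanced profile such as the customer support example (temperature 0.3, max tokens 1024, penalties at 0) produces no warnings.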
Next Steps
Now that you understand agent settings:
- [Agent Tools](/docs/agents/agent-tools): Add external capabilities to your agents
- [Creating Swarms](/docs/swarms/creating-swarms): Configure multi-agent collaboration
- [Usage Tracking](/docs/models/usage-tracking): Monitor costs and optimize spending