Agent Configuration
Agent Settings
Agent settings provide fine-grained control over how your AI agents generate responses. Understanding and properly configuring these settings is crucial for achieving consistent, high-quality outputs tailored to your specific use cases. This guide covers everything from basic model selection to advanced parameters that most users never touch.
Whether you're optimizing for cost, quality, speed, or a balance of all three, mastering these settings will help you create more effective agents.
Model Selection: Choosing the Right Brain
The model you select determines your agent's fundamental capabilities, cost, and response characteristics. Each provider offers models optimized for different use cases.
Understanding Model Trade-offs
Model Selection Matrix (quality vs. speed/cost):
- Highest quality, slower and most expensive: GPT-4o, Claude Opus
- Near-top quality with a better speed/cost balance: Claude Sonnet, Gemini Pro
- Good quality, fast and inexpensive: GPT-4o-mini, Gemini Flash
- Basic quality, fastest and cheapest: GPT-3.5, Claude Haiku
OpenAI Models
OpenAI offers the most widely used models, with excellent documentation and consistent behavior.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| gpt-4o | 128K | Medium | $$$$ | Strongest reasoning, multimodal | Complex analysis, nuanced tasks |
| gpt-4o-mini | 128K | Fast | $$ | Great balance of quality/cost | Most general tasks |
| gpt-4-turbo | 128K | Medium | $$$ | Long context, good for docs | Document processing |
| gpt-3.5-turbo | 16K | Very Fast | $ | Quick, cost-effective | Simple tasks, high volume |
When to Choose OpenAI:
- You need reliable, well-documented behavior
- Code generation is a primary use case
- You want the largest ecosystem of examples and tutorials
- Consistent performance is more important than peak capability
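Per-token prices translate directly into request cost. A minimal estimator is sketched below; the per-million-token rates are the approximate figures quoted in this guide and change frequently, so treat the table as a placeholder, not authoritative pricing.

```python
# Approximate USD prices per 1M tokens (input, output), as quoted in this
# guide. Real prices change often -- verify against the provider's pricing page.
PRICES_PER_1M = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single request."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 10,000 input + 1,000 output tokens on gpt-4o-mini:
# 10_000 * 0.15/1e6 + 1_000 * 0.60/1e6 = 0.0015 + 0.0006 = 0.0021 USD
```

At these rates, the same workload costs roughly 25x more on gpt-4o than on gpt-4o-mini, which is why the cheaper model is the usual default for high-volume tasks.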
Pricing Insight (approximate):
GPT-4o: $5.00 / 1M input tokens, $15.00 / 1M output tokens
GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
GPT-3.5-turbo: $0.50 / 1M input tokens, $1.50 / 1M output tokens
Anthropic Models
Anthropic's Claude models excel at nuanced understanding, long-form content, and following complex instructions.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| claude-sonnet-4-20250514 | 200K | Medium | $$$ | Latest, best balanced | Most tasks |
| claude-3-5-sonnet | 200K | Medium | $$$ | Excellent writing | Content, analysis |
| claude-3-opus | 200K | Slow | $$$$ | Highest quality | Complex reasoning |
| claude-3-haiku | 200K | Very Fast | $ | Quick responses | Simple queries, high volume |
When to Choose Anthropic:
- Long-form writing and content creation
- Nuanced, conversational interactions
- Tasks requiring careful instruction following
- You need large context windows (200K tokens)
Unique Claude Features:
- Larger standard context window than OpenAI
- Excellent at maintaining consistent persona
- Strong at admitting uncertainty
- Good at following complex, multi-part instructions
Google AI Models
Google's Gemini models offer exceptional context windows and multimodal capabilities.
| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|---|---|---|---|---|---|
| gemini-1.5-pro | 1M | Medium | $$$ | Massive context | Large documents, codebases |
| gemini-1.5-flash | 1M | Very Fast | $$ | Speed + context | Quick analysis of large inputs |
When to Choose Google AI:
- Processing very large documents (legal contracts, codebases)
- Analyzing long conversation histories
- Multimodal tasks (text + images)
- Cost-effective processing of large inputs
Context Window Advantage:
OpenAI GPT-4o: 128,000 tokens (~96,000 words)
Anthropic Claude: 200,000 tokens (~150,000 words)
Google Gemini: 1,000,000 tokens (~750,000 words)
Model Selection Decision Tree
START
│
├─► Need to process very large documents (100K+ tokens)?
│ └─► YES: Use Gemini 1.5 Pro
│
├─► Primary use case is creative writing or nuanced content?
│ └─► YES: Use Claude Sonnet or Opus
│
├─► Need maximum reasoning capability for complex analysis?
│ └─► YES: Use GPT-4o or Claude Opus
│
├─► High-volume, simple tasks where cost matters most?
│ └─► YES: Use GPT-3.5-turbo or Claude Haiku
│
├─► General-purpose with good quality/cost balance?
│ └─► YES: Use GPT-4o-mini or Claude Sonnet
│
└─► Default recommendation: GPT-4o-mini (best all-around value)
Temperature: Controlling Randomness
Temperature is the most important setting for controlling response characteristics. It determines how "creative" or "deterministic" the model's outputs will be.
How Temperature Works
At a technical level, temperature affects the probability distribution when the model selects the next token:
Low Temperature (0.0 - 0.3):
• Model strongly favors highest-probability tokens
• Outputs are predictable and consistent
• Same input → nearly identical output each time
High Temperature (0.8 - 1.5):
• Model considers lower-probability tokens more often
• Outputs are varied and creative
• Same input → different outputs each time
Temperature Visualization
Temperature Scale and Effects:
0.0 ──────────────────────────────────────────────────────► 2.0
DETERMINISTIC (recommended 0.0 - 0.2):
• Factual queries
• Code generation
• Data extraction
• Math problems
BALANCED (recommended 0.3 - 0.5):
• General chat
• Explanations
• Q&A
• Most tasks
CREATIVE (recommended 0.6 - 1.0):
• Brainstorming
• Creative writing
• Storytelling
• Poetry
• Ideation
CHAOTIC (1.0+, use with caution):
• Experimental
• May be incoherent
• Unpredictable
Temperature by Task Type
| Task Type | Recommended Temperature | Why |
|---|---|---|
| Code generation | 0.0 - 0.2 | Code must be syntactically correct |
| Data extraction | 0.0 - 0.1 | Accuracy is critical |
| Technical documentation | 0.2 - 0.3 | Consistency matters, slight variety OK |
| Customer support | 0.3 - 0.4 | Helpful but consistent responses |
| General Q&A | 0.4 - 0.6 | Balance of accuracy and natural flow |
| Content writing | 0.6 - 0.8 | Creative but coherent |
| Brainstorming | 0.8 - 1.0 | Maximum idea variety |
| Poetry/creative | 0.9 - 1.2 | Unique, unexpected outputs |
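The mechanism described above can be sketched in a few lines: temperature divides the model's raw next-token scores (logits) before they are converted to probabilities, so low values sharpen the distribution and high values flatten it. The logit values here are hypothetical, chosen only to illustrate the effect.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token scores into probabilities, scaled by temperature.
    Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical candidate tokens with raw scores 2.0, 1.0, 0.5:
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # top token dominates (>99%)
print(softmax_with_temperature(logits, 1.0))  # balanced mix
print(softmax_with_temperature(logits, 2.0))  # probabilities nearly even
```

This is why temperature 0 gives near-identical outputs: the top token's probability approaches 1, so sampling almost always picks it.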
Temperature Examples
Temperature 0.1 - Factual Query:
Prompt: "What is the capital of France?"
Response: "Paris is the capital of France."
(Response will be nearly identical every time)
Temperature 0.7 - General Explanation:
Prompt: "Explain why the sky is blue"
Response 1: "The sky appears blue due to a phenomenon called
Rayleigh scattering. When sunlight enters Earth's atmosphere..."
Response 2: "When sunlight travels through our atmosphere, it
collides with gas molecules. Blue light, having a shorter
wavelength, scatters more than other colors..."
(Responses vary in structure and wording but remain accurate)
Temperature 1.0 - Creative Writing:
Prompt: "Write the opening line of a mystery novel"
Response 1: "The letter arrived on a Tuesday, postmarked from
a town that hadn't existed in thirty years."
Response 2: "Detective Mills had seen a lot of strange things
in her career, but nothing prepared her for the empty coffin."
Response 3: "Rain drummed against the window as Sarah realized
the photograph couldn't possibly be real."
(Each response is unique and creative)
Token Limits: Managing Response Length
Tokens are the currency of AI models—they determine how much content can be processed and generated.
Understanding Tokens
Token Basics:
• 1 token ≈ 4 characters in English
• 1 token ≈ 0.75 words
• 100 tokens ≈ 75 words
• 1,000 tokens ≈ 750 words
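The rough ratios above can be turned into a quick estimator. This is only a heuristic; when you need exact counts, use your provider's tokenizer (e.g. tiktoken for OpenAI models).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.
    An approximation only -- use the provider's tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))                      # 1
print(estimate_tokens("Hello, how are you today?"))  # 6
```

The heuristic works reasonably for English prose but undercounts for code, non-English text, and unusual words, which tokenize less efficiently.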
Examples:
"Hello" = 1 token
"Hello, how are you today?" = 6 tokens
"Supercalifragilisticexpialidocious" = 9 tokens (unusual words = more tokens)
Max Tokens Setting
The max_tokens parameter limits how long the agent's response can be:
| Setting | Approx. Words | Use Case |
|---|---|---|
| 150 | ~110 words | Tweet-length responses |
| 256 | ~190 words | Brief answers |
| 512 | ~380 words | Short paragraphs |
| 1024 | ~750 words | Standard responses |
| 2048 | ~1,500 words | Detailed explanations |
| 4096 | ~3,000 words | Long-form content |
| 8192 | ~6,000 words | Articles, documentation |
| 16384 | ~12,000 words | Comprehensive reports |
Important: Setting max_tokens too low can cause responses to be cut off mid-sentence. Set it higher than your expected response length.
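You can detect a cut-off response programmatically. In the OpenAI-style chat completion format, each choice carries a finish_reason of "stop" (natural end) or "length" (the max_tokens limit was hit); a sketch over response-shaped dicts, assuming that format:

```python
def was_truncated(response: dict) -> bool:
    """Detect a response cut off by max_tokens. Assumes the OpenAI-style
    chat completion shape, where each choice reports a finish_reason of
    "stop" (natural end) or "length" (max_tokens limit reached)."""
    return any(c.get("finish_reason") == "length"
               for c in response.get("choices", []))

# Simulated response objects:
print(was_truncated({"choices": [{"finish_reason": "length"}]}))  # True
print(was_truncated({"choices": [{"finish_reason": "stop"}]}))    # False
```

When this returns True, either raise max_tokens or re-prompt for a continuation.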
Context Window Management
The context window includes everything the model "sees":
┌─────────────────────────────────────────────────────────────┐
│ CONTEXT WINDOW BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ System Prompt [████████░░░░░░░░░░░░] ~2,000 tokens │
│ Context Blocks [████░░░░░░░░░░░░░░░░] ~1,000 tokens │
│ Conversation History [████████████████░░░░] ~8,000 tokens │
│ Reserved for Response [████████░░░░░░░░░░░░] ~4,000 tokens │
│ ───────────────────── │
│ TOTAL USED: ~15,000 tokens │
│ │
│ Model: GPT-4o (128K context) │
│ Available for more history: ~113,000 tokens │
│ │
└─────────────────────────────────────────────────────────────┘
Best Practices:
- Monitor token usage in long conversations
- Summarize or truncate old messages when approaching limits
- Keep system prompts concise but effective
- Consider model context size when planning workflows
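One way to apply the truncation advice above is to drop the oldest turns until the history fits a token budget, always preserving the system prompt. The sketch below uses the rough 4-characters-per-token heuristic; swap in a real tokenizer for production use.

```python
def trim_history(messages, budget_tokens):
    """Drop the oldest non-system messages until the estimated token total
    fits within budget_tokens. The system prompt is always preserved.
    Token counts use the rough ~4 chars/token heuristic."""
    est = lambda m: len(m["content"]) // 4 + 4  # +4 for per-message overhead
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(est, system + rest)) > budget_tokens:
        rest.pop(0)  # drop the oldest turn first
    return system + rest
```

A more sophisticated variant summarizes the dropped turns into a single message instead of discarding them outright, which keeps long-range context at a fraction of the token cost.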
Advanced Parameters
Presence Penalty
Presence penalty reduces the likelihood of the model repeating topics it has already mentioned. It applies equally to all tokens that have appeared at least once.
Presence Penalty Range: 0.0 to 2.0
0.0 → No penalty (default)
Model may return to same topics naturally
0.5 → Mild penalty
Slight encouragement to explore new topics
1.0 → Moderate penalty
Noticeable shift toward new topics
2.0 → Strong penalty
Aggressively avoids returning to mentioned topics
Warning: May make responses feel disconnected
When to Increase Presence Penalty:
- Brainstorming sessions where you want diverse ideas
- Content generation needing broad topic coverage
- Conversations that feel stuck on the same points
When to Keep Low (0.0):
- Technical explanations that need to reference key terms
- Customer support requiring consistent terminology
- Code generation
Frequency Penalty
Frequency penalty reduces repetition of specific words proportional to how many times they've appeared.
Frequency Penalty Range: 0.0 to 2.0
0.0 → No penalty (default)
Natural word repetition allowed
0.5 → Mild penalty
Encourages vocabulary variety
1.0 → Moderate penalty
Noticeably more varied word choice
2.0 → Strong penalty
Strongly avoids repeating words
Warning: May produce awkward phrasing
Frequency vs. Presence:
Presence Penalty: "Have you mentioned this topic at all?"
Binary: mentioned or not
Frequency Penalty: "How many times have you used this word?"
Proportional: penalty increases with repetition
When to Increase Frequency Penalty:
- Creative writing where varied vocabulary matters
- Marketing copy that needs fresh language
- Content that feels repetitive
When to Keep Low (0.0):
- Technical writing with necessary terminology
- Code generation with repeated patterns
- Instructional content with key terms
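OpenAI documents both penalties as subtractions applied to token logits before sampling: presence removes a flat amount from any token seen at least once, while frequency removes an amount proportional to its count. A sketch of that adjustment, with hypothetical logits and counts:

```python
def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Adjust per-token logits the way OpenAI documents its penalties:
    presence subtracts a flat amount from any token that has appeared at
    least once; frequency subtracts proportionally to its count."""
    adjusted = {}
    for token, logit in logits.items():
        c = counts.get(token, 0)
        adjusted[token] = (logit
                           - frequency_penalty * c
                           - presence_penalty * (1 if c > 0 else 0))
    return adjusted

# "the" has appeared 5 times, "cat" once, "dog" never:
logits = {"the": 2.0, "cat": 2.0, "dog": 2.0}
counts = {"the": 5, "cat": 1}
print(apply_penalties(logits, counts, presence_penalty=0.5, frequency_penalty=0.1))
# "the" loses 0.5 + 0.1*5 = 1.0; "cat" loses 0.6; "dog" is untouched
```

The example makes the distinction concrete: frequency hits "the" five times harder than "cat", while presence penalizes both equally.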
Top P (Nucleus Sampling)
Top P provides an alternative to temperature for controlling randomness. It limits token selection to a probability mass.
Top P Range: 0.0 to 1.0
1.0 → Consider all tokens (default)
0.9 → Consider tokens in top 90% probability mass
0.5 → Consider only most likely tokens
0.1 → Very restrictive, almost deterministic
Temperature vs. Top P:
- Temperature scales the entire probability distribution
- Top P truncates it, removing low-probability options
Recommendation: Use either temperature OR top P, not both. Temperature is more intuitive for most users.
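The truncation top P performs can be sketched directly: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize. The probabilities below are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    then renormalize -- the truncation that nucleus (top P) sampling applies."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_p_filter(probs, 0.9))  # keeps a, b, c; d is dropped
print(top_p_filter(probs, 0.5))  # keeps only a
```

Note how this differs from temperature: the surviving tokens keep their relative odds, and low-probability options are removed entirely rather than merely discouraged.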
Configuration Profiles
By Use Case
Customer Support Agent
Model: gpt-4o-mini
Temperature: 0.3
Max Tokens: 1024
Presence Penalty: 0.0
Frequency Penalty: 0.0
Rationale:
- Consistent, helpful responses
- Not too creative (could lead to incorrect info)
- Long enough for detailed answers
- Natural language, don't need forced variety
Technical Documentation Writer
Model: claude-sonnet-4-20250514
Temperature: 0.2
Max Tokens: 4096
Presence Penalty: 0.0
Frequency Penalty: 0.1
Rationale:
- Accurate, consistent technical content
- Long enough for complete documentation sections
- Slight variety to avoid robotic repetition
- Claude excels at structured writing
Creative Writing Assistant
Model: claude-sonnet-4-20250514
Temperature: 0.9
Max Tokens: 4096
Presence Penalty: 0.6
Frequency Penalty: 0.4
Rationale:
- High creativity for unique content
- Explores diverse topics and ideas
- Varied vocabulary for engaging prose
- Claude's strength in creative work
Code Review Agent
Model: gpt-4o
Temperature: 0.1
Max Tokens: 8192
Presence Penalty: 0.0
Frequency Penalty: 0.0
Rationale:
- Deterministic, consistent analysis
- Long enough for detailed code reviews
- Needs to reference same concepts/code repeatedly
- GPT-4o strong at code analysis
Research Synthesizer
Model: gemini-1.5-pro
Temperature: 0.4
Max Tokens: 16384
Presence Penalty: 0.3
Frequency Penalty: 0.0
Rationale:
- Process large amounts of research
- Balanced creativity for synthesis
- Encourages covering diverse aspects
- Massive context for source material
Brainstorming Partner
Model: gpt-4o
Temperature: 1.0
Max Tokens: 2048
Presence Penalty: 0.8
Frequency Penalty: 0.5
Rationale:
- Maximum idea variety
- Strongly explores new directions
- Fresh language and perspectives
- Not too long—rapid ideation
Settings in the Dashboard
Accessing Agent Settings
- Navigate to Agents in the sidebar
- Click on the agent you want to configure
- Select the Settings tab
- Modify parameters as needed
- Click Save Changes
Settings Interface
The settings panel displays:
- Model selector dropdown
- Temperature slider (0.0 - 2.0)
- Max tokens input field
- Advanced settings (expandable)
- Presence penalty
- Frequency penalty
- Top P (optional)
Real-Time Testing
After adjusting settings, test your changes:
- Open a swarm with the agent
- Send test messages
- Evaluate response quality
- Iterate on settings as needed
Troubleshooting Settings Issues
Responses Are Cut Off
Cause: Max tokens set too low
Solution: Increase max tokens to accommodate expected response length
Responses Are Too Similar
Cause: Temperature too low
Solution: Increase temperature to 0.5-0.7 for more variety
Responses Are Incoherent
Cause: Temperature too high
Solution: Reduce temperature to 0.7 or below
Responses Are Repetitive
Cause: Penalties not configured
Solution: Increase presence penalty (0.3-0.5) and/or frequency penalty (0.2-0.4)
Responses Feel Unnatural
Cause: Penalties too high
Solution: Reduce presence and frequency penalties closer to 0
Context Errors / Truncation
Cause: Exceeding model context window
Solution: Reduce system prompt length, summarize conversation history, or switch to higher-context model
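The troubleshooting rules above can be folded into a quick sanity check run before saving a configuration. The function name and thresholds here are illustrative, taken from this guide's rules of thumb rather than hard limits.

```python
def lint_settings(temperature, max_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Flag setting combinations this guide warns about. Thresholds mirror
    the troubleshooting advice above and are rules of thumb, not hard limits."""
    warnings = []
    if temperature > 1.0:
        warnings.append("temperature above 1.0 risks incoherent output")
    if max_tokens < 256:
        warnings.append("low max_tokens may cut responses off mid-sentence")
    if presence_penalty > 1.0 or frequency_penalty > 1.0:
        warnings.append("high penalties can make phrasing feel unnatural")
    return warnings

print(lint_settings(temperature=1.3, max_tokens=150))
# flags both the high temperature and the low max_tokens
```

A well-balanced profile such as the customer support example (temperature 0.3, max tokens 1024, penalties at 0) produces no warnings.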
Next Steps
Now that you understand agent settings:
- [Agent Tools](/docs/agents/agent-tools): Add external capabilities to your agents
- [Creating Swarms](/docs/swarms/creating-swarms): Configure multi-agent collaboration
- [Usage Tracking](/docs/models/usage-tracking): Monitor costs and optimize spending