Agent Settings

Agent settings provide fine-grained control over how your AI agents generate responses. Understanding and properly configuring these settings is crucial for achieving consistent, high-quality outputs tailored to your specific use cases. This guide covers everything from basic model selection to advanced parameters that most users never touch.

Whether you're optimizing for cost, quality, speed, or a balance of all three, mastering these settings will help you create more effective agents.

Model Selection: Choosing the Right Brain

The model you select determines your agent's fundamental capabilities, cost, and response characteristics. Each provider offers models optimized for different use cases.

Understanding Model Trade-offs

┌─────────────────────────────────────────────────────────────┐
│                     MODEL SELECTION MATRIX                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  QUALITY                                                    │
│    ▲                                                        │
│    │    ┌─────────────┐                                    │
│    │    │ GPT-4o      │   ┌─────────────┐                 │
│    │    │ Claude Opus │   │ Claude      │                 │
│    │    └─────────────┘   │ Sonnet      │                 │
│    │                      │ Gemini Pro  │                 │
│    │                      └─────────────┘                 │
│    │           ┌─────────────┐                             │
│    │           │ GPT-4o-mini │                             │
│    │           │ Gemini Flash│                             │
│    │           └─────────────┘                             │
│    │    ┌─────────────┐                                    │
│    │    │ GPT-3.5     │                                    │
│    │    │ Claude Haiku│                                    │
│    │    └─────────────┘                                    │
│    │                                                        │
│    └─────────────────────────────────────────────────►     │
│                                              SPEED/COST     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

OpenAI Models

OpenAI offers the most widely used models, with excellent documentation and consistent behavior.

| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|-------|---------|-------|------|-----------|----------------|
| gpt-4o | 128K | Medium | $$$$ | Strongest reasoning, multimodal | Complex analysis, nuanced tasks |
| gpt-4o-mini | 128K | Fast | $$ | Great balance of quality/cost | Most general tasks |
| gpt-4-turbo | 128K | Medium | $$$ | Long context, good for docs | Document processing |
| gpt-3.5-turbo | 16K | Very Fast | $ | Quick, cost-effective | Simple tasks, high volume |

When to Choose OpenAI:

  • You need reliable, well-documented behavior
  • Code generation is a primary use case
  • You want the largest ecosystem of examples and tutorials
  • Consistent performance is more important than peak capability

Pricing Insight (approximate):

GPT-4o:        $5.00 / 1M input tokens, $15.00 / 1M output tokens
GPT-4o-mini:   $0.15 / 1M input tokens, $0.60 / 1M output tokens
GPT-3.5-turbo: $0.50 / 1M input tokens, $1.50 / 1M output tokens

Anthropic Models

Anthropic's Claude models excel at nuanced understanding, long-form content, and following complex instructions.

| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|-------|---------|-------|------|-----------|----------------|
| claude-sonnet-4-20250514 | 200K | Medium | $$$ | Latest, best balanced | Most tasks |
| claude-3-5-sonnet | 200K | Medium | $$$ | Excellent writing | Content, analysis |
| claude-3-opus | 200K | Slow | $$$$ | Highest quality | Complex reasoning |
| claude-3-haiku | 200K | Very Fast | $ | Quick responses | Simple queries, high volume |

When to Choose Anthropic:

  • Long-form writing and content creation
  • Nuanced, conversational interactions
  • Tasks requiring careful instruction following
  • You need large context windows (200K tokens)

Unique Claude Features:

  • Larger standard context window than OpenAI
  • Excellent at maintaining consistent persona
  • Strong at admitting uncertainty
  • Good at following complex, multi-part instructions

Google AI Models

Google's Gemini models offer exceptional context windows and multimodal capabilities.

| Model | Context | Speed | Cost | Strengths | Best Use Cases |
|-------|---------|-------|------|-----------|----------------|
| gemini-1.5-pro | 1M | Medium | $$$ | Massive context | Large documents, codebases |
| gemini-1.5-flash | 1M | Very Fast | $$ | Speed + context | Quick analysis of large inputs |

When to Choose Google AI:

  • Processing very large documents (legal contracts, codebases)
  • Analyzing long conversation histories
  • Multimodal tasks (text + images)
  • Cost-effective processing of large inputs

Context Window Advantage:

OpenAI GPT-4o:    128,000 tokens  (~96,000 words)
Anthropic Claude: 200,000 tokens  (~150,000 words)
Google Gemini:  1,000,000 tokens  (~750,000 words)

Model Selection Decision Tree

START
  │
  ├─► Need to process very large documents (100K+ tokens)?
  │     └─► YES: Use Gemini 1.5 Pro
  │
  ├─► Primary use case is creative writing or nuanced content?
  │     └─► YES: Use Claude Sonnet or Opus
  │
  ├─► Need maximum reasoning capability for complex analysis?
  │     └─► YES: Use GPT-4o or Claude Opus
  │
  ├─► High-volume, simple tasks where cost matters most?
  │     └─► YES: Use GPT-3.5-turbo or Claude Haiku
  │
  ├─► General-purpose with good quality/cost balance?
  │     └─► YES: Use GPT-4o-mini or Claude Sonnet
  │
  └─► Default recommendation: GPT-4o-mini (best all-around value)
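The decision tree above can be sketched as a small helper function. This is purely illustrative (the function and its parameter names are not part of any HIVE API); the model names come from the tables in this guide:

```python
def pick_model(doc_tokens=0, creative=False, max_reasoning=False,
               high_volume=False):
    """Illustrative model picker mirroring the decision tree above."""
    if doc_tokens > 100_000:
        return "gemini-1.5-pro"            # very large documents
    if creative:
        return "claude-sonnet-4-20250514"  # nuanced, creative writing
    if max_reasoning:
        return "gpt-4o"                    # complex analysis
    if high_volume:
        return "gpt-3.5-turbo"             # cost-sensitive, simple tasks
    return "gpt-4o-mini"                   # best all-around default

print(pick_model())                    # gpt-4o-mini
print(pick_model(doc_tokens=250_000))  # gemini-1.5-pro
```

In practice you would weigh these criteria together rather than checking them in strict order, but the priority above matches the tree: context size first, then task character, then cost.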

Temperature: Controlling Randomness

Temperature is the most important setting for controlling response characteristics. It determines how "creative" or "deterministic" the model's outputs will be.

How Temperature Works

At a technical level, temperature affects the probability distribution when the model selects the next token:

Low Temperature (0.0 - 0.3):
  • Model strongly favors highest-probability tokens
  • Outputs are predictable and consistent
  • Same input → nearly identical output each time

High Temperature (0.8 - 1.5):
  • Model considers lower-probability tokens more often
  • Outputs are varied and creative
  • Same input → different outputs each time

Temperature Visualization

Temperature Scale and Effects:

0.0 ──────────────────────────────────────────────────────► 2.0
 │                    │                    │                │
 │                    │                    │                │
DETERMINISTIC    BALANCED           CREATIVE        CHAOTIC

• Factual queries    • General chat      • Brainstorming   • Experimental
• Code generation    • Explanations      • Creative writing • May be
• Data extraction    • Q&A               • Storytelling      incoherent
• Math problems      • Most tasks        • Poetry          • Unpredictable
                                         • Ideation

Recommended:        Recommended:        Recommended:        Caution:
0.0 - 0.2          0.3 - 0.5           0.6 - 1.0          1.0+
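To see why low temperatures are nearly deterministic and high temperatures are varied, here is a minimal sketch of temperature-scaled softmax over made-up logits. The numbers are invented for illustration; real models apply this over a vocabulary of tens of thousands of tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up next-token logits

low = softmax_with_temperature(logits, 0.2)   # sharp: top token dominates
high = softmax_with_temperature(logits, 1.5)  # flat: alternatives compete

print(round(low[0], 3))   # close to 1.0
print(round(high[0], 3))  # well below 1.0
```

At temperature 0.2 the top token absorbs nearly all the probability mass, so sampling picks it almost every time; at 1.5 the runner-up tokens get a real chance, which is exactly the "creative" behavior described above.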

Temperature by Task Type

| Task Type | Recommended Temperature | Why |
|-----------|------------------------|-----|
| Code generation | 0.0 - 0.2 | Code must be syntactically correct |
| Data extraction | 0.0 - 0.1 | Accuracy is critical |
| Technical documentation | 0.2 - 0.3 | Consistency matters, slight variety OK |
| Customer support | 0.3 - 0.4 | Helpful but consistent responses |
| General Q&A | 0.4 - 0.6 | Balance of accuracy and natural flow |
| Content writing | 0.6 - 0.8 | Creative but coherent |
| Brainstorming | 0.8 - 1.0 | Maximum idea variety |
| Poetry/creative | 0.9 - 1.2 | Unique, unexpected outputs |

Temperature Examples

Temperature 0.1 - Factual Query:

Prompt: "What is the capital of France?"

Response: "Paris is the capital of France."

(Response will be nearly identical every time)

Temperature 0.7 - General Explanation:

Prompt: "Explain why the sky is blue"

Response 1: "The sky appears blue due to a phenomenon called
Rayleigh scattering. When sunlight enters Earth's atmosphere..."

Response 2: "When sunlight travels through our atmosphere, it
collides with gas molecules. Blue light, having a shorter
wavelength, scatters more than other colors..."

(Responses vary in structure and wording but remain accurate)

Temperature 1.0 - Creative Writing:

Prompt: "Write the opening line of a mystery novel"

Response 1: "The letter arrived on a Tuesday, postmarked from
a town that hadn't existed in thirty years."

Response 2: "Detective Mills had seen a lot of strange things
in her career, but nothing prepared her for the empty coffin."

Response 3: "Rain drummed against the window as Sarah realized
the photograph couldn't possibly be real."

(Each response is unique and creative)

Token Limits: Managing Response Length

Tokens are the currency of AI models—they determine how much content can be processed and generated.

Understanding Tokens

Token Basics:

• 1 token ≈ 4 characters in English
• 1 token ≈ 0.75 words
• 100 tokens ≈ 75 words
• 1,000 tokens ≈ 750 words

Examples:
"Hello" = 1 token
"Hello, how are you today?" = 6 tokens
"Supercalifragilisticexpialidocious" = 9 tokens (unusual words = more tokens)
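The rules of thumb above can be turned into a rough estimator. Note this is only the ~4-characters-per-token heuristic; real token counts vary by model, and for billing or hard limits you should use the provider's actual tokenizer:

```python
def estimate_tokens(text):
    """Rough token estimate using the ~4 characters-per-token rule of
    thumb for English. Real counts vary by model and tokenizer."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello"))                      # ~1
print(estimate_tokens("Hello, how are you today?"))  # ~6
```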

Max Tokens Setting

The max_tokens parameter limits how long the agent's response can be:

| Setting | Approx. Words | Use Case |
|---------|---------------|----------|
| 150 | ~110 words | Tweet-length responses |
| 256 | ~190 words | Brief answers |
| 512 | ~380 words | Short paragraphs |
| 1024 | ~750 words | Standard responses |
| 2048 | ~1,500 words | Detailed explanations |
| 4096 | ~3,000 words | Long-form content |
| 8192 | ~6,000 words | Articles, documentation |
| 16384 | ~12,000 words | Comprehensive reports |

Important: Setting max_tokens too low can cause responses to be cut off mid-sentence. Set it higher than your expected response length.

Context Window Management

The context window includes everything the model "sees":

┌─────────────────────────────────────────────────────────────┐
│                    CONTEXT WINDOW BUDGET                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  System Prompt         [████████░░░░░░░░░░░░]  ~2,000 tokens │
│  Context Blocks        [████░░░░░░░░░░░░░░░░]  ~1,000 tokens │
│  Conversation History  [████████████████░░░░]  ~8,000 tokens │
│  Reserved for Response [████████░░░░░░░░░░░░]  ~4,000 tokens │
│                        ─────────────────────                 │
│  TOTAL USED:                                  ~15,000 tokens │
│                                                             │
│  Model: GPT-4o (128K context)                               │
│  Available for more history: ~113,000 tokens                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Best Practices:

  • Monitor token usage in long conversations
  • Summarize or truncate old messages when approaching limits
  • Keep system prompts concise but effective
  • Consider model context size when planning workflows
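The "summarize or truncate" practice can be sketched as a simple newest-first truncation pass. This is a hedged illustration, not a HIVE feature: the function name is hypothetical, messages are plain strings, and token costs use the ~4-characters-per-token heuristic from above:

```python
def truncate_history(messages, budget_tokens):
    """Keep the most recent messages that fit within budget_tokens,
    estimating ~4 characters per token. Oldest messages drop first."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = max(1, len(msg) // 4)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))     # restore chronological order

history = ["old " * 50, "recent question?", "latest answer."]
print(truncate_history(history, 10))  # the long old message is dropped
```

A production version would usually summarize dropped messages instead of discarding them outright, and would always keep the system prompt out of the truncation budget.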

Advanced Parameters

Presence Penalty

Presence penalty reduces the likelihood of the model repeating topics it has already mentioned. It applies equally to all tokens that have appeared at least once.

Presence Penalty Range: 0.0 to 2.0

0.0 → No penalty (default)
      Model may return to same topics naturally

0.5 → Mild penalty
      Slight encouragement to explore new topics

1.0 → Moderate penalty
      Noticeable shift toward new topics

2.0 → Strong penalty
      Aggressively avoids returning to mentioned topics
      Warning: May make responses feel disconnected

When to Increase Presence Penalty:

  • Brainstorming sessions where you want diverse ideas
  • Content generation needing broad topic coverage
  • Conversations that feel stuck on the same points

When to Keep Low (0.0):

  • Technical explanations that need to reference key terms
  • Customer support requiring consistent terminology
  • Code generation

Frequency Penalty

Frequency penalty reduces repetition of specific words proportional to how many times they've appeared.

Frequency Penalty Range: 0.0 to 2.0

0.0 → No penalty (default)
      Natural word repetition allowed

0.5 → Mild penalty
      Encourages vocabulary variety

1.0 → Moderate penalty
      Noticeably more varied word choice

2.0 → Strong penalty
      Strongly avoids repeating words
      Warning: May produce awkward phrasing

Frequency vs. Presence:

Presence Penalty:  "Have you mentioned this topic at all?"
                   Binary: mentioned or not

Frequency Penalty: "How many times have you used this word?"
                   Proportional: penalty increases with repetition
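The distinction maps to a per-token logit adjustment, roughly as described in OpenAI's API documentation: presence penalty subtracts a flat amount once a token has appeared at all, while frequency penalty subtracts an amount proportional to its count. A minimal sketch:

```python
def adjusted_logit(logit, count, presence_penalty, frequency_penalty):
    """Apply presence/frequency penalties to one token's logit.
    Presence hits once if the token has appeared at all;
    frequency scales with how many times it has appeared."""
    presence_hit = presence_penalty if count > 0 else 0.0
    return logit - presence_hit - frequency_penalty * count

# Token seen 3 times: presence applies once, frequency applies 3x
print(adjusted_logit(2.0, 3, presence_penalty=0.5, frequency_penalty=0.2))

# Token never seen: neither penalty applies
print(adjusted_logit(2.0, 0, presence_penalty=0.5, frequency_penalty=0.2))
```

Lower logits mean lower selection probability, which is why high frequency penalties push the model toward ever-fresher vocabulary as a conversation grows.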

When to Increase Frequency Penalty:

  • Creative writing where varied vocabulary matters
  • Marketing copy that needs fresh language
  • Content that feels repetitive

When to Keep Low (0.0):

  • Technical writing with necessary terminology
  • Code generation with repeated patterns
  • Instructional content with key terms

Top P (Nucleus Sampling)

Top P provides an alternative to temperature for controlling randomness. It limits token selection to a probability mass.

Top P Range: 0.0 to 1.0

1.0 → Consider all tokens (default)
0.9 → Consider tokens in top 90% probability mass
0.5 → Consider only most likely tokens
0.1 → Very restrictive, almost deterministic

Temperature vs. Top P:

  • Temperature scales the entire probability distribution
  • Top P truncates it, removing low-probability options

Recommendation: Use either temperature OR top P, not both. Temperature is more intuitive for most users.
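To make the truncation concrete, here is a minimal sketch of nucleus filtering over a made-up four-token distribution. It keeps the smallest set of most-likely tokens whose cumulative probability reaches top P, then renormalizes:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]   # made-up next-token probabilities
print(top_p_filter(probs, 0.9))  # drops the 0.05 tail token
```

This is why top P adapts where a fixed temperature does not: when the model is confident, few tokens survive the cutoff; when it is uncertain, many do.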

Configuration Profiles

By Use Case

Customer Support Agent

Model: gpt-4o-mini
Temperature: 0.3
Max Tokens: 1024
Presence Penalty: 0.0
Frequency Penalty: 0.0

Rationale:
- Consistent, helpful responses
- Not too creative (could lead to incorrect info)
- Long enough for detailed answers
- Natural language, don't need forced variety

Technical Documentation Writer

Model: claude-sonnet-4-20250514
Temperature: 0.2
Max Tokens: 4096
Presence Penalty: 0.0
Frequency Penalty: 0.1

Rationale:
- Accurate, consistent technical content
- Long enough for complete documentation sections
- Slight variety to avoid robotic repetition
- Claude excels at structured writing

Creative Writing Assistant

Model: claude-sonnet-4-20250514
Temperature: 0.9
Max Tokens: 4096
Presence Penalty: 0.6
Frequency Penalty: 0.4

Rationale:
- High creativity for unique content
- Explores diverse topics and ideas
- Varied vocabulary for engaging prose
- Claude's strength in creative work

Code Review Agent

Model: gpt-4o
Temperature: 0.1
Max Tokens: 8192
Presence Penalty: 0.0
Frequency Penalty: 0.0

Rationale:
- Deterministic, consistent analysis
- Long enough for detailed code reviews
- Needs to reference same concepts/code repeatedly
- GPT-4o strong at code analysis

Research Synthesizer

Model: gemini-1.5-pro
Temperature: 0.4
Max Tokens: 16384
Presence Penalty: 0.3
Frequency Penalty: 0.0

Rationale:
- Process large amounts of research
- Balanced creativity for synthesis
- Encourages covering diverse aspects
- Massive context for source material

Brainstorming Partner

Model: gpt-4o
Temperature: 1.0
Max Tokens: 2048
Presence Penalty: 0.8
Frequency Penalty: 0.5

Rationale:
- Maximum idea variety
- Strongly explores new directions
- Fresh language and perspectives
- Not too long—rapid ideation
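These profiles map directly onto the standard chat-completion parameters. As a hedged illustration, two of them expressed as request payloads (field names follow the OpenAI chat-completions convention; in HIVE you set the same values through the dashboard rather than constructing payloads yourself):

```python
# Customer Support profile from above: consistent, grounded answers
customer_support = {
    "model": "gpt-4o-mini",
    "temperature": 0.3,
    "max_tokens": 1024,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}

# Brainstorming profile from above: maximum variety, fresh language
brainstorming = {
    "model": "gpt-4o",
    "temperature": 1.0,
    "max_tokens": 2048,
    "presence_penalty": 0.8,
    "frequency_penalty": 0.5,
}

print(customer_support["temperature"], brainstorming["temperature"])
```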

Settings in the Dashboard

Accessing Agent Settings

  1. Navigate to Agents in the sidebar
  2. Click on the agent you want to configure
  3. Select the Settings tab
  4. Modify parameters as needed
  5. Click Save Changes

Settings Interface

The settings panel displays:

  • Model selector dropdown
  • Temperature slider (0.0 - 2.0)
  • Max tokens input field
  • Advanced settings (expandable)
  • Presence penalty
  • Frequency penalty
  • Top P (optional)

Real-Time Testing

After adjusting settings, test your changes:

  1. Open a swarm with the agent
  2. Send test messages
  3. Evaluate response quality
  4. Iterate on settings as needed

Troubleshooting Settings Issues

Responses Are Cut Off

Cause: Max tokens set too low

Solution: Increase max tokens to accommodate expected response length

Responses Are Too Similar

Cause: Temperature too low

Solution: Increase temperature to 0.5-0.7 for more variety

Responses Are Incoherent

Cause: Temperature too high

Solution: Reduce temperature to 0.7 or below

Responses Are Repetitive

Cause: Penalties not configured

Solution: Increase presence penalty (0.3-0.5) and/or frequency penalty (0.2-0.4)

Responses Feel Unnatural

Cause: Penalties too high

Solution: Reduce presence and frequency penalties closer to 0

Context Errors / Truncation

Cause: Exceeding model context window

Solution: Reduce system prompt length, summarize conversation history, or switch to higher-context model

Next Steps

Now that you understand agent settings:

  • [Agent Tools](/docs/agents/agent-tools): Add external capabilities to your agents
  • [Creating Swarms](/docs/swarms/creating-swarms): Configure multi-agent collaboration
  • [Usage Tracking](/docs/models/usage-tracking): Monitor costs and optimize spending
