Token Management

Last updated: Jan 2026

Overview

Tokens are the building blocks of AI model interactions. Understanding how tokens work helps you optimize performance, stay within limits, and control costs.

Key concepts to understand: Input Tokens (what you send to the model), Output Tokens (what the model generates), Context Limit (maximum total tokens), and Cost (credits charged per token).

Understanding Tokens

Tokens are pieces of text that models process. They're not exactly words - a token might be a word, part of a word, or punctuation.

Token Examples

Simple word: "hello" = 1 token
Long word: "authentication" = 2-3 tokens
Sentence: "Hello, world!" = 4 tokens
Code: function() {} = 5+ tokens

Rule of Thumb

For English text: ~4 characters = 1 token, or ~0.75 words = 1 token. Code and non-English text often use more tokens per word.
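The rule of thumb above can be turned into a quick estimator. This is only the ~4 characters per token heuristic, not a real tokenizer; the `estimate_tokens` helper and its threshold are illustrative, and exact counts require the provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters = 1 token heuristic.

    Real tokenizers split differently (code and non-English text usually
    cost more), so treat this as a ballpark figure only.
    """
    return max(1, len(text) // 4)

print(estimate_tokens("hello"))          # 5 chars -> 1 token
print(estimate_tokens("Hello, world!"))  # 13 chars -> ~3 (a real tokenizer counts 4)
```

Because the heuristic undercounts punctuation-heavy text, leave some headroom when using estimates to enforce limits.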

Context Limits

Each model has a maximum context window - the total tokens for input plus output combined.

Model                      Context Window   Max Output
Claude Opus 4.5            200K tokens      4K tokens
Claude Sonnet 4.5 / 4      200K tokens      4K tokens
Claude Haiku 4.5           200K tokens      4K tokens
GPT-5.2 / GPT-5 Mini       128K tokens      16K tokens
GPT-4o / GPT-4o Mini       128K tokens      4K tokens
o3 / o4-mini (reasoning)   128K tokens      16K tokens
Gemini 2.5 Pro             1M tokens        8K tokens
Gemini 2.5 / 2.0 Flash     1M tokens        8K tokens

Context Overflow

If input + output exceeds the context window, the request will fail. Monitor your token usage and truncate long inputs if needed.
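A pre-flight check like the following can catch overflow before the request is sent. It is a sketch built on the same ~4 chars/token heuristic; the window and output budget shown match the 200K/4K figures in the table above, and both helper names are made up for illustration.

```python
CONTEXT_WINDOW = 200_000  # e.g. a 200K-token model (see table above)
MAX_OUTPUT = 4_000        # output budget to reserve

def fits_in_context(input_tokens: int, max_output: int = MAX_OUTPUT) -> bool:
    """True if the input plus the reserved output budget fits the window."""
    return input_tokens + max_output <= CONTEXT_WINDOW

def truncate_to_budget(text: str, budget_tokens: int) -> str:
    """Crude character-based truncation using ~4 chars per token."""
    max_chars = budget_tokens * 4
    return text if len(text) <= max_chars else text[:max_chars]
```

Reserving the full output budget up front is conservative but avoids failures when the model produces a long response.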

Optimization Strategies

These strategies reduce token usage without sacrificing quality.

  • Concise prompts: Remove redundant words. "Summarize this" not "I would like you to please summarize".
  • Truncate long inputs: For long documents, extract relevant sections before sending.
  • Use smaller models: Simple tasks don't need large models. Haiku 4.5 or GPT-4o Mini are efficient choices.

Before & After Optimization

# Before (verbose) - 45 tokens
I would like you to please take a look at the following
customer email and provide me with a comprehensive summary
of what the customer is asking about.

# After (concise) - 15 tokens
Summarize this customer email in 2 sentences:

# Savings: 30 tokens per request = 67% reduction

Truncation Tips

  • Keep first and last paragraphs (often contain key info)
  • Remove boilerplate (headers, footers, signatures)
  • Extract only relevant sections for the task
  • Summarize long sections before detailed analysis
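The first three tips above can be sketched as a small pre-processing step. This is a minimal example: the paragraph splitting, the boilerplate keyword list, and the `keep` count are all assumptions, and real boilerplate detection usually needs more than a keyword check.

```python
def trim_document(text: str, keep: int = 2) -> str:
    """Keep the first and last `keep` paragraphs, dropping obvious boilerplate.

    Sketch only: boilerplate is detected with a naive keyword check, which
    is not robust for production use.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    boilerplate = ("unsubscribe", "confidentiality notice", "sent from my")
    body = [p for p in paragraphs
            if not any(k in p.lower() for k in boilerplate)]
    if len(body) <= keep * 2:
        return "\n\n".join(body)
    # Keep the opening and closing paragraphs; elide the middle.
    return "\n\n".join(body[:keep] + ["[...]"] + body[-keep:])
```

For tasks that need the middle sections, summarize them first instead of dropping them outright.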

Monitoring Usage

Track token usage to understand costs and optimize workflows.

  • Execution Details: Each AI node execution shows input tokens, output tokens, and total credit cost in the execution pane.
  • Usage Dashboard: View aggregated token usage by workflow, time period, and model in the analytics dashboard.
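If you export execution details, aggregating them is straightforward. The record shape below (`workflow`, `model`, `input`, `output` fields) is hypothetical, invented for this sketch; adapt the field names to whatever your execution export actually contains.

```python
from collections import defaultdict

# Hypothetical exported execution records; field names are illustrative only.
executions = [
    {"workflow": "triage", "model": "haiku-4.5", "input": 1200, "output": 300},
    {"workflow": "triage", "model": "haiku-4.5", "input": 900,  "output": 250},
    {"workflow": "report", "model": "opus-4.5",  "input": 5000, "output": 1200},
]

# Sum input + output tokens per workflow.
totals = defaultdict(int)
for e in executions:
    totals[e["workflow"]] += e["input"] + e["output"]

print(dict(totals))
```

Grouping by model instead of workflow works the same way and is useful for spotting where a smaller model would do.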

Cost Control

Implement these controls to manage AI spending effectively.

  • Use smaller models (Haiku 4.5, GPT-4o Mini) for simple tasks
  • Review usage reports weekly to identify optimization opportunities
  • Test with small samples before processing large datasets
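The first control above can be automated with a simple routing rule. A sketch under stated assumptions: the model identifiers and the 50K-token cutoff are placeholders, not recommendations, and `needs_reasoning` is a flag your own workflow would set.

```python
def pick_model(task_tokens: int, needs_reasoning: bool) -> str:
    """Route short, simple tasks to a smaller model.

    Model names and the token cutoff are illustrative placeholders;
    tune both against your own quality and cost measurements.
    """
    if needs_reasoning or task_tokens > 50_000:
        return "claude-opus-4.5"   # larger model for complex/long tasks
    return "claude-haiku-4.5"      # cheaper model for simple tasks
```

Start with a conservative cutoff, then lower it as usage reports confirm the smaller model holds up.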

Quick Wins

The biggest cost savings usually come from: (1) using smaller models where appropriate, (2) reducing prompt verbosity, and (3) reducing tool calls.

Key Takeaways

Tokens are text pieces - roughly 4 characters or 0.75 words each.

Input + output must fit within the model's context window.

Write concise prompts to reduce usage.

Monitor token usage in execution details and dashboards.

Use smaller models for simple tasks to reduce costs.