# Context Management
LLMs have finite context windows. Long conversations eventually exceed the limit, causing truncation or API errors. Chronos provides automatic context window management with rolling summarization so multi-turn sessions stay within budget without losing important context.
## The Problem
Each model has a maximum context size (e.g., 128K tokens for GPT-4o). As a conversation grows:
- System prompts, instructions, memories, and knowledge consume tokens
- Each user and assistant message adds more
- Tool calls and results add further overhead
- Eventually the combined size exceeds the limit
Without management, the API rejects the request or truncates history, losing earlier context.
## The Solution
Chronos automatically detects when a session approaches its context limit and triggers rolling summarization. Older messages are compressed into a concise summary; recent turns are preserved intact so the model retains immediate conversational context. The prior summary is incorporated into each new summary so nothing is lost across multiple summarization passes.
## TokenCounter Interface

Token counting is abstracted behind the `TokenCounter` interface in `engine/model`:

```go
type TokenCounter interface {
    CountTokens(messages []Message) int
    CountString(s string) int
}
```

`CountTokens` returns the estimated token count for a slice of messages, including per-message overhead (role, separators) and tool calls. `CountString` returns the estimated token count for a raw string.
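Any type that satisfies the interface can serve as a counter. Below is a minimal sketch of a custom implementation; the word-splitting heuristic and the flat per-message overhead of 4 tokens are illustrative assumptions, not values taken from Chronos (a real counter should also account for tool calls):

```go
import (
    "strings"

    "github.com/spawn08/chronos/engine/model"
)

// WordCounter approximates tokens as whitespace-separated words.
type WordCounter struct{}

func (WordCounter) CountString(s string) int {
    return len(strings.Fields(s))
}

func (c WordCounter) CountTokens(msgs []model.Message) int {
    total := 0
    for _, m := range msgs {
        // Assumed flat overhead of 4 tokens per message for role/separators;
        // a production counter would also include tool calls.
        total += c.CountString(m.Content) + 4
    }
    return total
}
```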
### EstimatingCounter
The default implementation uses a character-ratio heuristic: approximately 4 characters per token. This avoids external dependencies (e.g., tiktoken) and works across models.
```go
counter := model.NewEstimatingCounter()

// Estimate tokens for a string
tokens := counter.CountString("Hello, world!")
// tokens ≈ 3 (13 chars / 4; the exact value depends on rounding)

// Estimate tokens for messages
msgs := []model.Message{
    {Role: model.RoleUser, Content: "What is the capital of France?"},
}
total := counter.CountTokens(msgs)
```
You can customize the ratio:
```go
counter := &model.EstimatingCounter{CharsPerToken: 3.5}
```
## Model Context Limits Registry

Chronos maintains built-in context limits for well-known models. Use `ContextLimit` to resolve the limit for a given model:

```go
limit := model.ContextLimit("gpt-4o", 0)           // 128000
limit = model.ContextLimit("claude-3-5-sonnet", 0) // 200000
limit = model.ContextLimit("gemini-1.5-pro", 0)    // 2097152
limit = model.ContextLimit("unknown-model", 8192)  // fallback: 8192
```
Built-in limits include:
| Provider | Models | Limit (tokens) |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo | 128K |
| OpenAI | o1, o3, o3-mini, o4-mini | 200K |
| Anthropic | claude-sonnet-4-6, claude-3-5-sonnet, claude-3-opus, claude-3-haiku | 200K |
| Google | gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-flash | 1M |
| Google | gemini-1.5-pro | 2M |
| Mistral | mistral-large-latest | 128K |
| Meta | llama3.3, llama3.2, llama3.1 | 131K |
| DeepSeek | deepseek-chat, deepseek-coder, deepseek-reasoner | 64K |
Unknown models use the provided fallback, or 8192 if fallback is 0.
## ContextConfig

`ContextConfig` controls context window behavior on the agent:

| Field | Type | Description | Default |
|---|---|---|---|
| `MaxContextTokens` | int | Override model default; 0 = use model default | 0 |
| `SummarizeThreshold` | float64 | Fraction of the context window that triggers summarization | 0.8 |
| `PreserveRecentTurns` | int | Number of recent user/assistant pairs to keep | 5 |
```go
agent.WithContextConfig(agent.ContextConfig{
    MaxContextTokens:    128000,
    SummarizeThreshold:  0.8,
    PreserveRecentTurns: 5,
})
```
## Summarizer

The `Summarizer` compresses older messages into a rolling summary using an LLM. It is constructed with a provider, a token counter, and a configuration:

```go
summarizer := model.NewSummarizer(provider, counter, model.SummarizationConfig{
    Threshold:           0.8,
    PreserveRecentTurns: 5,
    MaxSummaryTokens:    1024,
    Prompt:              "", // optional custom prompt
})
```
### SummarizationConfig

| Field | Type | Description | Default |
|---|---|---|---|
| `Threshold` | float64 | Fraction of the context window that triggers summarization | 0.8 |
| `PreserveRecentTurns` | int | Recent user/assistant pairs to keep unsummarized | 5 |
| `MaxSummaryTokens` | int | Cap on generated summary length | 1024 |
| `Prompt` | string | Custom summarization system prompt | built-in |
### NeedsSummarization

Check whether summarization is needed before calling `Summarize`:

```go
if summarizer.NeedsSummarization(systemTokens, messages, contextLimit) {
    result, err := summarizer.Summarize(ctx, priorSummary, messages)
    // ...
}
```
### Summarize

`Summarize` compresses older messages, preserves recent turns, and incorporates any prior summary:

```go
result, err := summarizer.Summarize(ctx, priorSummary, messages)
if err != nil {
    return err
}
// result.Summary: new rolling summary
// result.PreservedMessages: recent messages kept intact
```
## Rolling Summarization Flow

- Detect overflow: when `systemTokens + messageTokens > threshold * contextLimit`, summarization is triggered (see the sketch after this list).
- Split messages: old messages (to summarize) are separated from recent turns (to preserve).
- Build input: the prior summary (if any) plus the old messages are formatted for the summarizer.
- Call LLM: the provider produces a concise summary.
- Replace context: `priorSummary` becomes the new summary; `messages` becomes `PreservedMessages`.
- Persist: the summary and preserved messages are stored in the event ledger.
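To make the overflow check concrete, here is the trigger condition from the first step with illustrative numbers (a 128K-token model at the default 0.8 threshold):

```go
contextLimit := 128000  // from ContextConfig or the model registry
threshold := 0.8        // SummarizeThreshold
systemTokens := 2000    // prompt, instructions, memories, knowledge
messageTokens := 105000 // conversation history so far

if float64(systemTokens+messageTokens) > threshold*float64(contextLimit) {
    // 107000 > 102400, so rolling summarization runs.
}
```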
## ChatWithSession Integration

`ChatWithSession` automatically triggers summarization when approaching the context limit. No extra code is required:

- Load session events from storage
- Append the new user message
- Build system context (prompt, instructions, memories, knowledge)
- Resolve the context limit (from `ContextConfig.MaxContextTokens` or the model registry; see the sketch after this list)
- If `NeedsSummarization` is true: fire `EventContextOverflow`, run the summarizer, persist, fire `EventSummarization`
- Build the final message array: system context + prior summary (if any) + preserved messages
- Call the model
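The limit-resolution step amounts to the following sketch, using the documented `ContextLimit` registry call (`cfg` stands for the agent's `ContextConfig`; the model name and fallback are illustrative):

```go
// Explicit override wins; otherwise fall back to the model registry.
limit := cfg.MaxContextTokens
if limit == 0 {
    limit = model.ContextLimit("gpt-4o", 8192) // registry lookup with fallback
}
```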
## YAML Configuration

Context settings can be specified in agent YAML:

```yaml
agents:
  - id: my-agent
    name: My Agent
    context:
      max_tokens: 0
      summarize_threshold: 0.8
      preserve_recent_turns: 5
```
| Field | Description |
|---|---|
| `context.max_tokens` | Override model default; 0 = use model default |
| `context.summarize_threshold` | Fraction that triggers summarization |
| `context.preserve_recent_turns` | Recent turns to keep |
## Hook Events

When summarization occurs, hooks receive:

- `EventContextOverflow` (before summarization): `Metadata` includes `estimated_tokens` and `context_limit`
- `EventSummarization` (after summarization): `Metadata` includes `summary_length` and `preserved_messages`
```go
a.AddHook(&MyHook{})

// In Before/After:
if evt.Type == hooks.EventContextOverflow {
    estimated := evt.Metadata["estimated_tokens"].(int)
    limit := evt.Metadata["context_limit"].(int)
    // log or alert
}
if evt.Type == hooks.EventSummarization {
    length := evt.Metadata["summary_length"].(int)
    preserved := evt.Metadata["preserved_messages"].(int)
}
```
## ProviderConfig.ContextWindow

Model providers can override the default context window via `ProviderConfig.ContextWindow` when the provider is constructed (e.g., for custom deployments whose limits differ from the registry). The agent’s `ContextConfig.MaxContextTokens` still takes precedence over the model registry; a provider-level `ContextWindow` is simply an alternative place to set the limit at provider-build time.
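A minimal sketch, assuming a config-based constructor; the `NewOpenAIWithConfig` name and the `APIKey` field are hypothetical stand-ins, while `ContextWindow` is the documented field:

```go
// Hypothetical constructor and APIKey field; adjust to the constructor
// your provider actually exposes. ContextWindow is the documented field.
provider := model.NewOpenAIWithConfig(model.ProviderConfig{
    APIKey:        os.Getenv("OPENAI_API_KEY"),
    ContextWindow: 32768, // e.g., a custom deployment with a 32K window
})
```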
## Complete Example

```go
package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/spawn08/chronos/engine/model"
    "github.com/spawn08/chronos/sdk/agent"
    "github.com/spawn08/chronos/storage/adapters/sqlite"
)

func main() {
    ctx := context.Background()

    store, err := sqlite.New("session.db")
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()
    if err := store.Migrate(ctx); err != nil {
        log.Fatal(err)
    }

    a, err := agent.New("session-agent", "Session Agent").
        WithModel(model.NewOpenAI(os.Getenv("OPENAI_API_KEY"))).
        WithStorage(store).
        WithSystemPrompt("You are a helpful assistant.").
        WithContextConfig(agent.ContextConfig{
            MaxContextTokens:    0, // use model default
            SummarizeThreshold:  0.8,
            PreserveRecentTurns: 5,
        }).
        Build()
    if err != nil {
        log.Fatal(err)
    }

    sessionID := "user-123-conv-1"

    resp1, err := a.ChatWithSession(ctx, sessionID, "My name is Alice.")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp1.Content)

    resp2, err := a.ChatWithSession(ctx, sessionID, "What is my name?")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp2.Content)

    // After many turns, when context approaches the limit,
    // summarization runs automatically. No code changes needed.
}
```