Skip to content

Expose cache_control TTL as a configurable switch (currently hardcoded to 1h) #964

@DevSanatkumara

Description

@DevSanatkumara

Title

Expose cache_control.ttl as a configurable switch in ClaudeAgentOptions (currently hardcoded to 1h in bundled CLI)

Summary

The Anthropic API supports two ephemeral cache tiers — 5-minute (default per the API contract) and 1-hour — via cache_control: {type: "ephemeral", ttl: "5m" | "1h"}. The bundled Claude Code CLI (verified on v2.1.126) appears to send ttl: "1h" unconditionally for all cached prefixes, with no SDK-side way to override.

This request is not to add 1h support (it's already in use). It's to expose the choice as a configurable parameter on ClaudeAgentOptions, so SDK consumers can deliberately pick the TTL appropriate for their workload — and pay accordingly.

What we observed

Running a no-tool probe through claude-agent-sdk-python v0.1.72 / bundled CLI 2.1.126 against Haiku 4.5, the ResultMessage.usage payload returned:

{
  "input_tokens": 4,
  "output_tokens": 87,
  "cache_creation_input_tokens": 44053,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 0,
    "ephemeral_1h_input_tokens": 44053
  },
  "cache_read_input_tokens": 0
}

Every cache write went to the 1-hour bucket. We could not find any ClaudeAgentOptions parameter, env var, or CLI flag that influences this — it appears to be hardcoded inside the bundled CLI subprocess.

Why a switch matters even though 1h is the better default for our case

Different workloads have different economics:

  • Long-pause conversational agents (our case — Sanskrit philosophy Q&A bot): 1h tier saves money because natural conversation pauses exceed 5m. 1h cache write costs 2× the input rate vs 1.25× for 5m, but the avoided rewrite saves much more.
  • Fast pipelines, batch jobs, short-lived workers: 5m is cheaper because cache is reused within the window and the higher write premium of 1h tier is wasted.
  • Cost-sensitive prototyping: users may want to opt OUT of the 2× cache write premium to measure baseline cost.

Even consumers happy with the 1h default deserve to know they're paying for it. Currently it's invisible until you parse usage.cache_creation per turn (we've been doing this for cost-attribution observability) and notice the 5m bucket is always zero. That's surprising behavior for an SDK whose abstraction goal is to hide subprocess details.

Requested behavior

Any of these would solve the problem; ordered by preference:

Option A — ClaudeAgentOptions parameter (preferred)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    cache_ttl="1h",   # accepts "5m" | "1h"; default = current behavior (probably "1h")
    # ...
)

Cleanest fit with the existing options surface; trivially testable. Default value can match current behavior so it's a non-breaking addition.

Option B — Env var passed to CLI subprocess

options = ClaudeAgentOptions(
    env={"ANTHROPIC_PROMPT_CACHE_TTL": "1h"},  # or "5m"
    # ...
)

Maps to a CLI-side environment variable that the bundled CLI honors when building the cache_control block. Works without changing the SDK's options TypedDict.

Option C — CLI flag (lower priority)

claude --cache-ttl 5m ... for direct CLI users; SDK could thread it through.

Suggestion regardless of which option lands

Document the current default behavior somewhere visible (SDK README, CLI --help, or release notes). Right now it's recoverable only by inspecting ResultMessage.usage.cache_creation. A one-line note ("Bundled CLI uses cache_control.ttl=1h by default for prefix caching") would prevent future cost-attribution head-scratching for other SDK consumers.

Why this matters for the SDK ecosystem

Multi-turn agents built on the SDK have wildly different cache-hit patterns depending on UX (long-pause Q&A vs fast back-and-forth). A fixed TTL choice optimizes for some and pessimizes for others. The Anthropic API surface already exposes the choice; the SDK is the right layer to forward it.

There's also a transparency angle. Cost observability tooling (we've built per-turn cost attribution на agent_turn_metrics) breaks subtly when the TTL is implicit — we ended up reverse-engineering the cache tier from the usage payload to explain a $0.15/day pricing gap between SDK self-reported cost and Anthropic dashboard. An explicit option would have made this trivially correct.

Workaround considered and rejected

Bypassing the bundled CLI and calling the Anthropic API directly to set cache_control.ttl explicitly. Rejected because it would undo the SDK's built-in advantages (MCP server integration via stdio, agent loop, tool use handling, structured streaming). The SDK is the right layer for this — we just need it to forward the TTL parameter.

Willing to contribute

Happy to file a PR if the SDK team confirms the desired API surface (Option A / B / C above) and that this is welcome. The change is small — primarily plumbing one optional parameter from ClaudeAgentOptions through to the CLI subprocess invocation, then from the CLI into the cache_control.ttl field.

Versions

  • claude-agent-sdk-python: 0.1.72
  • Bundled Claude Code CLI: 2.1.126
  • Python: 3.12
  • Production model: claude-sonnet-4-6

Thanks for considering this — it would improve both production economics (for workloads where 5m is the better fit) and cost-attribution clarity (for everyone) on the SDK.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions