Expose cache_control TTL as a configurable switch (currently hardcoded to 1h)

## Title

**Expose `cache_control.ttl` as a configurable switch in `ClaudeAgentOptions` (currently hardcoded to 1h in bundled CLI)**

## Summary

The Anthropic API supports two ephemeral cache tiers — 5-minute (default per the API contract) and 1-hour — via `cache_control: {type: "ephemeral", ttl: "5m" | "1h"}`. The bundled Claude Code CLI (verified on v2.1.126) appears to send `ttl: "1h"` unconditionally for all cached prefixes, with no SDK-side way to override.

This request is **not** to add 1h support (it's already in use). It's to **expose the choice** as a configurable parameter on `ClaudeAgentOptions`, so SDK consumers can deliberately pick the TTL appropriate for their workload — and pay accordingly.

## What we observed

Running a no-tool probe through `claude-agent-sdk-python` v0.1.72 / bundled CLI 2.1.126 against Haiku 4.5, the `ResultMessage.usage` payload returned:

```json
{
  "input_tokens": 4,
  "output_tokens": 87,
  "cache_creation_input_tokens": 44053,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 0,
    "ephemeral_1h_input_tokens": 44053
  },
  "cache_read_input_tokens": 0
}
```

Every cache write went to the 1-hour bucket. We could not find any `ClaudeAgentOptions` parameter, env var, or CLI flag that influences this — it appears to be hardcoded inside the bundled CLI subprocess.

## Why a switch matters even though 1h is the better default for our case

Different workloads have different economics:

- **Long-pause conversational agents** (our case — Sanskrit philosophy Q&A bot): 1h tier saves money because natural conversation pauses exceed 5m. 1h cache write costs 2× the input rate vs 1.25× for 5m, but the avoided rewrite saves much more.
- **Fast pipelines, batch jobs, short-lived workers**: 5m is cheaper because cache is reused within the window and the higher write premium of 1h tier is wasted.
- **Cost-sensitive prototyping**: users may want to opt OUT of the 2× cache write premium to measure baseline cost.

Even consumers happy with the 1h default deserve to **know** they're paying for it. Currently it's invisible until you parse `usage.cache_creation` per turn (we've been doing this for cost-attribution observability) and notice the 5m bucket is always zero. That's surprising behavior for an SDK whose abstraction goal is to hide subprocess details.

## Requested behavior

Any of these would solve the problem; ordered by preference:

### Option A — `ClaudeAgentOptions` parameter (preferred)

```python
options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    cache_ttl="1h",   # accepts "5m" | "1h"; default = current behavior (probably "1h")
    # ...
)
```

Cleanest fit with the existing options surface; trivially testable. Default value can match current behavior so it's a non-breaking addition.

### Option B — Env var passed to CLI subprocess

```python
options = ClaudeAgentOptions(
    env={"ANTHROPIC_PROMPT_CACHE_TTL": "1h"},  # or "5m"
    # ...
)
```

Maps to a CLI-side environment variable that the bundled CLI honors when building the `cache_control` block. Works without changing the SDK's options TypedDict.

### Option C — CLI flag (lower priority)

`claude --cache-ttl 5m ...` for direct CLI users; SDK could thread it through.

## Suggestion regardless of which option lands

Document the current default behavior somewhere visible (SDK README, CLI `--help`, or release notes). Right now it's recoverable only by inspecting `ResultMessage.usage.cache_creation`. A one-line note ("Bundled CLI uses `cache_control.ttl=1h` by default for prefix caching") would prevent future cost-attribution head-scratching for other SDK consumers.

## Why this matters for the SDK ecosystem

Multi-turn agents built on the SDK have wildly different cache-hit patterns depending on UX (long-pause Q&A vs fast back-and-forth). A fixed TTL choice optimizes for some and pessimizes for others. The Anthropic API surface already exposes the choice; the SDK is the right layer to forward it.

There's also a transparency angle. Cost observability tooling (we've built per-turn cost attribution на `agent_turn_metrics`) breaks subtly when the TTL is implicit — we ended up reverse-engineering the cache tier from the usage payload to explain a $0.15/day pricing gap between SDK self-reported cost and Anthropic dashboard. An explicit option would have made this trivially correct.

## Workaround considered and rejected

Bypassing the bundled CLI and calling the Anthropic API directly to set `cache_control.ttl` explicitly. Rejected because it would undo the SDK's built-in advantages (MCP server integration via stdio, agent loop, tool use handling, structured streaming). The SDK is the right layer for this — we just need it to forward the TTL parameter.

## Willing to contribute

Happy to file a PR if the SDK team confirms the desired API surface (Option A / B / C above) and that this is welcome. The change is small — primarily plumbing one optional parameter from `ClaudeAgentOptions` through to the CLI subprocess invocation, then from the CLI into the `cache_control.ttl` field.

## Versions

- `claude-agent-sdk-python`: 0.1.72
- Bundled Claude Code CLI: 2.1.126
- Python: 3.12
- Production model: claude-sonnet-4-6

Thanks for considering this — it would improve both production economics (for workloads where 5m is the better fit) and cost-attribution clarity (for everyone) on the SDK.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose cache_control TTL as a configurable switch (currently hardcoded to 1h) #964

Title

Summary

What we observed

Why a switch matters even though 1h is the better default for our case

Requested behavior

Option A — `ClaudeAgentOptions` parameter (preferred)

Option B — Env var passed to CLI subprocess

Option C — CLI flag (lower priority)

Suggestion regardless of which option lands

Why this matters for the SDK ecosystem

Workaround considered and rejected

Willing to contribute

Versions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Expose cache_control TTL as a configurable switch (currently hardcoded to 1h) #964

Description

Title

Summary

What we observed

Why a switch matters even though 1h is the better default for our case

Requested behavior

Option A — ClaudeAgentOptions parameter (preferred)

Option B — Env var passed to CLI subprocess

Option C — CLI flag (lower priority)

Suggestion regardless of which option lands

Why this matters for the SDK ecosystem

Workaround considered and rejected

Willing to contribute

Versions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Option A — `ClaudeAgentOptions` parameter (preferred)