Bundled CLI silently drops inbound TRACEPARENT on 2nd+ query() call when ~/.claude/ has state from a prior run

## Summary

When the bundled Claude Code CLI is invoked from `claude-agent-sdk-python`'s `query()` with a parent W3C trace context in env (`TRACEPARENT`), it correctly nests its `claude_code.*` spans under the caller's trace **only on the first invocation in the process's lifetime**. On the second and subsequent invocations in the same long-running Python process, the same valid `TRACEPARENT` is silently ignored — `claude_code.interaction` / `claude_code.llm_request` / `claude_code.tool` spans each emit with their own fresh trace IDs and no parent.

The trigger is the persistent state directory at `~/.claude/` (specifically `~/.claude.json`, created by the CLI on first run with a `firstStartTime` marker). Wiping that directory between calls restores correct nesting; leaving it reproduces the bug 100% of the time.

## Environment

- `claude-agent-sdk-python` 0.1.x (Python 3.13)
- Bundled CLI as shipped with the above version
- Linux x86_64 container
- Telemetry envs at process level: `CLAUDE_CODE_ENABLE_TELEMETRY=1`, `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1`, `OTEL_TRACES_EXPORTER=otlp`, `OTEL_EXPORTER_OTLP_ENDPOINT=...`, etc.
- Backend: Langfuse Cloud (but the issue is shape, not destination — it would manifest the same on any OTLP collector)

## Reproducer

```python
import asyncio
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from claude_agent_sdk import query, ClaudeAgentOptions

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)
tracer = trace.get_tracer("repro")

async def one_call(label: str):
    with tracer.start_as_current_span(label):
        async for _ in query(prompt="echo hello", options=ClaudeAgentOptions()):
            pass

asyncio.run(one_call("call-1"))   # CLI spans nest under call-1 ✓
asyncio.run(one_call("call-2"))   # CLI spans become orphan roots ✗
```

Tested both with the SDK's auto-injection and with explicit `TRACEPARENT` set in `ClaudeAgentOptions.env`: same behavior, ruling out auto-injection as the cause.

## What I observed

For call-1: the `claude_code.interaction` span correctly carries the parent's trace_id from `TRACEPARENT`, and its `llm_request` / `tool` children inherit it. The whole interaction is one trace rooted at the caller's span.

For call-2 onward: each `claude_code.llm_request` and `claude_code.tool` emits with its own freshly-generated trace_id. There is **no `claude_code.interaction` span at all** in the second call's emission, only the children, each becoming its own trace root. So even the CLI's *internal* context propagation (interaction → its children) appears affected, not just the inbound `TRACEPARENT`.
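For anyone trying to confirm the same symptom in their own export, a minimal sketch of the check is below. The span records are hypothetical plain dicts (the field names `name`, `trace_id`, `parent_span_id` are my assumption, not a CLI or SDK format), and the sample values are made up for illustration rather than copied from a real export.

```python
# Hypothetical span records: plain dicts as you might build after decoding an
# OTLP export. The field names are assumptions, not a CLI or SDK format.
def find_orphan_roots(spans, caller_trace_id):
    """Return names of spans that are trace roots outside the caller's trace."""
    return [
        s["name"]
        for s in spans
        if s["trace_id"] != caller_trace_id and s.get("parent_span_id") is None
    ]


# Illustrative values only: two parentless spans, each with its own trace ID.
caller = "0af7651916cd43dd8448eb211c80319c"
call_2_spans = [
    {"name": "claude_code.llm_request", "trace_id": "1" * 32, "parent_span_id": None},
    {"name": "claude_code.tool", "trace_id": "2" * 32, "parent_span_id": None},
]
orphans = find_orphan_roots(call_2_spans, caller)
```

A non-empty result for a call that was made inside an active caller span is the fragmentation described above.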

I verified that the Python side behaves identically in calls 1 and 2: same `TracerProvider`, same `CompositePropagator`, and a valid, sampled `TRACEPARENT` string whose trace_id matches the active Python span and which lands correctly in `options.env` and `process_env` for the spawned subprocess. The bug therefore appears to be on the CLI side.
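The `TRACEPARENT` check above can be reproduced with a few lines of standard-library Python. This is a sketch of the W3C Trace Context version `00` layout (`version-traceid-parentid-flags`, lowercase hex), not the SDK's own injection code; `format_traceparent` and `parse_traceparent` are helper names I made up for illustration. OpenTelemetry Python exposes `trace_id` and `span_id` on a span context as integers, which is why the formatter takes ints.

```python
import re

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render integer trace/span IDs as a version-00 traceparent string."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Validate a version-00 traceparent and split it into its fields."""
    match = _TRACEPARENT_RE.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, parent_id, flags = match.groups()
    # The spec treats all-zero trace and parent IDs as invalid.
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        raise ValueError("traceparent IDs must not be all zeros")
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


tp = format_traceparent(0x0AF7651916CD43DD8448EB211C80319C, 0xB7AD6B7169203331)
fields = parse_traceparent(tp)
```

Comparing `fields["trace_id"]` against the active span's trace ID (formatted as 32 hex digits) for both calls is the comparison I mean when I say the value is valid and matches.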

## What unblocks the bug

Wiping `~/.claude/` between calls makes the next call nest correctly. I confirmed this experimentally: clearing the dir, running call-A (works like a first call), then running call-B without clearing (breaks again). The pattern is fully reproducible.

The most likely culprit is something in `~/.claude.json` (specifically the `firstStartTime` marker or one of the migration flags) that the CLI checks and uses to skip some OTel init or interaction-span construction step it does only on a "true first run."
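To narrow down which key matters, one option is to snapshot the state file before and after the first call and diff the top-level keys. The sketch below assumes the file is plain JSON with a top-level object; `load_state` and `diff_top_level` are hypothetical helper names, and I have not confirmed which keys the CLI actually writes beyond the `firstStartTime` marker mentioned above.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the parsed JSON state file, or {} if it does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def diff_top_level(before: dict, after: dict) -> dict:
    """Map each added, removed, or changed top-level key to (before, after)."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Intended use (not executed here):
#   state_path = Path.home() / ".claude.json"
#   before = load_state(state_path)
#   ... run the first query() call ...
#   print(diff_top_level(before, load_state(state_path)))
```

Removing the reported keys one at a time before the second call would show whether a single marker is enough to trigger the broken path.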

## Workaround

Override `HOME` to a unique throwaway path per `query()` call (e.g. `HOME=/tmp/agent-cli-<uuid>` in `ClaudeAgentOptions.env`). The CLI then always thinks it's running for the first time and always honors `TRACEPARENT`. Cost: ~20KB per call accumulated in `/tmp` until container restart. Functional behavior unchanged because I don't use `--continue` / `--resume` / `--session-id`.
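A minimal sketch of that workaround, with one variation: it deletes the throwaway directory after the call instead of leaving it in `/tmp`. `throwaway_home` is a helper name I made up; it uses `tempfile.mkdtemp` rather than a UUID for uniqueness, and it assumes nothing under the real `HOME` is needed by the CLI for the call.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def throwaway_home(prefix: str = "agent-cli-"):
    """Yield an env dict whose HOME is a fresh temp dir, removed on exit."""
    home = tempfile.mkdtemp(prefix=prefix)
    try:
        yield {"HOME": home}
    finally:
        # Best-effort cleanup so the per-call state does not pile up.
        shutil.rmtree(home, ignore_errors=True)


# Intended use inside the reproducer's one_call() (not executed here):
#   with throwaway_home() as env:
#       async for _ in query(prompt="echo hello",
#                            options=ClaudeAgentOptions(env=env)):
#           pass
```

If the directory needs to survive for debugging, dropping the `finally` cleanup gives the same behavior as the workaround described above.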

## Impact

For anyone using the SDK in a long-running server / worker process that issues multiple `query()` calls (the dominant deployment shape per the [Hosting the Agent SDK](https://code.claude.com/docs/en/agent-sdk/hosting) docs), every call after the first produces fragmented telemetry. That makes the [Read agent traces](https://code.claude.com/docs/en/agent-sdk/observability#read-agent-traces) flow described in the observability docs unusable past the first call without the workaround above.

## Asks

- Confirm whether the CLI is supposed to re-read `TRACEPARENT` and re-establish parent context on every subprocess invocation regardless of `~/.claude/` state
- If yes, the regression is in whatever code path differs between "first start" and "subsequent start", likely in the OTel SDK init or the interaction-span construction
- Happy to test a fix or provide more diagnostic data
