Most teams do not need to wait for SDK wrappers to get serious cost visibility. You can ship useful LLM cost spike detection now with a direct ingest contract and a safe async sender. This post shows a practical setup that gives you:

- endpoint-level cost attribution
- tenant/user concentration views
- prompt deploy regression detection
- budget and spend-alert workflows

all without changing provider traffic paths.

This approach does not mean "manual forever". It means:

1. Keep provider calls as-is.
2. Extract usage metadata from the provider response.
3. Send a normalized telemetry payload asynchronously.

SDK wrappers can reduce boilerplate later, but they are not required for production value. Concretely:

- Map provider-specific usage fields into a normalized model (a sketch follows below).
- Send telemetry with a timeout and swallowed errors so the user request path is never blocked.
- Query by endpoint, user/tenant, and promptVersion to explain spikes.

A normalized payload looks like this:

```json
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "endpointTag": "chat_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 1420,
  "outputTokens": 518,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
```
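The extraction step is a small pure function. Here is a minimal mapping sketch, assuming the OpenAI chat completions usage shape (`usage.prompt_tokens`, `usage.completion_tokens`) and the `TelemetryPayload` type defined later in this post; the function name and parameter list are illustrative, not a fixed API. Other providers need their own small adapter, but the output shape stays the same.

```typescript
// Usage shape as returned by OpenAI chat completions responses.
type OpenAIUsage = { prompt_tokens: number; completion_tokens: number };

// Illustrative adapter: provider-specific usage in, normalized payload out.
function toTelemetryPayload(params: {
  requestId: string; // your own stable ID, reused on retries
  model: string;
  endpointTag: string;
  promptVersion: string;
  userId?: string;
  usage: OpenAIUsage;
  latencyMs: number;
  ok: boolean;
}): TelemetryPayload {
  return {
    externalRequestId: params.requestId,
    provider: 'openai',
    model: params.model,
    endpointTag: params.endpointTag,
    promptVersion: params.promptVersion,
    userId: params.userId,
    inputTokens: params.usage.prompt_tokens,
    outputTokens: params.usage.completion_tokens,
    latencyMs: params.latencyMs,
    status: params.ok ? 'success' : 'error',
    dataMode: 'real', // hardcoded placeholders; thread these through in practice
    environment: 'prod',
  };
}
```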

Required for reliable diagnosis:

- externalRequestId (stable on retries)
- provider, model, endpointTag, promptVersion
- token counts + latency + status

Recommended:

- userId (hash if needed)
- dataMode and environment
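With token counts and model attached to every event, endpoint-level cost attribution is a simple fold over the payloads. A minimal sketch; the per-million-token prices below are illustrative assumptions, not current provider pricing:

```typescript
// Prices per million tokens in USD. Illustrative assumptions only;
// keep the real table in config and update it from provider pricing pages.
const PRICES_PER_MTOK: Record<string, { input: number; output: number }> = {
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

function estimateCostUsd(p: TelemetryPayload): number {
  const price = PRICES_PER_MTOK[p.model];
  if (!price) return 0; // unknown model: report as unpriced rather than guessing
  return (
    (p.inputTokens / 1_000_000) * price.input +
    (p.outputTokens / 1_000_000) * price.output
  );
}

// Endpoint-level attribution is then a group-by over events; the same
// pattern works keyed on userId or promptVersion.
function costByEndpoint(events: TelemetryPayload[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.endpointTag, (totals.get(e.endpointTag) ?? 0) + estimateCostUsd(e));
  }
  return totals;
}
```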

Safe sender pattern (TypeScript)

```typescript
type TelemetryPayload = {
  externalRequestId: string;
  provider: string;
  model: string;
  endpointTag: string;
  promptVersion: string;
  userId?: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: 'success' | 'error';
  dataMode: 'real' | 'test' | 'demo';
  environment: 'prod' | 'staging' | 'dev';
};
```

```typescript
// The ingest URL env var name and the timeout value are illustrative;
// tune both for your setup.
const TELEMETRY_URL = process.env.TELEMETRY_INGEST_URL ?? '';

async function sendTelemetrySafe(payload: TelemetryPayload): Promise<void> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 2_000); // hard cap on send time
  try {
    await fetch(TELEMETRY_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
  } catch {
    // Swallow everything: losing a telemetry event is acceptable,
    // blocking or failing the user request is not.
  } finally {
    clearTimeout(timeout);
  }
}
```
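At the call site, fire and forget: do not await the send on the request path. A hypothetical call site, where `callProvider` stands in for your existing, unchanged provider call:

```typescript
// Hypothetical handler; only the telemetry lines are new.
async function handleChatSummary(requestId: string, userId: string) {
  const started = Date.now();
  const res = await callProvider(); // existing provider call, unchanged
  // `void` makes the fire-and-forget intent explicit: the response
  // returns whether or not the telemetry send succeeds.
  void sendTelemetrySafe(
    toTelemetryPayload({
      requestId, // generated once per logical request so retries reuse it
      model: 'gpt-4o-mini',
      endpointTag: 'chat_summary',
      promptVersion: 'summary_v3',
      userId,
      usage: res.usage,
      latencyMs: Date.now() - started,
      ok: true,
    }),
  );
  return res;
}
```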