No-SDK LLM Cost Spike Detection in Production (Endpoint + User + PromptVersion)
Most teams do not need to wait for SDK wrappers to get serious cost visibility. You can ship useful LLM cost spike detection now with a direct ingest contract and a safe async sender. This post shows a practical setup that gives you:

- endpoint-level cost attribution
- tenant/user concentration views
- prompt deploy regression detection
- budget and spend-alert workflows

all without changing provider traffic paths. This does not mean "manual forever". It means:

1. Keep provider calls as-is.
2. Extract usage metadata from the provider response.
3. Send a normalized telemetry payload asynchronously.

SDK wrappers can reduce boilerplate later, but they are not required for production value. The approach in practice:

- Map provider-specific usage fields into a normalized model.
- Send telemetry with a timeout and swallowed errors so the user request path is never blocked.
- Query by endpoint, user/tenant, and promptVersion to explain spikes.

A normalized telemetry payload looks like this:

```json
{
  "externalRequestId": "req_01HZXB6MQZ2WQ9D2KCF9M4V2QY",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "endpointTag": "chat_summary",
  "promptVersion": "summary_v3",
  "userId": "tenant_acme_hash",
  "inputTokens": 1420,
  "outputTokens": 518,
  "latencyMs": 892,
  "status": "success",
  "dataMode": "real",
  "environment": "prod"
}
```
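The "extract and normalize" step can be sketched as a small mapper. This is a sketch, not a prescribed implementation: `normalizeOpenAIUsage` and `ProviderUsage` are hypothetical names, though `prompt_tokens` and `completion_tokens` are the field names OpenAI actually returns in its `usage` object; other providers report usage differently and need their own mapping.

```typescript
// Shape of the usage block in an OpenAI-style chat completion response.
type ProviderUsage = { prompt_tokens: number; completion_tokens: number };

// Provider-neutral subset of the telemetry payload.
type NormalizedUsage = {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Hypothetical normalizer: maps OpenAI's usage fields into the
// provider-neutral names used throughout this post, so dashboards can
// aggregate across vendors.
function normalizeOpenAIUsage(
  model: string,
  usage: ProviderUsage,
): NormalizedUsage {
  return {
    provider: 'openai',
    model,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
  };
}
```

Each additional provider gets one such mapper; everything downstream of the ingest contract stays unchanged.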
Required for reliable diagnosis:

- `externalRequestId` (stable on retries)
- `provider`, `model`, `endpointTag`, `promptVersion`
- token counts + latency + status

Recommended:

- `userId` (hash if needed)
- `dataMode` and `environment`
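One way to enforce this contract at the ingest boundary is a small required-field check. A minimal sketch (`missingRequiredFields` is a hypothetical helper; the field list mirrors the contract above):

```typescript
// Required fields per the ingest contract above.
const REQUIRED_FIELDS = [
  'externalRequestId',
  'provider',
  'model',
  'endpointTag',
  'promptVersion',
  'inputTokens',
  'outputTokens',
  'latencyMs',
  'status',
] as const;

// Returns the required fields missing from an incoming record;
// an empty array means the record is acceptable.
function missingRequiredFields(record: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (f) => record[f] === undefined || record[f] === null,
  );
}
```

Rejecting (or quarantining) records that fail this check early keeps downstream spike queries trustworthy.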
Safe sender pattern (TypeScript)
```typescript
type TelemetryPayload = {
  externalRequestId: string;
  provider: string;
  model: string;
  endpointTag: string;
  promptVersion: string;
  userId?: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: 'success' | 'error';
  dataMode: 'real' | 'test' | 'demo';
  environment: 'prod' | 'staging' | 'dev';
};

// Placeholder URL: point this at your collector's ingest endpoint.
const TELEMETRY_INGEST_URL = 'https://telemetry.example.com/v1/events';

async function sendTelemetrySafe(payload: TelemetryPayload): Promise<void> {
  const controller = new AbortController();
  // Hard cap (2s is an illustrative value): abort rather than let a slow
  // collector hold resources.
  const timeout = setTimeout(() => controller.abort(), 2000);
  try {
    await fetch(TELEMETRY_INGEST_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
  } catch {
    // Swallow every failure: telemetry must never break or slow the
    // user-facing request path.
  } finally {
    clearTimeout(timeout);
  }
}
```

Call it fire-and-forget (`void sendTelemetrySafe(payload)`) after the provider response returns; never `await` it on the request path.
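Once events are flowing, spike detection itself can start as a simple baseline comparison: sum cost per `endpointTag` over a current window and flag endpoints whose spend exceeds a trailing baseline by some ratio. A minimal in-memory sketch (the per-1K-token prices, the 2x threshold, and the `spikedEndpoints` helper are all illustrative assumptions, not part of the ingest contract):

```typescript
type CostEvent = {
  endpointTag: string;
  inputTokens: number;
  outputTokens: number;
};

// Illustrative per-1K-token prices; real values come from your
// provider's current price sheet, per model.
const COST_PER_1K = { input: 0.00015, output: 0.0006 };

function costOf(e: CostEvent): number {
  return (
    (e.inputTokens / 1000) * COST_PER_1K.input +
    (e.outputTokens / 1000) * COST_PER_1K.output
  );
}

// Flags endpoints whose current-window spend exceeds the baseline-window
// spend by more than `ratio`. An endpoint with no baseline at all is
// flagged whenever it has any spend.
function spikedEndpoints(
  baseline: CostEvent[],
  current: CostEvent[],
  ratio = 2,
): string[] {
  const sum = (events: CostEvent[]): Map<string, number> => {
    const totals = new Map<string, number>();
    for (const e of events) {
      totals.set(e.endpointTag, (totals.get(e.endpointTag) ?? 0) + costOf(e));
    }
    return totals;
  };
  const base = sum(baseline);
  const cur = sum(current);
  const spikes: string[] = [];
  cur.forEach((cost, tag) => {
    if (cost > (base.get(tag) ?? 0) * ratio) spikes.push(tag);
  });
  return spikes;
}
```

The same grouping applied to `userId` or `promptVersion` instead of `endpointTag` gives the tenant-concentration and prompt-regression views; in production you would run it as a query against your telemetry store rather than in memory.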