AI Inference Metrics

Avg Latency (p50): ms, visible models
p95 Latency: ms, visible models
Avg TTFT (p50): ms, visible models
Invocations: total in range
Output Tokens/s: avg generation speed
Est. Cost: USD in range
Error Rate: errors / invocations

Invocation Latency

p50 / p95 / p99 per model — milliseconds
No latency data for this time range
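
To illustrate how per-model p50 / p95 / p99 figures like the ones this panel reports could be derived from raw latency samples, here is a minimal sketch. It assumes the nearest-rank percentile definition, which may differ from what the dashboard actually uses; the function names and the shape of `samples_by_model` are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition, not confirmed by the dashboard)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Rank of the smallest sample with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_by_model):
    """Map each model to its p50/p95/p99 latency in milliseconds.

    Models with no samples are skipped, which corresponds to the
    empty state shown for a time range without data.
    """
    return {
        model: {f"p{p}": percentile(ms, p) for p in (50, 95, 99)}
        for model, ms in samples_by_model.items()
        if ms
    }
```

Calling `latency_summary({"model-a": [120, 95, 310]})` with a hypothetical model name returns one entry keyed by that model, holding its three percentile values.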

Time to First Token (TTFT)

p50 / p95 — milliseconds
No TTFT data for this time range

Invocations / period

Request volume by model
No invocation data for this time range

Token Throughput

Input + output tokens per period (visible models)
No token data for this time range

Prompt Cache

Cache reads vs. writes — tokens per period
No cache data for this time range

Errors

Client errors · Server errors
No errors in this time range

Quota Utilization

% of TPM quota used per model
No quota data for this time range
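
As a rough illustration of the "% of TPM quota used per model" figure, the sketch below divides observed tokens per minute by each model's quota. Which tokens count toward the quota (input, output, or both) is not stated on this page, so the inputs here are assumptions, and the function and parameter names are hypothetical.

```python
def quota_utilization(tokens_per_minute, tpm_quota):
    """Return the percentage of TPM quota used for each model.

    tokens_per_minute: observed tokens per minute, keyed by model (assumed input).
    tpm_quota: tokens-per-minute quota, keyed by model (assumed input).
    Models with a missing or non-positive quota are skipped rather than guessed.
    """
    result = {}
    for model, used in tokens_per_minute.items():
        quota = tpm_quota.get(model)
        if not quota or quota <= 0:
            continue
        result[model] = 100.0 * used / quota
    return result
```

For example, a model that consumed 5,000 tokens in a minute against a quota of 20,000 would show 25% utilization.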

Estimated Cost

USD per period (on-demand estimate)
No cost data for this time range