AI Inference Metrics

Avg Latency (p50): — (ms, visible models)

p95 Latency: — (ms, visible models)

Avg TTFT (p50): — (ms, visible models)

Invocations: — (total in range)

Output Tokens/s: — (avg generation speed)

Est. Cost: — (USD in range)

Error Rate: — (errors / invocations)

Invocation Latency

p50 / p95 / p99 per model — milliseconds

📭

No latency data for this time range
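For readers unfamiliar with the p50 / p95 / p99 labels, the following is a minimal sketch of how such percentiles could be derived from raw latency samples. It uses the nearest-rank method, which is an assumption: the dashboard does not say which percentile definition it uses, and the function name and sample values here are hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples in this time range")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still maps to a sample.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical invocation latencies in milliseconds (illustrative only).
latencies_ms = [120, 95, 310, 150, 101, 99, 870, 133, 140, 125]

summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

With only ten samples, p95 and p99 both resolve to the slowest request, which is why tail percentiles are usually read over larger windows.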

Time to First Token (TTFT)

p50 / p95 — milliseconds

📭

No TTFT data for this time range

Invocations / period

Request volume by model

📭

No invocation data for this time range

Token Throughput

Input + output tokens per period (visible models)

📭

No token data for this time range

Prompt Cache

Cache reads vs. writes — tokens per period

📭

No cache data for this time range

Errors

Client errors · Server errors

✅

No errors in this time range

Quota Utilization

% of TPM quota used per model

📭

No quota data for this time range
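The "% of TPM quota used" figure can be read as tokens consumed in a minute divided by the tokens-per-minute quota. The sketch below illustrates that arithmetic only. Which tokens count against the quota (input, output, cached) is not stated on the dashboard, so the inputs, the function name, and the numbers are all assumptions.

```python
def quota_utilization_pct(tokens_in_minute, tpm_quota):
    """Share of a tokens-per-minute (TPM) quota consumed, as a percentage."""
    if tpm_quota <= 0:
        raise ValueError("TPM quota must be positive")
    return 100.0 * tokens_in_minute / tpm_quota

# Hypothetical figures: 45,000 tokens used against a 200,000 TPM quota.
utilization = quota_utilization_pct(45_000, 200_000)
print(f"{utilization:.1f}% of TPM quota used")
```

A value above 100 would indicate usage beyond the quota in that minute, which in practice tends to surface as throttling errors rather than as a higher reading.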

Estimated Cost

USD per period (on-demand estimate)

📭

No cost data for this time range