Records an individual LLM inference call that occurred within a running orchestration. Each inference record captures the model, provider, token counts, latency, cost, and content hashes — enabling fine-grained cost attribution and provenance auditing of every AI decision.
This endpoint is called by agent runtimes after each LLM API call completes. The inference record is linked to the parent orchestration and included in the final transparency-log receipt.
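A minimal sketch of what an agent runtime might send after an LLM call completes. The base URL, route shape, JSON field names, and tenant header name are all assumptions for illustration; the field descriptions below do not name the actual wire-format keys.

```python
import json
import urllib.request

# Hypothetical base URL -- replace with the real API host.
BASE_URL = "https://api.example.com/v1"

def build_inference_record(model, provider, input_tokens, output_tokens,
                           latency_ms, cost_cents, input_hash, output_hash):
    """Assemble the inference-record payload for one LLM call.

    The JSON key names here are assumptions, not the documented schema.
    """
    return {
        "model": model,
        "provider": provider,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "cost_cents": cost_cents,
        "input_hash": input_hash,
        "output_hash": output_hash,
    }

def record_inference(orchestration_id, api_key, tenant_id, payload):
    """POST the record under the parent orchestration (hypothetical route)."""
    req = urllib.request.Request(
        f"{BASE_URL}/orchestrations/{orchestration_id}/inferences",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-Id": tenant_id,  # tenant header name is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)  # caller handles the HTTP response
```

The runtime would call `record_inference` once per completed LLM API call, so each record lands in the transparency-log receipt of its parent orchestration.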
Authentication
API key with orchestrations:write scope. Alternatively, pass a Bearer JWT
token in the Authorization header.
Tenant identifier for multi-tenant isolation.
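A sketch of assembling the two authentication styles plus the tenant identifier. The `X-API-Key` and `X-Tenant-Id` header names are assumptions; only the Bearer form of the Authorization header is stated above.

```python
def auth_headers(tenant_id, api_key=None, jwt=None):
    """Build request headers for either auth style.

    Pass an API key carrying the orchestrations:write scope, or a JWT
    to use Bearer authentication instead.
    """
    headers = {"X-Tenant-Id": tenant_id}  # header name is an assumption
    if jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"
    else:
        headers["X-API-Key"] = api_key  # header name is an assumption
    return headers
```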
Path Parameters
Parent orchestration identifier (maip-orch:ULID). The referenced orchestration
must be in running status.
Request
LLM model identifier (e.g. claude-sonnet-4-20250514, gpt-4o,
amazon.titan-text-express).
LLM provider. Accepted values: openai, anthropic, bedrock, custom.
Number of input tokens consumed by the inference call.
Number of output tokens generated by the inference call.
End-to-end latency of the inference call in milliseconds.
Cost of the inference call in US cents. Calculated by the agent runtime based
on provider pricing.
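A sketch of how a runtime might derive the cents value from token counts. The per-million-token prices here are placeholders, not real provider pricing, and the sub-cent rounding is a design choice, not part of the API.

```python
# Placeholder prices in US cents per million tokens -- real provider
# pricing varies by model and changes over time.
PRICING_CENTS_PER_MTOK = {
    "example-model": {"input": 300, "output": 1500},
}

def cost_cents(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference call in US cents under the placeholder prices."""
    p = PRICING_CENTS_PER_MTOK[model]
    raw = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return round(raw, 4)  # keep sub-cent precision for small calls
```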
SHA-256 hex digest of the prompt input. Lets an auditor verify that a
presented prompt matches the one actually sent.
SHA-256 hex digest of the model output. Enables verification that the recorded
output matches what was returned.
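Both digests can be produced with any SHA-256 implementation; a sketch using hashlib, assuming the prompt and output are hashed as UTF-8 text (the API does not specify the canonicalization).

```python
import hashlib

def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a prompt or completion string (UTF-8 assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The runtime would hash the prompt before the call and the completion after it, and submit both 64-character hex strings with the inference record.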
Response
Unique identifier for this inference record.
Parent orchestration identifier.
Inference cost in US cents.
ISO 8601 timestamp when the inference was recorded.