InferCost documentation
InferCost is a Kubernetes-native controller that computes true cost-per-token for AI inference on your own hardware — from GPU amortization through electricity to per-request token economics.
These docs assume you have a Kubernetes cluster with GPU workloads and a rough idea of what your hardware costs. Everything else we can teach you.
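To make the "GPU amortization through electricity" framing concrete, here is a back-of-the-envelope sketch of the cost model in plain Python. Every number below is an assumption invented for the example (prices, power draw, throughput); InferCost takes the equivalent inputs from your CostProfile and live metrics rather than constants.

```python
# Illustrative cost-per-token arithmetic. All constants are assumed
# example values, not defaults shipped with InferCost.

GPU_PRICE_USD = 25_000.0        # hypothetical purchase price of one GPU
AMORTIZATION_YEARS = 3          # hypothetical straight-line depreciation window
POWER_DRAW_KW = 0.7             # hypothetical average board power under load
ELECTRICITY_USD_PER_KWH = 0.12  # hypothetical local energy price
TOKENS_PER_HOUR = 2_000_000     # hypothetical measured deployment throughput

hours_per_year = 24 * 365
amortization_per_hour = GPU_PRICE_USD / (AMORTIZATION_YEARS * hours_per_year)
electricity_per_hour = POWER_DRAW_KW * ELECTRICITY_USD_PER_KWH

cost_per_hour = amortization_per_hour + electricity_per_hour
cost_per_million_tokens = cost_per_hour / TOKENS_PER_HOUR * 1_000_000

print(f"${cost_per_hour:.3f}/GPU-hour, "
      f"${cost_per_million_tokens:.3f} per 1M tokens")
```

The controller performs this same class of calculation continuously, per request, using real utilization instead of a flat tokens-per-hour figure.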
Start here
How the pieces fit
InferCost runs as one controller pod. It reads from infrastructure you already have (DCGM Exporter, your inference pods' /metrics endpoints), computes costs using a declared CostProfile, and writes the results to several output channels, depending on who needs them:
- Platform engineers read the Prometheus metrics and the shipped Grafana dashboard.
- Developers query the REST API or kubectl get usagereports.
- Finance teams import the FOCUS-compatible CSV export, with x-Infer* extensions for on-prem dimensions, into their existing FinOps pipeline.
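The CostProfile referenced above is where you declare what your hardware costs. As a sketch of what such a manifest might look like — the apiVersion, kind, and every field name here are assumptions for illustration, not the published schema — consider:

```yaml
# Illustrative only: field names below are assumed, not the real CRD schema.
apiVersion: infercost.example/v1alpha1
kind: CostProfile
metadata:
  name: a100-cluster
spec:
  hardware:
    gpuPurchasePriceUSD: 25000   # up-front price per GPU (assumed)
    amortizationMonths: 36       # depreciation window (assumed)
  power:
    electricityUSDPerKWh: 0.12   # local energy price (assumed)
    pue: 1.4                     # datacenter power usage effectiveness (assumed)
```

Whatever the real schema looks like, the point stands: the assumptions live in a manifest you control and can audit, which is what makes every downstream number explainable.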
There is no database to run, no UI to host, no SaaS to trust. Every number is computed from inputs you control, and every assumption is visible on disk.
Project status
InferCost is pre-v1. The Observe, Report, and Alert tiers are live; Enforce, Optimize, and Comply are in active development. Track progress on GitHub.
This is still early. If the docs aren't clear or the tool doesn't do what you expected, that is a bug — please open an issue. Anything marked "coming soon" in the roadmap has a corresponding issue you can add context to.