InferCost documentation
InferCost is a Kubernetes-native controller that computes true cost-per-token for AI inference on your own hardware — from GPU amortization through electricity to per-request token economics.
These docs assume you have a Kubernetes cluster with GPU workloads and a rough idea of what your hardware costs. Everything else we can teach you.
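To make the "GPU amortization through electricity" framing concrete, here is a back-of-the-envelope sketch of the cost model in plain Python. Every number below is an assumption invented for the example (prices, power draw, throughput); InferCost takes the equivalent inputs from your CostProfile and live metrics rather than constants.

```python
# Illustrative cost-per-token arithmetic. All constants are assumed
# example values, not defaults shipped with InferCost.

GPU_PRICE_USD = 25_000.0        # hypothetical purchase price of one GPU
AMORTIZATION_YEARS = 3          # hypothetical straight-line depreciation window
POWER_DRAW_KW = 0.7             # hypothetical average board power under load
ELECTRICITY_USD_PER_KWH = 0.12  # hypothetical local energy price
TOKENS_PER_HOUR = 2_000_000     # hypothetical measured deployment throughput

hours_per_year = 24 * 365
amortization_per_hour = GPU_PRICE_USD / (AMORTIZATION_YEARS * hours_per_year)
electricity_per_hour = POWER_DRAW_KW * ELECTRICITY_USD_PER_KWH

cost_per_hour = amortization_per_hour + electricity_per_hour
cost_per_million_tokens = cost_per_hour / TOKENS_PER_HOUR * 1_000_000

print(f"${cost_per_hour:.3f}/GPU-hour, "
      f"${cost_per_million_tokens:.3f} per 1M tokens")
```

The controller performs this same class of calculation continuously, per request, using real utilization instead of a flat tokens-per-hour figure.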
Start here
How the pieces fit
InferCost runs as one controller pod. It reads from infrastructure you already have (DCGM Exporter, your inference pods' /metrics endpoints), computes costs using a declared CostProfile, and writes the results to several output channels, depending on who needs them:
- Platform engineers read the Prometheus metrics and the shipped Grafana dashboard.
- Developers query the REST API or kubectl get usagereports.
- Finance teams import the FOCUS-compatible CSV export, with x-Infer* extensions for on-prem dimensions, into their existing FinOps pipeline.
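The CostProfile referenced above is where you declare what your hardware costs. As a sketch of what such a manifest might look like — the apiVersion, kind, and every field name here are assumptions for illustration, not the published schema — consider:

```yaml
# Illustrative only: field names below are assumed, not the real CRD schema.
apiVersion: infercost.example/v1alpha1
kind: CostProfile
metadata:
  name: a100-cluster
spec:
  hardware:
    gpuPurchasePriceUSD: 25000   # up-front price per GPU (assumed)
    amortizationMonths: 36       # depreciation window (assumed)
  power:
    electricityUSDPerKWh: 0.12   # local energy price (assumed)
    pue: 1.4                     # datacenter power usage effectiveness (assumed)
```

Whatever the real schema looks like, the point stands: the assumptions live in a manifest you control and can audit, which is what makes every downstream number explainable.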
There is no database to run, no UI to host, no SaaS to trust. Every number is computed from inputs you control, and every assumption is visible on disk.
Project status
InferCost is pre-v1. The Observe, Report, and Alert tiers are live; Enforce, Optimize, and Comply are in active development. Track progress on GitHub.
This is still early. If the docs aren't clear or the tool doesn't do what you expected, that is a bug — please open an issue. Anything marked "coming soon" in the roadmap has a corresponding issue you can add context to.