Quickstart
Five minutes, one Helm command, one YAML file. At the end you'll have live per-token cost data for every inference pod in your cluster.
Prerequisites
- A Kubernetes cluster with at least one node running GPU inference workloads
- DCGM Exporter installed (or see DCGM setup for a five-minute install)
- Inference pods exposing Prometheus metrics on a known port (llama.cpp or vLLM)
- `helm` and `kubectl` configured against your cluster
1. Install the Helm chart
```
helm repo add infercost https://defilantech.github.io/infercost
helm repo update
helm install infercost infercost/infercost \
  --namespace infercost-system \
  --create-namespace
```

On a cluster running kube-prometheus-stack, the chart automatically creates a PodMonitor so Prometheus discovers InferCost without extra config. Without prometheus-operator, the chart silently skips that resource.
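For reference, the generated object looks roughly like the sketch below. This is an illustrative guess, not the chart's literal output: the name, labels, and port are assumptions, so inspect the real resource with `kubectl get podmonitor -n infercost-system -o yaml`.

```yaml
# Hypothetical sketch of the chart-generated PodMonitor (name, labels,
# and port are assumptions; check the actual object on your cluster)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: infercost
  namespace: infercost-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: infercost
  podMetricsEndpoints:
    - port: metrics
```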
2. Declare your hardware
One CostProfile per node or GPU pool. The sample library ships ready-to-use manifests for H100, A100, L40S, RTX 4090/5090/5060 Ti, A6000, and Apple M2 Ultra — copy the one closest to your setup and tweak the `purchasePriceUSD`.
```yaml
# costprofile.yaml — using a 2x RTX 5060 Ti lab setup as the example
apiVersion: finops.infercost.ai/v1alpha1
kind: CostProfile
metadata:
  name: my-gpu-node
spec:
  hardware:
    gpuModel: "NVIDIA GeForce RTX 5060 Ti"
    gpuCount: 2
    purchasePriceUSD: 960   # what you actually paid
    amortizationYears: 4
    tdpWatts: 180           # per-GPU fallback when DCGM unreachable
  electricity:
    ratePerKWh: 0.08
    pueFactor: 1.0
  nodeSelector:
    kubernetes.io/hostname: gpu-node-01
```

Apply it:

```
kubectl apply -f costprofile.yaml
```

3. Tell InferCost where DCGM is
If DCGM is already discoverable at the GKE / GPU Operator default location, there is nothing to do. Otherwise, point the controller at it via Helm values:
```
helm upgrade infercost infercost/infercost \
  --namespace infercost-system \
  --set dcgm.endpoint=http://nvidia-dcgm-exporter.gpu-operator-resources.svc:9400/metrics
```

When DCGM is missing or unreachable, `CostProfile.status.conditions` surfaces a `DCGMReachable=False` condition with an actionable message — so you know exactly why a dashboard is flat instead of guessing. See DCGM setup for all four diagnostic states.
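The $/HR figure you'll see in the next step is derived from the CostProfile fields in step 2 plus the live power draw from DCGM. As a rough illustration of how those inputs combine, here is a naive sketch assuming straight-line amortization plus electricity at the measured draw. This is not the controller's actual formula, so its output won't match the CLI's figure exactly.

```python
# Naive hourly-rate sketch from CostProfile-style inputs.
# Assumption: straight-line amortization plus electricity at the measured
# draw; InferCost's real model may differ, so this need not match the CLI.

HOURS_PER_YEAR = 24 * 365

def hourly_cost_usd(purchase_usd: float, amortization_years: float,
                    draw_watts: float, rate_per_kwh: float,
                    pue: float = 1.0) -> float:
    amortized = purchase_usd / (amortization_years * HOURS_PER_YEAR)
    electricity = (draw_watts / 1000.0) * pue * rate_per_kwh
    return amortized + electricity

# Inputs from the sample CostProfile, with the 86 W draw DCGM reported
print(f"${hourly_cost_usd(960, 4, 86, 0.08):.4f}/hr")  # → $0.0343/hr
```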
4. See your costs
```
$ kubectl get costprofiles
NAME          GPU MODEL                    GPUs   $/HR      POWER (W)   AGE
my-gpu-node   NVIDIA GeForce RTX 5060 Ti   2      $0.0531   86W         2m

$ kubectl get usagereports
NAME           PERIOD       COST ($)   $/MTOK    INPUT TOKENS   OUTPUT TOKENS   AGE
daily-rollup   2026-04-20   $0.8941    $0.4192   1,200,000      800,000         5m
```

Or from the CLI:
```
$ infercost status
INFRASTRUCTURE COSTS

PROFILE       GPU MODEL                    GPUs   $/HOUR    POWER   AGE
my-gpu-node   NVIDIA GeForce RTX 5060 Ti   2      $0.0531   86W     2m

$ infercost compare --namespace engineering
PROVIDER    MODEL               CLOUD COST   SAVINGS
Anthropic   claude-sonnet-4-6   $15.60       $14.70 (94%)
OpenAI      gpt-5.4             $15.00       $14.10 (94%)
```

5. Open the Grafana dashboard
`kubectl create configmap --from-file` only accepts local files, so download the dashboard JSON first:

```
curl -LO https://raw.githubusercontent.com/defilantech/infercost/main/config/grafana/infercost-dashboard.json
kubectl create configmap infercost-dashboard \
  --from-file=infercost-dashboard.json \
  -n monitoring
kubectl label configmap infercost-dashboard grafana_dashboard=1 -n monitoring
```

The Grafana sidecar auto-loads the dashboard within a minute. You will see the cost-per-token, hourly cost, GPU efficiency, and cloud-equivalent comparison panels populate as soon as Prometheus scrapes its first data point.
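As a final sanity check, the SAVINGS column of `infercost compare` in step 4 is consistent with simple subtraction: cloud list price minus your self-hosted cost for the same token volume. This is an observation about the table above, not a statement of the CLI's internal accounting.

```python
# Reproduce the SAVINGS column of `infercost compare`, assuming
# savings = cloud list price - self-hosted cost for the same tokens.

def savings(cloud_cost_usd: float, self_cost_usd: float) -> tuple[float, float]:
    saved = cloud_cost_usd - self_cost_usd
    return saved, saved / cloud_cost_usd

# Both rows of the table above imply a self-hosted cost of about $0.90
saved, pct = savings(15.60, 0.90)   # Anthropic row
print(f"${saved:.2f} ({pct:.0%})")  # → $14.70 (94%)
```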
Next steps
- FOCUS export: generate a CSV your finance team can import
- CRD reference: add `TokenBudget` for per-namespace alerting
- Cloud pricing: override list prices with your negotiated enterprise rates
- Troubleshooting: what to check when something doesn't appear