CRD reference
InferCost is configured entirely through three Custom Resource Definitions under the finops.infercost.ai API group. There is no database, config file, or admin API: what you see in kubectl get is the source of truth.
CostProfile
Declares the hardware economics for a node or GPU pool. The controller uses this to compute hourly cost and normalizes all token-level attribution against it.
apiVersion: finops.infercost.ai/v1alpha1
kind: CostProfile
metadata:
  name: h100-node-01
spec:
  hardware:
    gpuModel: "NVIDIA H100 SXM5"      # free-form; shown in dashboards
    gpuCount: 8
    purchasePriceUSD: 280000
    amortizationYears: 3              # standard enterprise is 3; prosumer is 4
    maintenancePercentPerYear: 0.10   # 10% annual support contract
    tdpWatts: 700                     # fallback when DCGM unreachable
  electricity:
    ratePerKWh: 0.12
    pueFactor: 1.4                    # 1.0 for homelabs, 1.2-1.6 for data centers
  nodeSelector:
    kubernetes.io/hostname: h100-node-01

Status fields
- hourlyCostUSD: amortization + electricity at current power draw
- amortizationRatePerHour: hardware cost component
- electricityCostPerHour: energy cost at current power
- currentPowerDrawWatts: real-time total from DCGM
- conditions[type=Ready]: True/CostComputed when the numbers are fresh
- conditions[type=DCGMReachable]: one of four states (see DCGM setup) that tells you exactly how the power number was obtained
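To make the relationship between the spec and the status fields concrete, here is a minimal Python sketch of how the hourly numbers could be derived from the example values above. The controller's actual formula is not given in this reference, so several points are assumptions: straight-line amortization over 8760 hours per year, the maintenance contract folded into the amortization component, tdpWatts treated as a per-GPU figure multiplied by gpuCount for the fallback, and PUE applied as a simple multiplier on energy cost. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_YEAR = 24 * 365  # 8760; leap years ignored for simplicity


@dataclass
class CostProfileSpec:
    """Hypothetical mirror of the spec fields in the CostProfile example."""
    gpu_count: int
    purchase_price_usd: float
    amortization_years: float
    maintenance_percent_per_year: float
    tdp_watts: float  # assumed to be per GPU
    rate_per_kwh: float
    pue_factor: float


def amortization_rate_per_hour(spec: CostProfileSpec) -> float:
    # Assumption: straight-line amortization plus the annual maintenance
    # contract, both spread evenly over every hour of the year.
    yearly_hardware = spec.purchase_price_usd / spec.amortization_years
    yearly_maintenance = spec.purchase_price_usd * spec.maintenance_percent_per_year
    return (yearly_hardware + yearly_maintenance) / HOURS_PER_YEAR


def electricity_cost_per_hour(spec: CostProfileSpec,
                              power_draw_watts: Optional[float] = None) -> float:
    # Assumption: with no measured power (DCGM unreachable), fall back to
    # TDP multiplied by the GPU count.
    if power_draw_watts is None:
        power_draw_watts = spec.tdp_watts * spec.gpu_count
    kilowatts = power_draw_watts / 1000
    return kilowatts * spec.rate_per_kwh * spec.pue_factor


def hourly_cost_usd(spec: CostProfileSpec,
                    power_draw_watts: Optional[float] = None) -> float:
    # Matches the status field description: amortization + electricity.
    return amortization_rate_per_hour(spec) + electricity_cost_per_hour(spec, power_draw_watts)


profile = CostProfileSpec(
    gpu_count=8,
    purchase_price_usd=280000,
    amortization_years=3,
    maintenance_percent_per_year=0.10,
    tdp_watts=700,
    rate_per_kwh=0.12,
    pue_factor=1.4,
)

print(f"amortization/hour: {amortization_rate_per_hour(profile):.4f} USD")
print(f"electricity/hour (TDP fallback): {electricity_cost_per_hour(profile):.4f} USD")
print(f"total/hour: {hourly_cost_usd(profile):.4f} USD")
```

Passing a measured value, for example hourly_cost_usd(profile, power_draw_watts=4000), stands in for the case where DCGM is reachable and the real-time draw replaces the TDP fallback.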
UsageReport
Auto-populated cost report over a time period. The controller scrapes inference pods, attributes tokens to models and namespaces, and writes the result back to status.
apiVersion: finops.infercost.ai/v1alpha1
kind: UsageReport
metadata:
  name: engineering-daily
  namespace: engineering
spec:
  costProfileRef: h100-node-01   # same namespace as this report
  schedule: daily                # daily | weekly | monthly
  namespaces:                    # optional; if empty, all namespaces
    - engineering
    - research

Status fields
- period, periodStart, periodEnd
- inputTokens, outputTokens
- estimatedCostUSD, costPerMillionTokens
- byModel[]: per-model breakdown (model, namespace, token counts, cost)
- byNamespace[]: per-team breakdown suitable for chargeback
- cloudComparison[]: per-provider equivalent cost + savings vs on-prem
- gpuEfficiencyRatio: fraction of GPU time on active inference
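The derived status fields can be illustrated with a short Python sketch. It assumes costPerMillionTokens divides estimatedCostUSD by the combined input and output token count, and that a cloudComparison entry reports savings as the provider's equivalent cost minus the on-prem cost. Neither definition is stated in this reference, and the function names and sample figures are hypothetical.

```python
def cost_per_million_tokens(estimated_cost_usd: float,
                            input_tokens: int,
                            output_tokens: int) -> float:
    """Cost per one million tokens, counting input and output together (assumed)."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        return 0.0  # no traffic in the period, so there is nothing to normalize
    return estimated_cost_usd * 1_000_000 / total_tokens


def cloud_savings(on_prem_cost_usd: float, cloud_cost_usd: float):
    """Return (savings in USD, savings as a percent of the cloud cost)."""
    savings = cloud_cost_usd - on_prem_cost_usd
    percent = savings / cloud_cost_usd * 100 if cloud_cost_usd else 0.0
    return savings, percent


# Made-up figures for one reporting period.
print(cost_per_million_tokens(120.0, 30_000_000, 10_000_000))
print(cloud_savings(on_prem_cost_usd=120.0, cloud_cost_usd=400.0))
```

A negative savings value would mean the cloud provider is cheaper than on-prem for that period.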
TokenBudget
Per-namespace spend limit with alert thresholds. Fires Prometheus alerts through the existing Alertmanager (no InferCost-specific alert delivery).
apiVersion: finops.infercost.ai/v1alpha1
kind: TokenBudget
metadata:
  name: engineering-monthly
  namespace: engineering
spec:
  scope:
    namespace: engineering
  monthlyLimitUSD: 500
  alertThresholds:
    - percent: 80
      severity: warning
    - percent: 100
      severity: critical

The controller generates a PrometheusRule from each TokenBudget; alerts route through whatever delivery you already have configured (Slack, PagerDuty, email).
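The threshold semantics can be sketched in a few lines of Python. In a real deployment the evaluation happens inside Prometheus through the generated PrometheusRule, so this only illustrates the logic. It assumes a threshold fires once month-to-date spend reaches or exceeds the given percentage of monthlyLimitUSD; the function name is hypothetical.

```python
def fired_severities(spend_usd: float, monthly_limit_usd: float, thresholds: list) -> list:
    """Return the severities of every threshold the current spend has crossed.

    Each threshold is a dict with "percent" and "severity" keys, matching
    the alertThresholds entries in the TokenBudget example.
    """
    fired = []
    for threshold in sorted(thresholds, key=lambda t: t["percent"]):
        # Compare without dividing, so a threshold of exactly 80% fires at
        # exactly 80% spend with no floating-point surprises.
        if spend_usd * 100 >= threshold["percent"] * monthly_limit_usd:
            fired.append(threshold["severity"])
    return fired


thresholds = [
    {"percent": 80, "severity": "warning"},
    {"percent": 100, "severity": "critical"},
]

for spend in (350, 400, 525):
    print(spend, fired_severities(spend, 500, thresholds))
```

With the 500 USD limit from the example, 350 USD crosses no threshold, 400 USD reaches the 80% warning, and 525 USD crosses both.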
Why CRDs, not a config API
InferCost needs to behave like everything else in your cluster: GitOps-compatible, RBAC-scoped, and auditable via the same kubectl describe your team already uses. A vendor-specific config surface is one more thing to secure, back up, and explain to new hires; CRDs get declarative workflows, access control, and auditing from Kubernetes for free.