CRD reference

InferCost is entirely configured through three Custom Resource Definitions under the finops.infercost.ai API group. No database, no config file, no admin API — what you see in kubectl get is the source of truth.

CostProfile

Declares the hardware economics for a node or GPU pool. The controller uses it to compute an hourly cost and to normalize all token-level attribution against that figure.

apiVersion: finops.infercost.ai/v1alpha1
kind: CostProfile
metadata:
  name: h100-node-01
spec:
  hardware:
    gpuModel: "NVIDIA H100 SXM5"      # free-form; shown in dashboards
    gpuCount: 8
    purchasePriceUSD: 280000
    amortizationYears: 3              # standard enterprise is 3; prosumer is 4
    maintenancePercentPerYear: 0.10   # 10% annual support contract
    tdpWatts: 700                     # fallback when DCGM unreachable
  electricity:
    ratePerKWh: 0.12
    pueFactor: 1.4                    # 1.0 for homelabs, 1.2-1.6 for data centers
  nodeSelector:
    kubernetes.io/hostname: h100-node-01

Status fields

  • hourlyCostUSD — amortization + electricity at current power draw
  • amortizationRatePerHour — hardware cost component
  • electricityCostPerHour — energy cost at current power
  • currentPowerDrawWatts — real-time total from DCGM
  • conditions[type=Ready] — True with reason CostComputed when the numbers are fresh
  • conditions[type=DCGMReachable] — one of four states (see DCGM setup) that tells you exactly how the power number was obtained
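The exact formula the controller uses is not spelled out here, but the status fields above suggest how the pieces combine. A minimal sketch, assuming straight-line amortization, a flat annual maintenance surcharge, and the TDP fallback when DCGM is unreachable (the function name and return shape are illustrative, not a real client API):

```python
# Sketch of one plausible hourlyCostUSD derivation from a CostProfile spec.
# Assumptions: straight-line amortization over amortizationYears, maintenance
# as a flat yearly percentage of purchase price, electricity scaled by PUE.

HOURS_PER_YEAR = 24 * 365  # 8760

def hourly_cost_usd(spec: dict, power_draw_watts: float) -> dict:
    hw, elec = spec["hardware"], spec["electricity"]

    # Hardware: spread the purchase price evenly over the amortization window.
    amortization = hw["purchasePriceUSD"] / (hw["amortizationYears"] * HOURS_PER_YEAR)

    # Support contract: a fixed percentage of purchase price per year.
    maintenance = hw["purchasePriceUSD"] * hw["maintenancePercentPerYear"] / HOURS_PER_YEAR

    # Electricity: measured draw (or tdpWatts * gpuCount fallback), PUE-adjusted.
    electricity = power_draw_watts / 1000 * elec["pueFactor"] * elec["ratePerKWh"]

    return {
        "amortizationRatePerHour": round(amortization, 2),
        "electricityCostPerHour": round(electricity, 2),
        "hourlyCostUSD": round(amortization + maintenance + electricity, 2),
    }

spec = {
    "hardware": {"gpuCount": 8, "purchasePriceUSD": 280_000,
                 "amortizationYears": 3, "maintenancePercentPerYear": 0.10,
                 "tdpWatts": 700},
    "electricity": {"ratePerKWh": 0.12, "pueFactor": 1.4},
}

# DCGM unreachable: fall back to TDP for all 8 GPUs.
fallback_watts = spec["hardware"]["tdpWatts"] * spec["hardware"]["gpuCount"]
print(hourly_cost_usd(spec, fallback_watts))
```

With the example profile above, amortization dominates: roughly $10.65/hour of hardware against under a dollar of electricity, which is why the amortizationYears choice matters more than the power bill.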

UsageReport

Auto-populated cost report over a time period. The controller scrapes inference pods, attributes tokens to models and namespaces, and writes the result back to status.

apiVersion: finops.infercost.ai/v1alpha1
kind: UsageReport
metadata:
  name: engineering-daily
  namespace: engineering
spec:
  costProfileRef: h100-node-01        # same namespace as this report
  schedule: daily                     # daily | weekly | monthly
  namespaces:                         # optional; if empty, all namespaces
    - engineering
    - research

Status fields

  • period, periodStart, periodEnd
  • inputTokens, outputTokens
  • estimatedCostUSD, costPerMillionTokens
  • byModel[] — per-model breakdown (model, namespace, token counts, cost)
  • byNamespace[] — per-team breakdown suitable for chargeback
  • cloudComparison[] — per-provider equivalent cost + savings vs on-prem
  • gpuEfficiencyRatio — fraction of GPU time on active inference
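How the summary numbers relate can be shown with a small sketch. Proportional (by-token) attribution across namespaces is an assumption here, and the dict keys simply mirror the status field names rather than any real client library:

```python
# Sketch: deriving estimatedCostUSD, costPerMillionTokens, and a
# byNamespace-style chargeback split from an hourly cost and token counts.
# Assumption: cost is attributed to namespaces in proportion to tokens.

def summarize(hourly_cost_usd: float, hours: float,
              tokens_by_namespace: dict) -> dict:
    total_cost = hourly_cost_usd * hours
    total_tokens = sum(tokens_by_namespace.values())
    return {
        "estimatedCostUSD": round(total_cost, 2),
        "costPerMillionTokens": round(total_cost / total_tokens * 1_000_000, 2),
        "byNamespace": {
            ns: round(total_cost * tokens / total_tokens, 2)
            for ns, tokens in tokens_by_namespace.items()
        },
    }

# One day on a ~$14.79/hour node, split 3:1 between two teams.
report = summarize(14.79, 24, {"engineering": 30_000_000,
                               "research": 10_000_000})
```

The byNamespace split is what makes the report usable for chargeback: each team's share is its token fraction times the node's total cost for the period.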

TokenBudget

Per-namespace spend limit with alert thresholds. Fires Prometheus alerts through the existing Alertmanager (no InferCost-specific alert delivery).

apiVersion: finops.infercost.ai/v1alpha1
kind: TokenBudget
metadata:
  name: engineering-monthly
  namespace: engineering
spec:
  scope:
    namespace: engineering
  monthlyLimitUSD: 500
  alertThresholds:
    - percent: 80
      severity: warning
    - percent: 100
      severity: critical

The controller generates a PrometheusRule from each TokenBudget; alerts route through whatever delivery you already have configured (Slack, PagerDuty, email).
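The check each generated rule effectively encodes is simple. The real evaluation happens in Prometheus; this sketch just shows which alerts a given month-to-date spend would trip for the example TokenBudget above (the function is illustrative, not part of InferCost):

```python
# Sketch: which alertThresholds a month-to-date spend has crossed.
# A threshold fires once spend reaches its percent of monthlyLimitUSD.

def fired_alerts(monthly_limit_usd: float, spend_usd: float,
                 thresholds: list) -> list:
    pct = spend_usd / monthly_limit_usd * 100
    return [t["severity"] for t in thresholds if pct >= t["percent"]]

thresholds = [{"percent": 80, "severity": "warning"},
              {"percent": 100, "severity": "critical"}]

fired_alerts(500, 420, thresholds)   # 84% of budget: warning only
fired_alerts(500, 510, thresholds)   # over budget: warning and critical
```

Note that thresholds are cumulative: crossing 100% leaves the 80% warning firing as well, so both routes in Alertmanager will see traffic.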

Why CRDs, not a config API

Because InferCost needs to behave like everything else in your cluster — GitOps-compatible, RBAC-scoped, auditable via the same kubectl describe your team already uses. A vendor-specific config surface is one more thing to secure, back up, and explain to new hires; CRDs inherit every Kubernetes property for free.