Kubernetes-native AI FinOps
Cloud FinOps tools see GPUs but not tokens. LLM gateways see tokens but assume on-prem is free. InferCost computes what nobody else does: real cost-per-token from hardware amortization, electricity, and actual GPU power draw.
| Metric | Value | Note |
|---|---|---|
| Hourly cost | $0.053 | amortization + electricity |
| Monthly | $38 | projected at current rate |
| Cost/MTok | $0.41 | under active load |
| GPU power | 84.7 W | 2x RTX 5060 Ti |
Savings vs Cloud APIs

| Model | Savings |
|---|---|
| Claude Opus 4.6 | 94% |
| GPT-5.4 | 89% |
| Claude Sonnet 4.6 | 90% |
| Gemini 2.5 Pro | 84% |
| Claude Haiku 4.5 | 69% |
| Gemini Flash-Lite | n/a (cloud cheaper) |
Real data from a homelab running Qwen3-32B on 2x RTX 5060 Ti
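The headline numbers above fall out of a simple model: straight-line amortization of the GPU purchase price plus metered electricity, divided by token throughput. A minimal sketch of that arithmetic (function names, the throughput figure, and the example cloud price are illustrative, not InferCost's actual API):

```python
# Sketch of the cost model: straight-line GPU amortization plus
# electricity. Names and example inputs are illustrative only.

HOURS_PER_YEAR = 365 * 24

def hourly_cost(purchase_usd: float, amort_years: float,
                power_watts: float, rate_per_kwh: float) -> float:
    """Hardware amortization + electricity, in USD per hour."""
    amort = purchase_usd / (amort_years * HOURS_PER_YEAR)
    electricity = (power_watts / 1000.0) * rate_per_kwh
    return amort + electricity

def cost_per_mtok(hourly_usd: float, tokens_per_sec: float) -> float:
    """USD per million tokens at a given sustained throughput."""
    mtok_per_hour = tokens_per_sec * 3600 / 1_000_000
    return hourly_usd / mtok_per_hour

def savings_vs_cloud(local_mtok: float, cloud_mtok: float) -> float:
    """Fractional savings of local inference vs a cloud API price."""
    return 1 - local_mtok / cloud_mtok

# $960 for 2x RTX 5060 Ti over 3 years, 84.7 W at $0.08/kWh
base = hourly_cost(960, 3, 84.7, 0.08)   # ~$0.043/hr for the GPUs alone
print(f"${base:.3f}/hr bare GPU")
```

Note the sketch covers only the GPUs themselves; the capability table lists "Electricity + PUE", and overheads like PUE and host power are the kind of thing that lifts the bare-GPU figure toward the dashboard's $0.053/hr.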
The problem
Organizations are making million-dollar hardware decisions with zero visibility into true unit economics. The FinOps Foundation's own working group explicitly acknowledges that on-premises AI cost is out of scope.
| Capability | OpenCost | Kubecost | LiteLLM | Langfuse | InferCost |
|---|---|---|---|---|---|
| Token-level tracking | — | — | ✓ | ✓ | ✓ |
| Per-user attribution | — | — | ✓ | ✓ | ✓ |
| On-prem hardware cost | — | — | — | — | ✓ |
| GPU amortization | — | — | — | — | ✓ |
| Electricity + PUE | — | — | — | — | ✓ |
| Cloud comparison | — | — | — | — | ✓ |
| Kubernetes-native | ✓ | ✓ | — | — | ✓ |
| Open source | ✓ | — | ✓ | ✓ | ✓ |
How it works
No database. No UI to host. One controller pod that plugs into infrastructure you already run.
```yaml
# costprofile.yaml
spec:
  hardware:
    gpuModel: "RTX 5060 Ti"
    gpuCount: 2
    purchasePriceUSD: 960
    amortizationYears: 3
  electricity:
    ratePerKWh: 0.08
```
```shell
$ helm install infercost \
    infercost/infercost \
    --set dcgm.endpoint=auto
✓ CRDs installed
✓ Controller running
✓ Metrics flowing
```
```
$ infercost status

INFRASTRUCTURE COSTS
shadowstack   RTX 5060 Ti   2   $0.053/hr

SAVINGS vs CLOUD
Opus 4.6      $9.20 saved (94%)
GPT-5.4       $5.22 saved (89%)
Flash-Lite    cloud 280% cheaper
```
Architecture
InferCost plugs into your existing Prometheus and Grafana stack. No new databases, no UI to host, no infrastructure to manage.
Data sources → InferCost Controller (single pod) → Outputs

Inside the single controller pod:

- GPU Power Scraper
- Token Counter
- Cost Calculator
- Attribution Engine
- Cloud Comparator
- Report Writer
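Conceptually, that single-pod pipeline is one loop: take power and token samples in, attribute cost per team, compare against a cloud price, and emit a report. The sketch below illustrates the flow; every name, structure, and stubbed input is hypothetical, not InferCost's real code:

```python
# Illustrative sketch of the controller pipeline. All names and the
# stubbed inputs are hypothetical, not InferCost's implementation.
from dataclasses import dataclass

@dataclass
class Sample:
    team: str          # attribution key
    tokens: int        # from the Token Counter
    gpu_watts: float   # from the GPU Power Scraper

def cost_usd(s: Sample, hourly_usd: float, tokens_per_hour: float) -> float:
    # Cost Calculator: this sample's share of the hourly hardware cost.
    return hourly_usd * (s.tokens / tokens_per_hour)

def run_once(samples, hourly_usd, tokens_per_hour, cloud_usd_per_mtok):
    # Attribution Engine + Cloud Comparator + Report Writer in one pass.
    report = {}
    for s in samples:
        local = cost_usd(s, hourly_usd, tokens_per_hour)
        cloud = (s.tokens / 1_000_000) * cloud_usd_per_mtok
        report[s.team] = {"local_usd": round(local, 4),
                          "saved_usd": round(cloud - local, 4)}
    return report

samples = [Sample("ml-platform", 120_000, 84.7),
           Sample("search", 40_000, 84.7)]
print(run_once(samples, hourly_usd=0.053,
               tokens_per_hour=129_000, cloud_usd_per_mtok=4.10))
```

In the real architecture the inputs would come from DCGM and gateway metrics via Prometheus, and the report would go back out as Prometheus metrics for Grafana rather than a Python dict.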
Roadmap
Each tier builds on the last. Start with cost visibility, grow into budget enforcement and optimization.
1. Cost-per-token, GPU power, efficiency
2. Per-team, per-model, cloud comparison
3. Budget thresholds, anomaly detection
4. Rate-limit over-budget teams
5. Model switching, scale-down scheduling
6. Audit export, EU AI Act, FOCUS spec
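The enforcement tiers reduce to a small decision rule: compare a team's attributed spend to its budget and rate-limit when it is exceeded, with a spike check for anomalies. A hedged sketch (names, the 3x spike threshold, and the inputs are invented for illustration):

```python
# Hypothetical sketch of the budget-enforcement tiers: a threshold
# check plus a naive anomaly flag. Names and thresholds are illustrative.

def enforce(team: str, spent_usd: float, budget_usd: float,
            hourly_history: list[float]) -> dict:
    over_budget = spent_usd > budget_usd
    # Naive anomaly check: latest hourly spend > 3x the trailing average.
    avg = sum(hourly_history) / len(hourly_history)
    anomaly = hourly_history[-1] > 3 * avg if avg > 0 else False
    return {"team": team,
            "rate_limit": over_budget,        # tier 4: rate-limit over-budget teams
            "alert": over_budget or anomaly}  # tier 3: thresholds + anomalies

print(enforce("search", spent_usd=41.0, budget_usd=38.0,
              hourly_history=[0.05, 0.05, 0.06, 0.21]))
```

A production version would read spend from the attribution engine and enforce the rate limit at the gateway; the rule itself stays this simple.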
| Standard | Relationship |
|---|---|
| OpenTelemetry GenAI | metric conventions |
| FOCUS Spec | compatible export |
| OpenCost | complementary |
| Apache 2.0 | open source |
InferCost is in active development. Join the list to be notified when we launch.