Kubernetes-native AI FinOps

Know the true cost of AI inference on your hardware

Cloud FinOps tools see GPUs but not tokens. LLM gateways see tokens but assume on-prem is free. InferCost computes what nobody else does: real cost-per-token from hardware amortization, electricity, and actual GPU power draw.

infercost / shadowstack-rtx5060ti (live)

Hourly Cost: $0.053 (amort + electricity)
Monthly: $38 (projected at current rate)
Cost/MTok: $0.41 (under active load)
GPU Power: 84.7W (2x RTX 5060 Ti)

Savings vs Cloud APIs

Claude Opus 4.6: 94%
GPT-5.4: 89%
Claude Sonnet 4.6: 90%
Gemini 2.5 Pro: 84%
Claude Haiku 4.5: 69%
Gemini Flash-Lite: n/a (cloud cheaper)

Real data from a homelab running Qwen3-32B on 2x RTX 5060 Ti
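The Cost/MTok figure is just the hourly rate divided by token throughput. A minimal sketch of the arithmetic (the ~36 tok/s throughput is an assumption back-solved from the numbers above, not a measured value):

```python
def cost_per_mtok(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost per million tokens: hourly hardware cost spread over hourly token output."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# An assumed ~36 tok/s sustained throughput reproduces the ~$0.41/MTok shown above
print(round(cost_per_mtok(0.053, 36), 2))  # → 0.41
```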

The problem

Nobody computes the true cost of on-prem inference

Organizations are making million-dollar hardware decisions with zero visibility into true unit economics. The FinOps Foundation's own working group explicitly acknowledges that on-premises AI cost is out of scope.

Capability              OpenCost   Kubecost   LiteLLM   Langfuse   InferCost
Token-level tracking
Per-user attribution
On-prem hardware cost
GPU amortization
Electricity + PUE
Cloud comparison
Kubernetes-native
Open source

How it works

Three steps. Five minutes.

No database. No UI to host. One controller pod that plugs into infrastructure you already run.

1

Declare your hardware

# costprofile.yaml
spec:
  hardware:
    gpuModel: "RTX 5060 Ti"
    gpuCount: 2
    purchasePriceUSD: 960
    amortizationYears: 3
  electricity:
    ratePerKWh: 0.08
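These fields feed a simple cost model: straight-line amortization of the purchase price plus electricity at the measured draw. A sketch of that arithmetic, assuming this two-term formula (not InferCost's documented implementation):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def hourly_cost(purchase_usd: float, amort_years: float,
                gpu_watts: float, rate_per_kwh: float) -> float:
    """Straight-line hardware amortization plus electricity at the measured draw."""
    amortization = purchase_usd / (amort_years * HOURS_PER_YEAR)
    electricity = (gpu_watts / 1000) * rate_per_kwh
    return amortization + electricity

# CostProfile values above, with the 84.7 W draw from the live dashboard
print(round(hourly_cost(960, 3, 84.7, 0.08), 3))  # → 0.043
```

This two-term model gives about $0.043/hr; the dashboard's $0.053/hr suggests the real calculation also counts factors this sketch omits, such as PUE or whole-node power.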

2

Deploy the operator

$ helm install infercost \
    infercost/infercost \
    --set dcgm.endpoint=auto

✓ CRDs installed
✓ Controller running
✓ Metrics flowing

3

See your true costs

$ infercost status

INFRASTRUCTURE COSTS
  shadowstack   RTX 5060 Ti   2   $0.053/hr

SAVINGS vs CLOUD
  Opus 4.6      $9.20 saved (94%)
  GPT-5.4       $5.22 saved (89%)
  Flash-Lite    cloud 280% cheaper
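The savings column is a straight price comparison between the local cost-per-token and a provider's list price. A sketch (the $6.80/MTok cloud price here is a hypothetical illustration, not any provider's actual rate):

```python
def savings_vs_cloud(local_per_mtok: float, cloud_per_mtok: float) -> float:
    """Fraction saved by serving locally; negative means the cloud API is cheaper."""
    return (cloud_per_mtok - local_per_mtok) / cloud_per_mtok

# Hypothetical $6.80/MTok cloud price against the $0.41/MTok local figure
print(f"{savings_vs_cloud(0.41, 6.80):.0%}")  # → 94%
```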

Architecture

One pod. Zero dependencies.

InferCost plugs into your existing Prometheus and Grafana stack. No new databases, no UI to host, no infrastructure to manage.

Data Sources

DCGM Exporter: GPU power draw (watts)
llama.cpp: token counts per pod
CostProfile CRD: hardware economics
LiteLLM PG: per-user attribution (optional)
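Reading power draw from the DCGM exporter amounts to scraping its Prometheus text endpoint for the `DCGM_FI_DEV_POWER_USAGE` gauge. A naive parsing sketch (the label set in the sample is illustrative; this is not InferCost's actual scraper):

```python
def parse_power_watts(exposition: str) -> dict[str, float]:
    """Pull per-GPU power readings (watts) out of DCGM exporter's text output,
    keyed by the `gpu` label. Naive string parsing, for illustration only."""
    readings = {}
    for line in exposition.splitlines():
        if line.startswith("DCGM_FI_DEV_POWER_USAGE{"):
            labels, value = line.rsplit(" ", 1)   # sample value follows the last space
            gpu = labels.split('gpu="')[1].split('"')[0]
            readings[gpu] = float(value)
    return readings

sample = """\
# TYPE DCGM_FI_DEV_POWER_USAGE gauge
DCGM_FI_DEV_POWER_USAGE{gpu="0",modelName="NVIDIA RTX 5060 Ti"} 43.1
DCGM_FI_DEV_POWER_USAGE{gpu="1",modelName="NVIDIA RTX 5060 Ti"} 41.6
"""
print(round(sum(parse_power_watts(sample).values()), 1))  # → 84.7
```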

InferCost Controller (single pod)

GPU Power Scraper · Token Counter · Cost Calculator · Attribution Engine · Cloud Comparator · Report Writer

Outputs

Prometheus metrics: any monitoring tool
REST API: custom integrations
Grafana dashboard: pre-built, ships as JSON
UsageReport CRDs: kubectl, GitOps
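Of the controller's internal stages, the Attribution Engine is the easiest to picture: split a node's hourly cost across pods in proportion to tokens served. A simplified sketch under that assumption (pod names are hypothetical; the real controller's weighting is not documented here):

```python
def attribute_cost(hourly_cost_usd: float,
                   tokens_by_pod: dict[str, int]) -> dict[str, float]:
    """Split one node's hourly cost across pods, weighted by tokens served."""
    total = sum(tokens_by_pod.values())
    if total == 0:
        return {pod: 0.0 for pod in tokens_by_pod}  # idle node: nothing to attribute
    return {pod: hourly_cost_usd * n / total for pod, n in tokens_by_pod.items()}

# Hypothetical pods; team-a served 3x the tokens, so it carries 3/4 of the cost
shares = attribute_cost(0.053, {"team-a": 90_000, "team-b": 30_000})
print({pod: f"${usd:.4f}" for pod, usd in shares.items()})
```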

Roadmap

From visibility to enforcement

Each tier builds on the last. Start with cost visibility, grow into budget enforcement and optimization.

Observe (Live): cost-per-token, GPU power, efficiency
Report (Live): per-team, per-model, cloud comparison
Alert (Coming Soon): budget thresholds, anomaly detection
Enforce (Coming Soon): rate-limit over-budget teams
Optimize (Planned): model switching, scale-down scheduling
Comply (Planned): audit export, EU AI Act, FOCUS spec
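The Alert tier's budget thresholds reduce to a projection check: extrapolate month-to-date spend linearly and compare against the budget. A sketch with hypothetical numbers (the $38 budget mirrors the monthly figure above):

```python
def projected_over_budget(spend_to_date: float, day_of_month: int,
                          days_in_month: int, budget_usd: float) -> bool:
    """True when the linear month-end projection of spend exceeds the budget."""
    projected = spend_to_date / day_of_month * days_in_month
    return projected > budget_usd

# $15 spent by day 10 projects to $45 by day 30, so a $38 budget should alert
print(projected_over_budget(15.0, 10, 30, 38.0))  # → True
```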

OpenTelemetry GenAI: metric conventions
FOCUS Spec: compatible export
OpenCost: complementary
Apache 2.0: open source

Get early access

InferCost is in active development. Join the list to be notified when we launch.