English
This talk will be held in English.
Christopher Haar
is a Software Engineer at Upbound working on open source. He's a maintainer of Crossplane and several providers in its ecosystem, and a regular contributor across the community. He brings years of experience building enterprise infrastructure across Telco, Railway, Finance, and Cloud.
Johannes Koch
is a DevOps enthusiast, a builder by heart, and an AWS DevTools Hero since June 2023. He loves building things using TypeScript, Angular, Java, or Go. He believes in full CI/CD automation and in everything required to run a solution being part of the infrastructure as code – here he has learned to love AWS CDK. He leads the AWS UG Bergstrasse and is a member of the AWS DACH Community.
Self-hosting or API? Every team running LLMs seriously asks this question. And almost every comparison you find gets it wrong. Not because the numbers are wrong, but because the wrong things are being compared.
An H100 cluster is not billed like a pay-per-token API: you pay for the hardware whether it is busy or idle, while an API charges only for the tokens you use. The cost structure is different, the scaling is different, and the break-even depends on factors most blog posts skip: KV cache hit rate, batch efficiency, idle time, model versions.
This talk shows what LLM inference actually costs. From tokenomics to GPU memory to FinOps metrics you can really measure. Plus an honest framework: when API, when self-host, when hybrid.
Real numbers. No vendor marketing.
Basic understanding of LLMs and inference. Experience with cloud or Kubernetes helps but is not required. If you have ever seen an OpenAI/Claude bill or thought about running your own GPUs, you have enough context. The math never goes beyond price per token times number of tokens.
After the talk you will know:
- Which cost drivers actually matter in LLM inference (KV cache, batching, idle time)
- Why typical $/token comparisons between API and self-host fail
- Which FinOps metrics for LLM workloads are truly measurable
- How to calculate the break-even between API, self-host, and hybrid
- When each architecture makes sense – and when it does not
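As a rough illustration of the break-even question, here is a minimal sketch, not the speakers' framework. It assumes a flat API price per million tokens and a GPU cluster billed by the hour whether busy or idle, and it ignores KV cache hit rate, batch efficiency, and whether the cluster can actually serve the volume. All function names and all numbers are hypothetical placeholders, not real prices.

```python
HOURS_PER_MONTH = 730.0  # assumption: average hours in a month


def api_monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Pay-per-token cost: price per token times number of tokens."""
    return tokens_per_month / 1_000_000 * price_per_million


def self_host_monthly_cost(gpu_count: int, gpu_hourly_rate: float) -> float:
    """Fixed cost of the cluster: GPUs are billed whether busy or idle."""
    return gpu_count * gpu_hourly_rate * HOURS_PER_MONTH


def break_even_tokens(gpu_count: int, gpu_hourly_rate: float,
                      price_per_million: float) -> float:
    """Monthly token volume at which the API bill equals the cluster bill."""
    fixed = self_host_monthly_cost(gpu_count, gpu_hourly_rate)
    return fixed / price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only to make the arithmetic visible.
    gpus, rate, api_price = 8, 3.00, 10.00
    volume = break_even_tokens(gpus, rate, api_price)
    print(f"Cluster cost per month: {self_host_monthly_cost(gpus, rate):,.2f}")
    print(f"Break-even volume:      {volume:,.0f} tokens per month")
    print(f"API cost at that volume: {api_monthly_cost(volume, api_price):,.2f}")
```

Below the break-even volume the API is cheaper under these assumptions; above it the fixed cluster cost is spread over enough tokens to win. Idle time shifts that point, because the cluster bill stays constant while the tokens served drop.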