Managed Service for Kubernetes®

This fully managed container orchestrator is purpose-built for modern AI workloads — delivering the scalability, performance, and reliability needed to power your most demanding model training and inference jobs.

Hassle-free orchestration

Reduce operational complexity by using a secure, streamlined and up-to-date Kubernetes environment that is ready to orchestrate your AI workloads on multi-host installations.

AI-native scalability

Scale your clusters easily by adding new nodes with NVIDIA GPU and InfiniBand drivers pre-installed. Combined with Kubernetes' native scalability, this ensures quick compute expansion when needed.

Advanced cluster reliability

Enjoy a predictable AI training and inference experience by running AI workloads on resilient, highly available clusters backed by system monitoring and Kubernetes auto-healing* mechanisms.

Use cases

Distributed model training

Run multi-host training across thousands of NVIDIA GPUs with minimal effort. Our Managed Kubernetes natively scales GPU clusters over high-speed InfiniBand fabric — plus, it supports a wide range of AI frameworks and job schedulers to extend cluster capabilities.
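On Kubernetes, a multi-host training run is typically expressed as a Job whose pods each request GPUs through the NVIDIA device plugin resource. As a minimal sketch (the job name, image, and node counts below are hypothetical placeholders, not part of this service's API), a small Python helper could assemble such a manifest:

```python
# Sketch: build a Kubernetes Job manifest for multi-node GPU training.
# The job name, image, and counts are hypothetical placeholders.

def training_job_manifest(name: str, image: str, nodes: int, gpus_per_node: int) -> dict:
    """Return a Job manifest that runs `nodes` pods, each requesting GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # One pod per node; parallelism runs all completions together.
            "completions": nodes,
            "parallelism": nodes,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        # GPUs are requested via the NVIDIA device plugin resource.
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_node}},
                    }],
                }
            },
        },
    }

manifest = training_job_manifest(
    "llama-train", "example.registry/trainer:latest", nodes=4, gpus_per_node=8
)
print(manifest["spec"]["parallelism"])  # 4
```

In practice, a job scheduler such as Volcano (available in the application catalog below) would layer gang scheduling on top of a plain Job like this.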

Real-time model inference

Run AI applications in the cloud seamlessly with Managed Kubernetes. Deploy production-ready models on GPU nodes, and natively load-balance web traffic across CPU-only instances within the same cluster.
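The serving pattern described above usually pairs a Deployment pinned to GPU nodes with a Service that load-balances traffic to it. As a sketch under stated assumptions (the node label key, names, image, and port are hypothetical, not this service's actual labels), the two manifests could be built like this:

```python
# Sketch: a Deployment that schedules model pods onto GPU nodes, and a
# Service that load-balances traffic to them. Label keys, names, image,
# and port are hypothetical placeholders.

def inference_deployment(name: str, image: str, replicas: int) -> dict:
    """Return a Deployment manifest for a GPU-backed model server."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Hypothetical label distinguishing a GPU node group.
                    "nodeSelector": {"node-group": "gpu"},
                    "containers": [{
                        "name": "model-server",
                        "image": image,
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                        "ports": [{"containerPort": 8000}],
                    }],
                },
            },
        },
    }

def inference_service(name: str) -> dict:
    """Return a Service manifest routing port 80 to the model pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": 80, "targetPort": 8000}],
        },
    }

dep = inference_deployment("llm-api", "example.registry/vllm-server:latest", replicas=3)
svc = inference_service("llm-api")
print(dep["spec"]["replicas"], svc["spec"]["ports"][0]["port"])  # 3 80
```

The Service selector matches the Deployment's pod labels, so scaling the Deployment up or down changes the load-balancing pool automatically.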

Kubernetes applications

Kubernetes Applications streamlines access to a curated library of prebuilt images for your AI/ML workloads: from popular inference engines to Kubernetes-native job schedulers, these ready-to-use assets cut setup time so you can launch workloads faster.

vLLM (Free, Inference)
A fast and easy-to-use library for LLM inference

Meta Llama 3.3 70B Instruct powered by vLLM (Free, Inference)
Multilingual, strong coding/reasoning, efficient inference via vLLM

Groq LLaMA3-70B-8192 (Free, Inference)
Groq hardware-accelerated LLaMA3 model, high-speed LLM inference

Meta Llama 4 Scout 17B Instruct powered by vLLM (Free, Inference)
Multimodal MoE model, 10M context, efficient inference via vLLM

Kubeflow (Free, Inference)
The Kubernetes-native machine learning (ML) toolkit

Mistral Nemo Instruct 2407 powered by vLLM (Free, Inference)
Mistral fine-tuned model, efficient LLM inference via vLLM

JupyterHub with PyTorch and CUDA (Free, Inference)
Multi-user JupyterHub with PyTorch...

Meta Llama 4 Maverick 17B 128E Instruct powered by vLLM (Free, Inference)
Multimodal 128-expert model, 128K context, efficient via vLLM

Ollama (Free, Inference)
Local open-source LLM tool, privacy-focused

Stable Diffusion web UI (Free, Inference)
A browser interface based on Gradio library for Stable Diffusion.

Volcano (Free, Inference)
An operator-based system for high-performance workloads with enhanced scheduling and resource management...

MLflow for Kubernetes (Free, Inference)
Manage your ML experiments in a Kubernetes cluster.

* This feature is currently in development.

Kubernetes is a registered trademark of The Linux Foundation (in the United States and other jurisdictions).