Managed Service for Kubernetes®

This fully managed container orchestrator is purpose-built for modern AI workloads — delivering the scalability, performance, and reliability needed to power your most demanding model training and inference jobs.

Hassle-free orchestration

Reduce operational complexity by using a secure, streamlined and up-to-date Kubernetes environment that is ready to orchestrate your AI workloads on multi-host installations.

AI-native scalability

Scale your clusters easily by adding new nodes with NVIDIA GPU and InfiniBand drivers pre-installed. Combined with Kubernetes' native scalability, this ensures quick compute expansion when needed.

Advanced cluster reliability

Enjoy a predictable AI training and inference experience by running AI workloads on resilient, highly available clusters backed by system monitoring and Kubernetes auto-healing* mechanisms.

Use cases

Distributed model training

Run multi-host training across thousands of NVIDIA GPUs with minimal effort. Our Managed Kubernetes natively scales GPU clusters over high-speed InfiniBand fabric — plus, it supports a wide range of AI frameworks and job schedulers to extend cluster capabilities.
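On Kubernetes, a multi-host training run is typically expressed as a Job whose pods each request GPUs through the NVIDIA device plugin resource. As a minimal sketch (the job name, image, and node counts below are hypothetical placeholders, not part of this service's API), a small Python helper could assemble such a manifest:

```python
# Sketch: build a Kubernetes Job manifest for multi-node GPU training.
# The job name, image, and counts are hypothetical placeholders.

def training_job_manifest(name: str, image: str, nodes: int, gpus_per_node: int) -> dict:
    """Return a Job manifest that runs `nodes` pods, each requesting GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # One pod per node; parallelism runs all completions together.
            "completions": nodes,
            "parallelism": nodes,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        # GPUs are requested via the NVIDIA device plugin resource.
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_node}},
                    }],
                }
            },
        },
    }

manifest = training_job_manifest(
    "llama-train", "example.registry/trainer:latest", nodes=4, gpus_per_node=8
)
print(manifest["spec"]["parallelism"])  # 4
```

In practice, a job scheduler such as Volcano (available in the application catalog below) would layer gang scheduling on top of a plain Job like this.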

Real-time model inference

Run AI applications in the cloud seamlessly with Managed Kubernetes. Deploy production-ready models on GPU nodes, and natively load-balance web traffic across CPU-only instances within the same cluster.
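The serving pattern described above usually pairs a Deployment pinned to GPU nodes with a Service that load-balances traffic to it. As a sketch under stated assumptions (the node label key, names, image, and port are hypothetical, not this service's actual labels), the two manifests could be built like this:

```python
# Sketch: a Deployment that schedules model pods onto GPU nodes, and a
# Service that load-balances traffic to them. Label keys, names, image,
# and port are hypothetical placeholders.

def inference_deployment(name: str, image: str, replicas: int) -> dict:
    """Return a Deployment manifest for a GPU-backed model server."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Hypothetical label distinguishing a GPU node group.
                    "nodeSelector": {"node-group": "gpu"},
                    "containers": [{
                        "name": "model-server",
                        "image": image,
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                        "ports": [{"containerPort": 8000}],
                    }],
                },
            },
        },
    }

def inference_service(name: str) -> dict:
    """Return a Service manifest routing port 80 to the model pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": 80, "targetPort": 8000}],
        },
    }

dep = inference_deployment("llm-api", "example.registry/vllm-server:latest", replicas=3)
svc = inference_service("llm-api")
print(dep["spec"]["replicas"], svc["spec"]["ports"][0]["port"])  # 3 80
```

The Service selector matches the Deployment's pod labels, so scaling the Deployment up or down changes the load-balancing pool automatically.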

Kubernetes applications

Kubernetes Applications streamlines access to a curated library of prebuilt images for your AI/ML workloads: from popular inference engines to Kubernetes-native job schedulers, these ready-to-use assets cut setup time so you can launch workloads faster.

vLLM (Free, Inference)
A fast and easy-to-use library for LLM inference

Meta Llama 3.3 70B Instruct powered by vLLM (Free, Inference)
Multilingual, strong coding/reasoning, efficient inference via vLLM

Groq LLaMA3-70B-8192 (Free, Inference)
Groq hardware-accelerated LLaMA3 model, high-speed LLM inference

Meta Llama 4 Scout 17B Instruct powered by vLLM (Free, Inference)
Multimodal MoE model, 10M context, efficient inference via vLLM

Kubeflow (Free, Inference)
The Kubernetes-native machine learning (ML) toolkit

Mistral Nemo Instruct 2407 powered by vLLM (Free, Inference)
Mistral fine-tuned model, efficient LLM inference via vLLM

JupyterHub with PyTorch and CUDA (Free, Inference)
Multi-user JupyterHub with PyTorch...

Meta Llama 4 Maverick 17B 128E Instruct powered by vLLM (Free, Inference)
Multimodal 128-expert model, 128K context, efficient via vLLM

Ollama (Free, Inference)
Local open-source LLM tool, privacy-focused

Stable Diffusion web UI (Free, Inference)
A browser interface based on Gradio library for Stable Diffusion.

Volcano (Free, Inference)
An operator-based system for high-performance workloads with enhanced scheduling and resource management...

MLflow for Kubernetes (Free, Inference)
Manage your ML experiments in a Kubernetes cluster.

* This feature is currently in development.

Kubernetes is a registered trademark of The Linux Foundation (in the United States and other jurisdictions).