Managed Soperator

Our fully managed Slurm-on-Kubernetes operator streamlines AI training on NVIDIA GPU clusters — cutting infrastructure complexity so you can focus on model development.

One-click cluster setup

Launch your training environment in minutes, not days. Our solution handles node provisioning, dependency pre-installation, and full infrastructure setup — so you can schedule jobs immediately, with no configuration required.
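Once the cluster is up, jobs are submitted with standard Slurm tooling. The script below is a hypothetical sketch — the job name, node counts, GPU counts, and training command are illustrative, not part of the product itself:

```shell
#!/bin/bash
# Hypothetical multi-node training job. Resource values below are
# illustrative and depend on your cluster layout.
#SBATCH --job-name=llm-pretrain
#SBATCH --nodes=4                # four GPU nodes
#SBATCH --gpus-per-node=8        # eight GPUs on each node
#SBATCH --ntasks-per-node=8      # one task per GPU
#SBATCH --time=72:00:00          # wall-clock limit

# Launch the training script on every allocated task.
srun python train.py --config configs/pretrain.yaml
```

Submitting this with `sbatch train.sbatch` is all that is needed — Slurm queues the job and places it on available GPU nodes.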

Fault-tolerant training

Train models stress-free: automatic health checks and recovery keep jobs running through hardware and node failures with minimal downtime. Integrated monitoring dashboards and logging give you full cluster visibility and control.

Maximum GPU utilization

Maximize your AI hardware ROI: smart scheduling & topology-aware job placement boost efficiency for large-scale training. Optimized dependencies ensure fast execution of your model training frameworks.
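Topology awareness can also be expressed directly in a job script using Slurm's standard `--switches` directive; the fragment below is a hypothetical sketch (the count and wait time are illustrative):

```shell
# Ask Slurm to place all allocated nodes under at most one network
# switch, waiting up to 10 minutes for such an allocation to free up.
# This keeps inter-node traffic on the fastest possible paths.
#SBATCH --switches=1@10:00
```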

Running Slurm on Kubernetes

Our Managed Operator is powered by Soperator — our custom-built Slurm Kubernetes Operator. This lets us deliver Slurm’s advanced job scheduling capabilities and Kubernetes’ cloud-native flexibility in a single, unified AI training environment.

How it works

A shared root filesystem provides a unified file environment across all cluster nodes — streamlining package management and boosting cluster scalability.
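As a concrete illustration of the shared root, installing a library once makes it available on every node. The session below is a hypothetical sketch — the package name is illustrative:

```shell
# On the login node: install a Python package into the shared root.
pip install transformers

# From any worker node (e.g. inside a job step), the same install is
# already visible, because every node mounts the same root filesystem.
srun --nodes=1 python -c "import transformers; print(transformers.__version__)"
```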

How it works diagram

Slurm-on-Kubernetes solutions by Kobayashi

|                                           | Managed Operator     | Professional Operator | Soperator                     |
|-------------------------------------------|----------------------|-----------------------|-------------------------------|
| Solution                                  | Slurm-based clusters | Slurm-based clusters  | Kubernetes operator for Slurm |
| Delivery model                            | Self-service app     | Professional service  | Open-source software          |
| Cloud environment                         | Kobayashi            | Kobayashi             | Cloud agnostic                |
| Pre-installed AI/ML drivers and libraries | Yes                  | Yes                   | Yes                           |
| All types of containers supported         | Yes                  | Yes                   | Yes                           |
| Passive health checks                     | Yes                  | Yes                   | No                            |
| Active health checks                      | Yes                  | Yes                   | No                            |
| Topology-aware job scheduling             | Yes                  | Yes                   | No                            |
| Auto-healing mechanism                    | Yes                  | Yes                   | On Kobayashi cloud only       |
| Free software, consumption-based pricing  | Yes                  | Yes                   | Yes                           |