We don't use Kubernetes to run user workloads, but we do use gVisor. We don't use MIG (multi-instance GPU) or MPS. If you run a container on Modal with N GPUs, you get all N GPUs to yourself.
Re: not using Kubernetes, we have our own custom container runtime written in Rust, with optimizations like lazy loading of content-addressed file systems. https://www.youtube.com/watch?v=SlkEW4C2kd4
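If it helps to make that concrete, requesting N GPUs looks roughly like this (a minimal sketch assuming the current Modal Python client API; the app and function names are just placeholders):

```python
import modal

app = modal.App("gpu-demo")  # placeholder app name

# "A100:2" requests two whole A100s for this function; since GPUs aren't
# sliced with MIG or MPS, the container sees both physical devices.
@app.function(gpu="A100:2")
def show_gpus():
    import subprocess
    # List every device visible inside the gVisor-sandboxed container.
    print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)

@app.local_entrypoint()
def main():
    show_gpus.remote()
```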
Yep! This is something we have internal tests for, haha. You have good instincts that it can be tricky. Here's an example of using that for multi-GPU training: https://modal.com/docs/examples/llm-finetuning
Okay, well, think very deeply about what you are saying about isolation, about the topology of the hardware, and about why NVIDIA does not allow P2P access even in vGPU settings except in specific circumstances that are not yours. I think if it were that easy to make the isolation promises you are making, NVIDIA would already do it. Malformed NVLink messages make GPUs fall off the bus even in trusted applications.
Suppose:
- You are using a container orchestrator like Kubernetes
- You are using gVisor as a container runtime
- Two applications from different users, containerized, are scheduled on the same node.
Then, which of the following are true?
(1) Both have shared access to an NVIDIA GPU
(2) Both share access to the NVIDIA GPU via CUDA MPS
(3) If there were 2 or more MIG instances on a node with a MIG-capable GPU, the NVIDIA container toolkit shim assigned a distinct MIG instance to each application
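You can actually check this empirically from inside each container. Here's a rough sketch (the MPS check via `CUDA_MPS_PIPE_DIRECTORY` is only a heuristic, not authoritative):

```python
import os
import subprocess

def visible_devices():
    # "nvidia-smi -L" prints one line per visible device: whole GPUs show a
    # GPU-... UUID, while MIG instances appear as "MIG ..." lines with MIG-... UUIDs.
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()

def mps_hint():
    # Heuristic only: MPS clients are usually pointed at the control pipe via this
    # environment variable, or use the default /tmp/nvidia-mps directory.
    return os.environ.get("CUDA_MPS_PIPE_DIRECTORY") or (
        "/tmp/nvidia-mps" if os.path.isdir("/tmp/nvidia-mps") else None
    )

if __name__ == "__main__":
    for line in visible_devices():
        print(line)
    print("possible MPS pipe:", mps_hint())
```

Run it in both containers and compare: identical GPU UUIDs point at (1), a detected MPS pipe on top of that points at (2), and distinct MIG UUIDs point at (3).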