Hacker News

Are these the facts?

- You are using a container orchestrator like Kubernetes

- You are using gVisor as a container runtime

- Two containerized applications from different users are scheduled on the same node.

Then, which of the following are true?

(1) Both have shared access to an NVIDIA GPU

(2) Both share access to the NVIDIA GPU via CUDA MPS

(3) If the node has a MIG-capable GPU partitioned into two or more MIG instances, the NVIDIA Container Toolkit shim assigns a distinct MIG instance to each application



We don't use Kubernetes to run user workloads, but we do use gVisor. We don't use MIG (Multi-Instance GPU) or MPS. If you run a container on Modal using N GPUs, you get all N GPUs to yourself.

If you'd like to learn more, you can check out our docs here: https://modal.com/docs/guide/gpu

Re not using Kubernetes, we have our own custom container runtime in Rust with optimizations like lazy loading of content-addressed file systems. https://www.youtube.com/watch?v=SlkEW4C2kd4


If the NVIDIA driver has a bug, can one workload access data from another workload running on the same physical machine?

E.g. it came up in this thread: https://news.ycombinator.com/item?id=41672168


Yes. The kernel has access to data from every workload, and so technically a bug in _anything_ running at kernel level could result in data leakage.


Suppose I ask for two H100s. Will I have GPU P2P capabilities?


Yep! This is something we have internal tests for, haha; you have good instincts that it can be tricky. Here's an example of using that for multi-GPU training: https://modal.com/docs/examples/llm-finetuning
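Peer access can also be checked from inside the container. A minimal sketch, assuming PyTorch with CUDA is installed: `missing_p2p_pairs` and `check_with_torch` are hypothetical helper names, while `torch.cuda.device_count` and `torch.cuda.can_device_access_peer` are existing PyTorch calls. Peer access is reported per direction, so both orders of each GPU pair are checked.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def missing_p2p_pairs(
    num_gpus: int,
    can_access_peer: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Return every ordered (device, peer) pair that reports NO peer access.

    Peer access is directional, so both (0, 1) and (1, 0) are checked.
    An empty list means every visible GPU reports direct access to every other one.
    """
    return [
        (dev, peer)
        for dev, peer in permutations(range(num_gpus), 2)
        if not can_access_peer(dev, peer)
    ]


def check_with_torch() -> List[Tuple[int, int]]:
    """Run the check on the GPUs visible to this process (hypothetical helper; needs PyTorch + CUDA)."""
    import torch  # imported lazily so missing_p2p_pairs has no hard dependency on PyTorch

    return missing_p2p_pairs(
        torch.cuda.device_count(),
        torch.cuda.can_device_access_peer,
    )
```

On a container with two P2P-capable GPUs, `check_with_torch()` should return an empty list; any pair it returns reported no direct peer access. This only reflects what the CUDA driver reports and does not measure bandwidth or isolation.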


Okay, well, think very deeply about what you are saying about isolation, the topology of the hardware, and why NVIDIA does not allow P2P access even in vGPU settings except in specific circumstances that don't apply to you. I think if it were that easy to make the isolation promises you are making, NVIDIA would already do it. Malformed NVLink messages make GPUs fall off the bus even in trusted applications.


Yes, it will.

(I work at Modal.)



