If I recall correctly, the command circular buffers of 2^n bytes ("queues" in vulkan3d) sit in VRAM and are I/O memory-mapped (you just need atomic read/write pointers for synchronization; see the mathematically proven synchronization algorithms).
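To make the "2^n bytes plus atomic R/W pointers" idea concrete, here is a minimal sketch of a single-producer/single-consumer ring in plain C11. It is not the AMD layout: `struct ring`, `RING_SIZE`, `ring_write` and `ring_read` are names made up for illustration, and the buffer is ordinary memory rather than mapped VRAM. The point is only that a power-of-two size turns wrap-around into a mask, and that each side owns exactly one pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical size; must be a power of two (2^n bytes) */

struct ring {
    _Atomic uint32_t wptr;   /* free-running, advanced only by the producer */
    _Atomic uint32_t rptr;   /* free-running, advanced only by the consumer */
    uint8_t data[RING_SIZE];
};

static void ring_init(struct ring *r)
{
    /* The mask trick below only works for power-of-two sizes. */
    assert((RING_SIZE & (RING_SIZE - 1u)) == 0);
    atomic_init(&r->wptr, 0);
    atomic_init(&r->rptr, 0);
}

/* Producer side: all-or-nothing write. Returns len, or 0 if there is no room. */
static size_t ring_write(struct ring *r, const uint8_t *src, size_t len)
{
    uint32_t w  = atomic_load_explicit(&r->wptr, memory_order_relaxed);
    uint32_t rd = atomic_load_explicit(&r->rptr, memory_order_acquire);
    uint32_t used = w - rd; /* unsigned subtraction survives counter wrap */

    if (len > RING_SIZE - used)
        return 0;
    for (size_t i = 0; i < len; i++)
        r->data[(w + i) & (RING_SIZE - 1u)] = src[i];
    /* Release: the payload must be visible before the new write pointer. */
    atomic_store_explicit(&r->wptr, w + (uint32_t)len, memory_order_release);
    return len;
}

/* Consumer side: reads up to len bytes. Returns the number actually read. */
static size_t ring_read(struct ring *r, uint8_t *dst, size_t len)
{
    uint32_t rd = atomic_load_explicit(&r->rptr, memory_order_relaxed);
    uint32_t w  = atomic_load_explicit(&r->wptr, memory_order_acquire);
    uint32_t avail = w - rd;

    if (len > avail)
        len = avail;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(rd + i) & (RING_SIZE - 1u)];
    /* Release: the slots are only handed back once they have been copied out. */
    atomic_store_explicit(&r->rptr, rd + (uint32_t)len, memory_order_release);
    return len;
}
```

On real hardware the consumer would be the GPU's command processor and the pointers would be device registers or write-back locations rather than C atomics, but the ownership rule (one writer per pointer) is the same.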
There is a "GPU IRQ" circular buffer of 2^n bytes coupled with PCIe MSIs (and I recall something about a hardware "message box").
The "thing" is that, for many of them, how those commands are used and how they are defined feels very weird (for instance, the programming of the 3D/compute pipeline registers).
Have a look at libdrm from the Mesa project (the amdgpu submodule); it will give you pointers on where to look in the kernel DRM code via the right ioctls.
Basically, the kernel code is initialization, quirks detection and restoration (firmware blobs are failing hard here), and the setting up of the various VRAM virtual address spaces (16 on the latest GPUs) and the various circular buffers. The 3D/compute pipeline programming is done from userspace via those circular buffers.
If I am not too mistaken, on the latest GPUs "everything" should be in one 64-bit PCIe BAR (thanks to BAR size reprogramming).
The challenge for AMD is to make all that dead simple and clean while keeping the extreme performance (GPUs are all about performance). I have heard rumors about near-zero-driver hardware (namely, "ready" at power-up).
> Have a look at libdrm from the Mesa project (the amdgpu submodule); it will give you pointers on where to look in the kernel DRM code via the right ioctls.
I wonder how much the LAPSUS$ hack has to do with it.
I wonder if the NVIDIA hardware programming interface is a mess like AMD's; just curious.