Correct me if I'm wrong, but if the ONNX working groups (under the Linux Foundation's LF AI & Data Foundation) were to support advanced quantization (down to 4-bit grouped schemes like GGUF's Q4/Q5 formats), standardize flash attention and similar fused ops, and enable efficient memory-mapped weights in both the spec and ONNX Runtime, then Windows ML and Apple Core ML could become credible replacements for GGUF in local-LLM land.
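For anyone unfamiliar with the term, here's a minimal sketch of what "4-bit grouped quantization" means in general (this is an illustration of the idea, not the exact GGUF Q4 bit layout): each group of weights shares one floating-point scale, and the weights themselves are stored as 4-bit signed integers.

```python
import numpy as np

def quantize_q4_grouped(w, group_size=32):
    """Symmetric 4-bit grouped quantization: one fp32 scale per group
    of `group_size` weights, values mapped to signed ints in [-8, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4_grouped(q, scale):
    """Recover approximate fp32 weights from 4-bit codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_q4_grouped(w)
w_hat = dequantize_q4_grouped(q, s)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```

The point is that the per-group scale keeps quantization error bounded relative to each group's magnitude, which is why grouped schemes hold up so much better for LLM weights than a single per-tensor scale would.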