| 1. | | Can LLMs play real-time games like Super Mario (other than Pokemon Red)? (twitter.com/haoailab) |
| 3 points by zhisbug 11 months ago | past | 1 comment |
|
| 2. | | Sliding Tile Attention: A New Method That Speeds Up HunyuanVideo's Outputs by 3x (reddit.com) |
| 2 points by zhisbug 12 months ago | past | 1 comment |
|
| 3. | | Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io) |
| 12 points by zhisbug 12 months ago | past | 2 comments |
|
| 4. | | More Efficient Chain-of-Thought Reasoning Through Certainty Probing (huggingface.co) |
| 6 points by zhisbug 12 months ago | past | 2 comments |
|
| 5. | | AI Space Escape: Playing Games While Evaluating LLM Reasoning (lmgame.org) |
| 13 points by zhisbug on Feb 11, 2025 | past | 2 comments |
|
| 6. | | Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io) |
| 2 points by zhisbug on Jan 14, 2025 | past | 1 comment |
|
| 7. | | FastVideo: a lightweight framework for accelerating large video diffusion models (github.com/hao-ai-lab) |
| 110 points by zhisbug on Dec 17, 2024 | past | 24 comments |
|
| 8. | | MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io) |
| 2 points by zhisbug on June 24, 2024 | past | 1 comment |
|
| 9. | | Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io) |
| 461 points by zhisbug on May 8, 2024 | past | 98 comments |
|
| 10. | | Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io) |
| 5 points by zhisbug on March 18, 2024 | past | 1 comment |
|
| 11. | | Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (lmsys.org) |
| 17 points by zhisbug on Nov 21, 2023 | past | 2 comments |
|
| 12. | | Important and *MUST-KNOW* techniques for a 2023 LLM serving system (twitter.com/haozhangml) |
| 1 point by zhisbug on Sept 13, 2023 | past |
|
| 13. | | Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg) |
| 7 points by zhisbug on April 28, 2023 | past | 1 comment |
|
| 14. | | Alpa: Auto-parallelizing large model training and inference (by UC Berkeley) (github.com/alpa-projects) |
| 7 points by zhisbug on June 23, 2022 | past | 1 comment |
|