
Great work. It's a real breath of fresh air coming from the huge frameworks that use CUDA.

So many different CUDA versions, each framework using its own, all relying on a different driver; everything needs a new version every 3 months and takes ~10 GB (and don't even get me started on cuDNN needing a manual, logged-in install).

Here everything is just two files. For embedded systems that don't have a GPU, it's perfect.

Here the parallelization and vectorization have been done by hand, but there is a glimmer of hope coming from various compiler projects:

Here is an interesting Intel project, definitely worth a look, that does the parallelization and vectorization automatically for different architectures: https://ispc.github.io/ispc.html

For auto-differentiation, when I need performance or memory I currently use Tapenade ( http://tapenade.inria.fr:8080/tapenade/index.jsp ) and/or manually written gradients when I need to fuse some kernels, but Enzyme ( https://enzyme.mit.edu/ ) is also very promising.

MPI for parallelization across machines.



> MPI for parallelization across machines.

Some things never change.


Ditto about the CUDA and cuDNN part. My project, which had been running fine for the past 4 years, just "died" after a colleague's oversight when upgrading the GPU (1080 Ti -> 3090), which isn't compatible with the new cuDNN. It was just too much of a hassle maintaining that *expletive* jargon, so I made the wise decision to kill it.


100%.

So much more practical to hack around with or build small apps.



