
Wow, fun to find this trending on HN this morning! I am currently also working on the associated video lecture (as the next episode of my video lecture series here https://karpathy.ai/zero-to-hero.html ), where I will build nanoGPT from scratch and aspire to spell everything out, as with the earlier videos. Hoping to get it out in ~2 weeks or so.


Openly accessible lectures and knowledge like yours have allowed many people, me included, to turn their lives around by putting in the effort and developing themselves. Thank you.


While doing my PhD some years ago (it wasn't a PhD on AI, but very much related), I trained several models with the usual stack back then (PyTorch, and some others in TF). I realized that a lot of this stack could be rewritten in much simpler terms without sacrificing much fidelity or performance in the end.

Submissions like yours, and other projects like this one (recently featured here as well) -> https://github.com/ggerganov/whisper.cpp, make it pretty clear to me that this intuition is correct.

There are a couple of tools I created back then that could push things further in this direction. Unfortunately they're not mature enough to warrant a release, but the ideas behind them are worth a look (IMHO) and I'd be happy to share them. If there's interest on your side (or from anyone reading this thread), I'd love to talk more about it.


Your YouTube playlist combined with nanoGPT and your Lex Fridman podcast is like having a university-level degree with free internship guidance. Thank you!


Just wanted to say thank you for all the incredible work and resources you publish. I've lost track of all the different skills I've learned from you: computer vision, RNNs, minGPT, even speedcubing :D


+1. I've benefited greatly from your content, e.g. your CNN lecture was incredibly accessible [0]. I still find that transformers stubbornly elude my intuition despite reading many descriptions. I would very much appreciate your video lecture on this topic.

[0] I think https://www.youtube.com/watch?v=LxfUGhug-iQ


I've found all of your code and lessons on youtube so incredibly useful. You're a wonderful teacher and I really appreciate all the work you've done with this!


Andrej: thank you!

--

To the mod (dang): IMHO Andrej's comment should probably be at the top of the page, not my comment. UPDATE: Looks like that's done. Thank you :-)


Thank you for your amazing work. Between cs231n and your recent videos, I've learned a ton, and you have a gift for explaining things in such an easy and straightforward way that I always feel like an idiot (in a positive way) for not having grasped the concept before.


Badass! A great addition would be some content on tuning pre-trained language models for particular purposes. It would be great to have examples of things like tuning a GPT model trained on language and code to take in a context and spit out code against my custom API, or using my internal terminology. Not sure if this is RL-based fine-tuning or just a bunch of language-to-code examples in a fine-tuning dataset? In essence, how can we start using language to control our software?


Ty, agreed: practically speaking, most people will be interested in finetuning rather than from-scratch pretraining. I currently have some language about it in the readme, but I agree this should get more focus, docs, examples, etc.
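Roughly, a finetune in nanoGPT is just a config that initializes from a pretrained GPT-2 checkpoint and trains at a small learning rate on your own prepared dataset. A minimal sketch of such a config; the file name, dataset name, and values here are illustrative assumptions rather than the repo's exact defaults, so check the readme and the existing finetuning config for the real thing:

    # hypothetical config/finetune_my_api.py -- illustrative values, not the repo's defaults
    out_dir = 'out-my-api'
    eval_interval = 200
    eval_iters = 40

    dataset = 'my_api_examples'        # prepared beforehand with a data/<dataset>/prepare.py script
    init_from = 'gpt2-xl'              # start from a pretrained OpenAI GPT-2 checkpoint

    # finetuning generally wants a small learning rate and relatively few iterations
    batch_size = 1
    gradient_accumulation_steps = 32
    max_iters = 2000
    learning_rate = 3e-5
    decay_lr = False

This would then be launched with something like "python train.py config/finetune_my_api.py" (again, the config file name is hypothetical).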


Appreciate the work to make GPT training accessible!

Do you leave hyperparams (like learning rate, batch size) the same when switching from 8xA100 to fewer GPUs, or do these need to be adjusted?

Separately, when going from 8xA100 GPUs to a single A100, in the worst case we can expect the same model performance after training 8x as long, correct? (And likely a bit better, because we get more gradient updates with the smaller batch size.)
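For my own sanity check, here's the back-of-the-envelope bookkeeping I have in mind; the numbers are illustrative, not necessarily the repo's defaults:

    # Rough effective-batch-size bookkeeping when dropping from 8 GPUs to 1.
    # Illustrative numbers only; check nanoGPT's train.py for the actual defaults.
    batch_size = 12          # per-GPU micro-batch (sequences)
    block_size = 1024        # tokens per sequence
    grad_accum = 5           # gradient accumulation steps per GPU
    n_gpus = 8

    tokens_per_step = batch_size * block_size * grad_accum * n_gpus
    print(tokens_per_step)   # 491520 tokens per optimizer step

    # Single-GPU run: keep tokens_per_step constant by scaling up gradient accumulation,
    # so the learning rate and schedule can stay the same; each step just takes ~8x longer.
    grad_accum_1gpu = grad_accum * n_gpus   # 40
    assert batch_size * block_size * grad_accum_1gpu * 1 == tokens_per_step

If you instead let the effective batch shrink, the common heuristic is to scale the learning rate down roughly in proportion, but keeping the effective batch fixed via gradient accumulation sidesteps that question.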


Thank you for sharing your knowledge. Anything that can be done to democratize machine learning is an invaluable social service. Hats off to you.


Saying absolutely nothing new here, but your work is so damn inspiring! I wish I had such a natural connection to my work, an ability to distill complex concepts down to the fundamentals, and such inventiveness! I took your CS231N class at Stanford as well. Implementing the fundamental building blocks like backprop was fun and insightful. Thanks again for your passion and teaching!


Your tutorials are effective and concise. Thank you for them! Accessible, from-scratch knowledge on these topics is essential at this time in history and you're really making a dent in that problem.


Thanks, I love your video about backpropagation, where you painstakingly spell out every calculation. It was like a breath of fresh air compared to other materials out there.


Thanks for your work, Andrej! I've been working through the earlier lectures and this is absolutely fantastic educational content!


Thank you for your constant contributions.


Amazing work, much appreciated


Thank you for your great work!


Just started watching your lectures! They are great!



