This is awesome, well done guys, I’m gonna try it as my ASR component on the local voice assistant I’ve been building https://github.com/acatovic/ova. The tiny streaming latencies you show look insane
This is such an insane rabbit hole. AI labs distill weights from the entirety of the internet's knowledge, (mostly) without anyone's consent, which (technically) amounts to theft. However, the Chinchilla scaling law dictates that you need to expend X amount of energy to make that knowledge useful. Then the data law dictates that you need to shift the weights to a more useful latent space by paying maths, coding and domain experts lots of money. So you have "stolen" the data, but then paid billions to make it useful. And useful it is!
Then another lab comes along and "steals" from you - that beautiful, refined data-oil - by distilling your weights using inferior equipment but a toolbox of ingenuity and low-level hacking tricks. They reach 90% of your performance at a 20x cost reduction.
What happens when another lab distills from the distilled lab?
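The "Chinchilla law" above can be made concrete with the usual rule of thumb (a rough sketch; the function names and the 70B example are mine): training compute is roughly C ≈ 6ND FLOPs for N parameters and D tokens, and the compute-optimal token budget is roughly D ≈ 20N.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs (C ~= 6 * N * D)."""
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rule-of-thumb compute-optimal token budget (~20 tokens per parameter)."""
    return 20 * n_params

n = 70e9                             # a 70B-parameter model
d = chinchilla_optimal_tokens(n)     # ~1.4e12 tokens
c = training_flops(n, d)             # ~5.9e23 FLOPs
print(f"{d:.2e} tokens, {c:.2e} FLOPs")
```

That 10^23-FLOP ballpark is the "X amount of energy" the comment is gesturing at: even with the data in hand, making it useful is enormously expensive.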
What would you define as 'distillation' versus 'learning'? How do you know that what an LLM is doing is 'distillation' vs a process closer to a human reading a book?
From my perspective, pretraining is pretty clearly not 'distilling', as the goal is not to replicate the pretraining data but to generalize. But what these companies are doing is clearly 'distilling' in that they want their models to exactly emulate Claude's behavior.
That's a soft distinction (distilling vs learning). If I read a chapter in a text book I am distilling the knowledge from that chapter into my own latent space - one would hope I learn something. Flipping it the other way, you could say that model from Lab Y is ALSO learning the model from Lab X. Not just "distilling". Hence my original comment - how deep does this go?
> And yet nearly every machine learning engineer would disagree with you, which is a giveaway that your argument is rooted in ideology.
That's a bold statement! Of course I know the difference: in one case you are learning from correct/wrong answers, and in the other from a probability distribution. But in both cases you are using some X to move the weights. We can get down and gritty on KL divergence vs cross-entropy, but the whole topic is about "theft", which is perhaps in the eye of the beholder.
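The distinction being debated fits in a few lines (a toy sketch with made-up distributions; nothing here is from either lab): supervised learning minimizes cross-entropy against a one-hot "correct answer", while distillation minimizes KL divergence against the teacher's full soft distribution. Either way, the loss gradient moves the weights.

```python
import math

def cross_entropy(p_model, hard_label):
    """Supervised loss: -log of the probability assigned to the correct class."""
    return -math.log(p_model[hard_label])

def kl_divergence(p_teacher, p_model):
    """Distillation loss: KL(teacher || student) over soft targets."""
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, p_model) if t > 0)

student = [0.7, 0.2, 0.1]

# Learning from correct/wrong answers (class 0 is "correct"):
print(cross_entropy(student, 0))                    # -ln(0.7) ~= 0.357

# Distilling a teacher's probability distribution:
teacher = [0.6, 0.3, 0.1]
print(kl_divergence(teacher, student))              # ~= 0.0291
```

The cross-entropy only "sees" the correct class; the KL term penalizes mismatch on every class, which is why distillation transfers so much more signal per example.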
I don’t agree with this drowning sentiment. It’s much easier now to build capable stuff; that’s what the data is showing you. Pre-AI nostalgia, sure - I built a PPC profiler in assembly by hand - but who am I to say the latest AI-induced gadgetry is not as cool? And I am an active participant.
Incredibly, I didn't do anything. I just told Codex to use the Playwright CLI, told it what to check (in plain English), and it did its thing. Looking at its log, I can see that it was "playing" the game and defining its own test conditions, such as whether the player/NPC is *not* on one of the "collidable" tiles, whether the NPC is "going over the edge" of a collidable area, whether it's facing the wrong way, etc. Sometimes it found bugs; e.g. while testing gravity it found that one of the movements wasn't working correctly, and it went ahead and fixed it.
So essentially it used the CLI to read all the x,y coordinates, speed, and timing, took screenshots, and combined those together.
My learning from this is: just let the agent do it. Trying to prescribe specific conditions and checks actually lowers the agent's performance. Simply give it a guide.
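For flavor, the kind of walkability check the agent was deriving might look something like this (a hypothetical reconstruction; the tile grid, sizes, and names are mine, not from the agent's log):

```python
# Toy version of the checks the agent generated: given a tile grid, verify a
# character's position is on a walkable tile, and that a step in its facing
# direction doesn't carry it onto a collidable tile ("over the edge").

TILE = 16  # tile size in pixels (assumed)

# 0 = walkable floor, 1 = collidable wall
GRID = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

def tile_at(x: float, y: float) -> int:
    """Look up the tile under a pixel coordinate."""
    return GRID[int(y // TILE)][int(x // TILE)]

def on_walkable(x: float, y: float) -> bool:
    return tile_at(x, y) == 0

def step_stays_walkable(x, y, dx, dy, speed=TILE):
    """Would one step in the facing direction stay on walkable ground?"""
    return on_walkable(x + dx * speed, y + dy * speed)

# NPC standing in the open area at tile (1, 1):
assert on_walkable(24, 24)
assert step_stays_walkable(24, 24, 1, 0)      # step right into tile (2, 1): ok
assert not step_stays_walkable(24, 24, 0, -1) # step up would hit the wall
```

In the real setup the agent read these coordinates out of the running game via the Playwright CLI rather than from a hardcoded grid; the point is just that the conditions it invented reduce to simple assertions like these.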
Yeah character walking (and honestly any animation) is still a real pain. I've seen some people try to use video models almost like a rotoscoping tool but it always looks kind of sus honestly.
Maintaining consistency across a set of loopable frames still takes a fair amount of manual work. Here's an example I put together using ControlNet and OpenPose.
I don't have any professional experience, but I did build a Pac-Man clone in Turbo Pascal (on MS-DOS) back in the 90s. Also, I've been playing games all my life, so I have a feel for what's important: collision detection, walkable/collidable areas, speed/timing, some basic NPC logic, etc. That seems to have been sufficient in this case :-)
I’ve been doing this for the past 15 years - writing “LAZYs”, which started off as plain .txt files and are now .md. The nice thing is that now you can search through them easily or give them to Codex.