> The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.
Hard disagree. I've used GPT-4 to write full optimizers from papers that were published long after the cutoff date and that use concepts that simply didn't exist in the training corpus. Trivial modifications were needed afterwards for memory usage and whatnot, but more often than not, if I provide it the appropriate text from a paper, it'll spit out something that more or less works. I have enough knowledge in the field to verify the correctness.
Most recently I used GPT-4 to implement the Bayesian Flow Networks paper, a completely new concept that, as I recall, people in the HN comment section said was "way too complicated for people who don't intimately know the field" to make any use of.
I don't mind it when people don't find LLMs useful for their particular problems, but I simply don't run into the vast majority of the failures people describe, and it really makes me wonder how they're prompting to end up with such difficulty.
You are literally describing the fundamental problem of truth in philosophy and acting as if it's different because a computer is involved at one step in the chain.
Show me one assistant who can promise they are right 100% of the time and I will show you one liar.
Copilot & Copilot Chat cut down my coding time on a brand-new ML optimizer that was released in a paper last week from what would have been 20+ hours into a 4-hour session and I got fancy testing code as a free bonus. If I had coded it by myself and taken the full amount of time it would have taken to figure out which parameter was being sent on which layer for which gradient was currently being processed, I wouldn't have had the energy to write any tests.
I don't understand what people's expectations of AI are such that they're being disappointed. You figure out the limitations quickly if you use it on a regular basis, and you adapt those shortcomings into your mental calculus. I still code plenty by myself in a good old vim session, because I don't think Copilot would actually be very useful in reducing the time it would take me to code something up, but I don't count that as a "failure" of AI; I view it as knowing when to use a tool and when not to.
It does not. Regular ChatGPT without plugins does not have access to any tools. Throw it a script with some weird outputs and it'll definitely fail, every time. While the script "evaluation" it does can be pretty impressive, it is not actually executing anything.
After I finished it myself, I ran it through ChatGPT (and Davinci, but that largely failed). It generated part 1 perfectly, but it was unable to make the jump to the three-compartment intersections without significant prompting, and the first couple of times it completely lost the uppercase/lowercase distinction. It was able to generate largely perfect tests from the examples, though, and I had it debug itself until it worked. It wasn't amazing code, but it passed its tests and actually thought of some edge cases I hadn't considered when I coded my solution, such as odd-length strings.
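For context, the puzzle boils down to set intersections with a case-sensitive priority; roughly the shape below, assuming the rucksack-style problem described (this is my reconstruction for illustration, not ChatGPT's actual output):

    def priority(c: str) -> int:
        # a-z -> 1..26, A-Z -> 27..52; this case distinction is what it kept losing at first
        return ord(c) - ord("a") + 1 if c.islower() else ord(c) - ord("A") + 27

    def part1(lines: list[str]) -> int:
        total = 0
        for line in lines:
            half = len(line) // 2                   # an odd-length line splits unevenly here,
            left, right = line[:half], line[half:]  # the edge case its tests flagged
            total += sum(priority(c) for c in set(left) & set(right))
        return total

    def part2(lines: list[str]) -> int:
        # the "three compartment" jump: intersect each group of three lines
        total = 0
        for i in range(0, len(lines), 3):
            common = set(lines[i]) & set(lines[i + 1]) & set(lines[i + 2])
            total += sum(priority(c) for c in common)
        return total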
Yes. Every single serious chess master uses ChessBase, as it has the largest database available out there, even though the price is pretty obscene. The customer base is more dedicated than most, and it's probably the single most important tool a chess pro can get outside of an engine itself.
Edit: It may not be the single largest database; I suspect that honor goes to Chess.com or Lichess, but it is certainly the largest curated one.
It’s not even about the database per se… what’s not easily replicated are the annotated (commented/analyzed) games and the high-quality metadata.
The actual software, in terms of UI, is basically shit, and the technology is questionable: search really should be an order of magnitude faster. My understanding is that it has to do a full sequential read of the main DB, which is several GB.
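To illustrate the sequential-read point with a toy example (this is a guess at the behavior, not anything documented about ChessBase's internals): the difference between scanning every record for each query and hitting a prebuilt index looks roughly like this in SQLite terms:

    import sqlite3

    con = sqlite3.connect("games.db")   # stand-in for a multi-GB game database
    con.execute("CREATE TABLE IF NOT EXISTS games "
                "(id INTEGER PRIMARY KEY, white TEXT, black TEXT, eco TEXT, moves TEXT)")

    # Without an index, this is a full sequential scan of the whole table:
    con.execute("SELECT id FROM games WHERE white = ?", ("Carlsen, Magnus",)).fetchall()

    # With an index, the same query becomes a B-tree lookup; far less I/O:
    con.execute("CREATE INDEX IF NOT EXISTS idx_white ON games(white)")
    con.execute("SELECT id FROM games WHERE white = ?", ("Carlsen, Magnus",)).fetchall()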
Oh, the software is absolutely awful, and you're spot on about the metadata (and the CBH format as a whole), but I would also toss in another point where it's (sadly) the best: opening prep. There just isn't much else that can compare, despite the awful interface.