Hi everyone yes, I left OpenAI yesterday

Imnimo · on Feb 14, 2024

Every time Karpathy quits his job, the field takes a leap forward because he makes some fantastic educational resource in his free time.

skybrian · on Feb 14, 2024

Examples? (I'm not that familiar with the field.)

weinzierl · on Feb 14, 2024

Andrej Karpathy is badmephisto, a name you might have heard of if you're into cubing.

http://badmephisto.com/

Copenjin · on Feb 14, 2024

10 years ago: https://youtu.be/WhPjlnWbtS8?feature=shared&t=359

joenot443 · on Feb 14, 2024

Wow - that earnestly gave me goosebumps. I'm a Googler myself and it's humbling seeing him casually describe, 10 years ago, a technology the industry was still in the early stages of developing, which has since taken the world by storm. What a rockstar.

Jensson · on Feb 14, 2024

Neural networks were already big 10 years ago; you have to go back 15 years to find a time before they became popular.

From wikipedia:

> Between 2009 and 2012, ANNs began winning prizes in image recognition contests, approaching human level performance on various tasks, initially in pattern recognition and handwriting recognition.

That was when neural networks became something every tech person knew about. By 2014 it was already in full swing, with neural networks being used everywhere, for example to recognize faces or classify images.

MenhirMike · on Feb 14, 2024

For reference, AlphaGo defeated Lee Sedol in March 2016, which was another reminder to a broader non-tech audience how far things had gotten.

altintx · on Feb 14, 2024

NNs were already a casual topic in my high school computer science class more than 20 years ago. I've always assumed they were already fairly common by that point. (~2000)

Jensson · on Feb 14, 2024

They were a scientific curiosity at that point, the widespread use in the industry happened around 10-15 years ago.

acdha · on Feb 14, 2024

They were known in the field but had a reputation for being too slow. I remember a couple of NIPS (now NeurIPS) people in the early 2000s commenting on what a shame it was that NNs were computationally infeasible, which was true in the era before GPUs took off.

kbelder · on Feb 14, 2024

They were and they were in use, for instance in character recognition. They just hadn't had their breakout success yet.

Jensson · on Feb 15, 2024

Neural networks weren't the best models for character recognition; their breakout success came when they started being the best at recognizing characters and other images, which happened in the late 2000s. OCR before then was really bad.

It might be hard to imagine today, but back then OCR and image recognition were typically done with ordinary statistical regression models, and the neural networks of the time were worse than those.

CRConrad · on Feb 16, 2024

Neural networks were mentioned -- not particularly often, but now and then -- in such non-rarefied publications as BYTE Magazine in the 1980s and 90s, AFAICR.

utopcell · on Feb 15, 2024

Thank you for sharing this link. The Gates building has changed so much over the last decade, it was nice to see it as it once was.

jamal-kumar · on Feb 14, 2024

Him solving a Rubik's Cube and riding a bike at the same time is a pretty impressive demonstration of motor coordination.

Prcmaker · on Feb 14, 2024

Wow, that's a connection between the eras of my life I would not have thought existed. Thank you.

jasmataz · on Feb 14, 2024

I haven't been that surprised by something in a long time. Wow that is crazy. I made a little unfinished 3d Rubik's Cube site for fun a while back and the about section includes a link to his channel and some other older cubing channels. https://rubie-cubie.vercel.app/

namanyayg · on Feb 14, 2024

You just blew my mind, I used to hang around this site a lot ~10 years ago and never would have made the connection

cbracketdash · on Feb 14, 2024

This blew my mind as well!! I never thought one of my favorite programmers would share a similar hobby haha

pushedx · on Feb 14, 2024

I learned F2L from him in 2009!

kanbara · on Feb 14, 2024

i saw this comment and literally shouted “holy fucking shit” — i use zz method, but i came across lots of his resources before!!!

longnguyen · on Feb 14, 2024

Oh wow this blew my mind

realprimoh · on Feb 14, 2024

Holy crap!! I never knew that. I watched this guy so much in 5th grade, he helped me get my 3x3 time down to 8 seconds.

This is insane

rmorey · on Feb 14, 2024

WHAT!

Imnimo · on Feb 14, 2024

The most recent is this, which I believe was made after he left Tesla:

https://github.com/karpathy/nanoGPT

And its accompanying video series:

https://karpathy.ai/zero-to-hero.html

Another example (although I honestly don't remember if he made this one between jobs) is: https://github.com/karpathy/micrograd

joss82 · on Feb 14, 2024

Neural Networks: from zero to hero

https://karpathy.ai/zero-to-hero.html

wwilim · on Feb 14, 2024

My master's was in Convolutional NNs for language processing. I had zero prior knowledge and my advisor recommended I watch Karpathy's lectures[1] to get up to speed

[1] https://youtube.com/playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7...

anoopelias · on Feb 14, 2024

And he was teaching CS231n at Stanford in 2016.

https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rB...

antupis · on Feb 14, 2024

Yup, I hope we get awesome open-source-related content now.

lyapunova · on Feb 14, 2024

Let me say, he's a great teacher! I took a CV class with him. He should teach more, and take it seriously.

Being a popular AI influencer is not necessarily correlated with being a good researcher though. And I would argue there is a strong indication that it is negatively correlated with being a good business leader / founder.

Here's to hoping he chills out and goes back to the sorely needed, lost art of explaining complicated things in elegant ways, and doesn't stray too far back into wasting time with all the top shysters of the valley.

Edit: the more I think about it, the more I realize that it probably screws with a person to have their tweets get b-lined to the front page of hackernews. It makes you a target for offers and opportunities because of your name/influence, but not necessarily because of your underlying "best fit"

johnnyanmac · on Feb 14, 2024

>He should teach more, and take it seriously.

if only we compensated that knowledge properly. Youtube seems to come the closest, but Youtube educators also show how much time you have to spend attracting views instead of teaching expertise.

> It makes you a target for offers and opportunities because of your name/influence, but not necessarily because of your underlying "best fit"

That's unfortunately life in a nutshell. The best fits rarely end up getting any given position. They may be overqualified, filtered out in the HR steps, or rejected for some ephemeral reason (making them RTO, not accepting their counteroffer, potentially illegal factors behind closed doors, etc.).

it's a crappy game so I don't blame anyone for using whatever cards they are dealt.

samspenc · on Feb 14, 2024

> Youtube seems to come the closest, but Youtube educators also show how much time you have to spend attracting views instead of teaching expertise.

Actually for all the attention that the top Youtubers get (in terms of revenue), the reality is that it's going to be impossible to replace teaching income with popular Youtube videos alone.

Based on what I've seen, 1 million video views on Youtube gets you something like $5-10K. And that's with a primarily US audience that has the higher CPM / RPM. So your channel(s) would need to get to about 6 million views per year, primarily US driven, in order to get to earning a median US wage.

SCM-Enthusiast · on Feb 15, 2024

If you made a video a week and the average were 115k views, you would replace your median salary[0]. But the logic on ppc ends up being a lot more complicated than you assume.

To get 6M views, you need to make one video a week that gets about 115k views: 6,000,000 / 52 ≈ 115,385.
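The arithmetic in the last two comments can be checked with a few lines of Python. This is a rough sketch that assumes, as the thread does, that ad revenue scales linearly at a flat $5-10K per million views; real RPM varies with audience, niche, and season, so treat it as back-of-envelope only. The function names are hypothetical.

```python
# Back-of-envelope YouTube income math from the thread.
# Assumption: revenue scales linearly with views at a flat rate per
# million views (the $5-10K range quoted above), ignoring RPM swings,
# audience geography, and sponsorships.

def annual_revenue(annual_views: int, usd_per_million_views: float) -> float:
    """Estimated yearly ad revenue at a flat rate per million views."""
    return annual_views / 1_000_000 * usd_per_million_views


def views_per_video(annual_views: int, videos_per_year: int = 52) -> float:
    """Average views each video needs to hit an annual view target."""
    return annual_views / videos_per_year


target = 6_000_000
low = annual_revenue(target, 5_000)    # 30000.0
high = annual_revenue(target, 10_000)  # 60000.0
per_week = views_per_video(target)     # ~115384.6

print(f"${low:,.0f} to ${high:,.0f} per year")
print(f"{per_week:,.0f} views per weekly video")
```

So 6M views a year works out to roughly $30-60K under that assumption, which is where the "median US wage" comparison comes from.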

godelski · on Feb 14, 2024

> if only we compensated that knowledge properly.

Something I've been thinking a lot about is the transition into post-scarcity and how we would need to dramatically alter incentive structures and payment allocations.

I've been asking this question for about a decade and still have no good solutions: "What do you do when x% of your workforce is unemployable?" (being that x% of jobs are removed without replacement. Imagine sophisticated and cheap robots. Or if needed, magic)

This is a thought experiment, so your answer can't be "there'll be new jobs." Even if you believe that's what'll happen in real life, it's not in bounds of the thought experiment. It is best to consider multiple values of x, because x is likely to change over time, and that would better reflect a post-scarcity transition. It is not outside the realm of possibility that in the future you can obtain food, shelter, and medical care for free or at practically no cost. "Too cheap to meter," if you will.

I'll give you two answers I've gotten that I find interesting. I don't think either is great, and each has issues. 1) Jobs programs: have people do unnecessary jobs simply to create work for which we can compensate them. 2) Entertainment: people are, on average, far more interested in watching people play chess against one another than in watching computers play, despite the computers being better. So there is reason to think this *might* not go away.

fuzzfactor · on Feb 14, 2024

>The best fits rarely end up getting any given position.

This can be self-fulfilling.

In an organization beyond a certain size, there will be more almost-adequate fits than there are leadership positions. This forms something like a recognized baseline, which really needs to be scrutinized closely to see exactly who might be slightly above or below the line.

Or in a small company where there is not any almost-fit whatsoever, imagination can result in an ideal that is equally recognizable, but also might not be fully attainable.

Either way it could be OK but not exactly the best-fit.

If good fortune smiles and the rare more-than-adequate fit appears anywhere on the horizon, though, it's so unfamiliar that it flies right under the radar.

sharadov · on Feb 14, 2024

I don't think he needs the money. I googled around and he's worth 50 million.

jejeyyy77 · on Feb 14, 2024

I would pay for a course from him

bobthepanda · on Feb 14, 2024

Sometimes I wish as a profession we valued teaching more. I would love to teach, but not do research, and make a living.

passion__desire · on Feb 14, 2024

Become the next 3blue1brown. He has inspired many.

Here's a gem of an educator. Check out his other videos.

https://www.youtube.com/watch?v=dhYqflvJMXc

vintermann · on Feb 14, 2024

Seconded! Another math youtuber who is an outrageously good educator is Adithya Chakravarthy a.k.a Aleph 0. He doesn't put out videos very often, but when he does you're probably going to learn something new even if you knew the topic he was speaking about.

He uses elegant hand-drawn notes rather than Manim; although 3blue1brown's open-sourced visualization library is beautiful too, I think the hand-drawn notes make it extra impressive.

WhrRTheBaboons · on Feb 14, 2024

easier said than done

3b1b's main selling point is the extreme level of polish on his visualizations - something that takes a lot of time (money) to develop

the sad part is that it takes extreme luck to make it on yt. i wish educating skills counted for more but unfortunately they don't, really.

bigyikes · on Feb 14, 2024

YouTube is a lot less luck and a lot more skill than people realize. If you make good content regularly, your audience will find you.

The algorithm is very good at rewarding good content. (It’s also good at rewarding other things, but that is beside the point.)

jorvi · on Feb 14, 2024

Yup. There are probably a few dozen (if not hundreds) of 3b1bs out there with just as much polish; the algorithm just didn’t bless them.

bigyikes · on Feb 14, 2024

Do you have any examples? I find this hard to believe.

mikhailfranco · on Feb 14, 2024

3blue1brown runs Summer of Math competitions to highlight other creative math videos. Many, but not all, use the same 3b1b 'manim' animation software, so they often have the same look'n'feel. Here are the results from 2022, and the huge YT playlist:

https://www.3blue1brown.com/blog/some2

https://www.youtube.com/playlist?list=PLnQX-jgAF5pTZXPiD8ciE...

jorvi · on Feb 14, 2024

That’s kind of the point, you won’t be able to due to the algorithm.

I can give you something analogous though: I’m a big fan of old school east coast hip-hop. You have the established mainline artists from back then (“Nas”, “Jay-Z”, “Big L”, etc.), then you have the established underground artists (say, “Lord Finesse” or “Kool G Rap”), and then you have the really, really underground guys like “Mr. Low Kash ‘n Da Shady Bunch”, “Superscientifiku”, “Punk Barbarians”, “Harlekinz”, etc.

A lot of those in that third “tier” are every bit as good as the second tier. And both tiers contain a lot of artists that could hit the quality point of the mainline artists, they just never had access to the producer and studio time that the mainline did.

I know these artists because I love going digging for the next hidden gem. Spotify recommended perhaps one or two of all the super-underground guys to me.

Ironically more West-coast style, but here is a great example (explicit!): https://youtu.be/BUwJMVKSMtY?t=129

Dude could’ve measured up to the best of the west coast. Spotify monthly listener count? 891.

Algorithms are sadly win-more.

Now I’m just silently hoping a math nerd will feel inclined to share their hidden math channel gems :+)

_tk_ · on Feb 14, 2024

Somewhat off-topic, but what do you feel like are the best techniques to find the artists in Tier 2 and 3? I face a similar conundrum just in a different genre.

jorvi · on Feb 14, 2024

(I realize now that I dislike using the descriptor "tier", as it implies some sort of ranking. Perhaps "layer" would have been better, but I'll stick with it for now.)

For both tier 2 and tier 3 it's basically the same process. This is for Spotify, btw; I have no idea how different the workflow would be for something like Apple Music.

Say the genre you want to dig around in is Hip-Hop. You are aware of Eminem and Mac Miller, and vaguely aware of a guy named Nas. By intuition you'd probably already be able to tell that Nas is more at the edge among the mainline artists.

You click on "Nas", and scroll down to Fans also like. Right now, for "Nas", it is showing "Mobb Deep", "Mos Def", "Rakim", "Big L", "Wu-Tang Clan", "Gang Starr", "Ghostface Killah", "Method Man" and "Common".

This is a mix of T1 and T2. "Wu-Tang Clan" is in there along with assorted members, but some of the other artists are much lesser-known quantities.

It's a bit hard for me to decide what a Hip-Hop layman would consider the most unknown name here, but I'd venture it'd be "Big L". We click on him and do the same thing. Now we're really getting somewhere, with guys like "Inspectah Deck" and "Smif-n-Wessun". Click, dig, and we get a bunch of names, among which "Lord Finesse" stands out. The "Show more" at the end of "Fans also like" is also invaluable.

In total, the dig order for me to get to the very bottom of the underground is "Nas" > "Big L" > "Smif-n-Wessun" > "Lord Finesse" > "Channel Live" > "Ed OG & Da Bulldogs" > "Trends of Culture" > "Brokin English Klik" (358 monthly listeners).

I wouldn't consider each of those going a tier (layer) deeper. As a guy who knows waaay too much about Hip-Hop, I'd separate them into:

- T1: "Nas", "Big L"

- T2: "Smif-n-Wessun", "Lord Finesse"

- T3: "Channel Live", "Ed OG & Da Bulldogs", "Trends of Culture", "Brokin English Klik"

Perhaps "Brokin English Klik" should be in its own T4, and three tiers lack the fidelity to be accurate. Not sure.
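The digging process described above is essentially a greedy walk over a "Fans also like" graph: from each artist, jump to the least-known related artist you haven't visited, and stop when nothing less known is left. The sketch below is a toy illustration under loud assumptions: the function name `dig` is hypothetical, the graph and listener counts are made-up placeholders (artists "A" to "E"), and it does not call any Spotify API.

```python
from typing import Dict, List

# Greedy "dig" through related-artist lists (hypothetical helper).
# Toy data only; nothing here talks to Spotify.

def dig(start: str,
        fans_also_like: Dict[str, List[str]],
        monthly_listeners: Dict[str, int]) -> List[str]:
    path = [start]
    seen = {start}
    current = start
    while True:
        # Unvisited artists in the current "Fans also like" list.
        candidates = [a for a in fans_also_like.get(current, []) if a not in seen]
        if not candidates:
            break
        # Pick the most obscure one (fewest monthly listeners).
        nxt = min(candidates, key=lambda a: monthly_listeners[a])
        if monthly_listeners[nxt] >= monthly_listeners[current]:
            break  # nothing deeper underground from here
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Toy example with made-up artists "A".."E".
related = {"A": ["B", "C"], "B": ["D", "A"], "C": ["E"], "D": [], "E": []}
listeners = {"A": 1_000_000, "B": 50_000, "C": 80_000, "D": 900, "E": 300}
print(dig("A", related, listeners))  # ['A', 'B', 'D']
```

Note that the greedy walk never reaches "E", even though it has the fewest listeners, because it only follows one branch. That is roughly why manual tricks like "Show more" and curated underground playlists still help.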

A little shortcut would be using the "The Edge of $Genre" playlists. They're the companion playlists to "The Sound of $Genre" (broad slice) and "The Pulse of $Genre" (most popular), generated via everynoise.com, although since that guy got fired from Spotify it's up in the air how long those will keep working.

Edit: oh, and if you run into a playlist that caters to that deep underground (in my case, that was "90's Tapes"*), that's worth its bytes in gold.

*https://open.spotify.com/playlist/2H0rNGEBShvHSGebM2m37c?si=...

passion__desire · on Feb 14, 2024

I hate the fact that there is no diversity in recommendation algos. We need to bring back Yahoo-style top-down directory recommendations and not just a black box. But you can find good channels on YouTube using tags like "#some3" and "#some2" and so on.

jorvi · on Feb 14, 2024

(I deeply hate TikTok)

TikTok's recommendation algorithm is probably one of the best. It puts content first, giving what seems only a passing weight to follower count.

That doesn't mean that having a big follower count doesn't increase your chance to go viral and gain a lot of views, but it is much more likely for great content from a small creator to go viral than for mediocre content from someone with 500,000 followers.

You can also see this in that successful TikTok profiles often have a much higher view-to-follower ratio than something like YouTube.

plutokras · on Feb 15, 2024

https://www.youtube.com/@Reducible/videos

mlrtime · on Feb 14, 2024

So then isn't there a market for millions of people who [may] have something worth teaching but lack the marketing/polish?

Perhaps some automation/ai combination where you feed it learning videos and it helps create the "other" content.

da39a3ee · on Feb 15, 2024

3b1b's animations are certainly important but his main selling point is his thoughtful explanations of mathematics -- the topics, approaches, and words.

godelski · on Feb 14, 2024

He's a great educator, but at the same time we must recognize that his videos are not a replacement for a traditional math course. They amplify the existing paradigm, not replace.

MOOCs are great for access, but they are not, and definitely should not be treated as, replacements. Treating them that way will, I am certain, have a net negative result. I'm in grad school and there's something I tell students on the first day:

> The main value in you paying (tuition) and attending is not just to hear me lecture, but to be able to stop, interrupt, and ask questions or visit me in office hours. If you are just interested in lectures, I've linked several high-quality ones on our website, as well as several books, blogs, and other resources. Everyone should use these. But you can't talk to a video or a book; you can talk to me. You should use all of these resources to maximize your learning. I will not be taking attendance.

I'm sure many of you have had lectures with a hundred students if you went to a large school (luckily, I did not). You're probably aware how different that is from a smaller course. It's great for access and certainly monetarily efficient, but it's certainly not the most efficient way to educate an individual. MOOCs are great because they increase the ability of educators to share notes. We pull from one another all the time (with credit, of course), because if someone else teaches in a better way than I do, I should update the way I teach. MOOCs are more an extension of books. YouTube is the same, but at the end of the day you can't learn math without doing math. Even Grant states this explicitly.

eurekin · on Feb 14, 2024

Not only inspired, but probably also provided soothing therapy with his voice and delivery pace. :)

Strikes a balance between sounding engaging and soothing at the same time.

dustingetz · on Feb 14, 2024

this is disrupting education. you can get a better undergraduate education in STEM on youtube than my paid education 20 years ago. I think those visualizations can even pull forward a bunch of stuff into high school.

scarecrowbob · on Feb 14, 2024

Well, I get the point and find it appealing but I don't agree.

When my kiddo was a sophomore in HS he decided that he wanted to be an engineer, and I thought that it would be really good for him to learn calc- my feeling was that if he got out of HS without at least getting through Calculus he'd have a really hard time.

So _I_ learned calculus. I started with basic math on Khan and moved to the end of the Calc AB syllabus. I have, like, 500K points there. And I've watched a whole lot of STEM on YT.

Yesterday I finished a lab with Moritz Klein's Voltage Controlled Oscillators, where I was able to successfully understand the function of all the sections in the circuit.

I've been trying to follow Aaron Lanterman's Georgia Tech lectures on analog electronics.

The issue is that I have other stuff going on in my life. Like, my son studies more than I work at my full time job.

And I don't really have the pressure on me to learn the more advanced math that he's using. In fact, in the couple of years since he graduated HS, I've not really found a use for calc in my day-to-day work on any of the technical things I've done (mostly programming) and so I've lost a lot of it.

So, by contrast, my son who will be graduating as a BS in ME in May, has a far better and deeper understanding of the engineering material than I do.

And it's not just a time issue: I quit my programming job last summer because I have just enough work as a musician to pay the rent, which leaves me plenty of time to do stuff. And it's not that I don't know how to learn at a college level: I taught in an English Dept for 8 years and quit a PhD in the humanities ABD.

That's all just my experience.

I love STEM (and trades education) material on YouTube, but I really think it's missing something to think that you could get "a better undergraduate education in STEM on youtube".

jeremiahbuckley · on Feb 14, 2024

Similar experiences, but different conclusions.

1. With advanced math I feel I retain at the n-1 level. Unless I’m using it, it fades. That’s frustrating but I don’t think it’s the fault of the deliverer.

I do think working through problems has to be part of the practice, I’ve bought workbooks to have something to try to drive the knowledge into muscle memory. It still fades, but maybe not as much.

2. Calculus, in particular seems super unimportant to real life. Stats and Linear Algebra, somewhat similar in Math Level, seem much more applicable. I’m very happy to see Stats being offered in high school now as an alternative to Calculus. For Calculus, you almost need to learn 3-4 rules and someone says “trust me, just memorize these, don’t spend too much time on this.” And you would be able to live a happy productive life.

dustingetz · on Feb 14, 2024

I think it's important to separate the motivation pill from the content delivery. You can buy a motivation pill for cheaper than $160k or whatever a degree costs these days. And we get to compare the very best tryhard youtubers to the median lecturer who is grinding it out.

passion__desire · on Feb 14, 2024

This was the point I made earlier. Consider the Richard Feynman lectures. Why didn't universities collectively decide to create pre-made lecture videos for topics that don't change, and show these videos during normal lectures? Otherwise it's the professor's job to revise and prepare the topics the night before and deliver them, and the professor spends so much time doing the same thing again and again every year. This would have freed them up for more discussion, office hours, and so on.

coolThingsFirst · on Feb 14, 2024

You can’t sorry.

passion__desire · on Feb 15, 2024

Actually there is a tortoise and hare race going on. Entertainment is outpacing education. Education is getting better and better with modern technology but so also is distraction i.e. entertainment.

mitthrowaway2 · on Feb 14, 2024

I think good teachers make great researchers, because they have to understand their field very well, they anticipate and ask themselves the questions that need to be asked, they manage to always see their field with fresh eyes, they are good collaborators, and most importantly, good communicators.

tugberkk · on Feb 14, 2024

If they are teaching the specific research topic, yes. Otherwise, you need to come up with 14-week course material for different courses.

coolThingsFirst · on Feb 14, 2024

My question is this: great educators like Karpathy make things from 'scratch' and explain them in a way that I can understand. Is it a matter of the instructor's ability to do this, or is it a matter of the student (i.e. me) not having enough chops to understand material from elsewhere?

somethingsome · on Feb 14, 2024

It's actually both!

A teacher can usually adapt the content depending on their audience. I would not teach the research in my field at the same level to professionals, PhDs, master's students, bachelor's students, amateurs, or even school students.

If what I'm teaching is fairly complex, it requires a lot of background that I could teach, but I would not have the time to do so, because it would come at the expense of other students. So, while I usually teach 'from scratch', depending on my audience I will gloss over some details (which I can answer separately if a question is asked), and I will usually change the pace of the lessons dramatically depending on prior background, because at that pace I need to assume the student has the prerequisite background to understand fairly complex material.

As an example, I gave some explanations to a student from zero to transformers, it took several hours with lots of questions, the same presentation to a teacher not in the field took me 1h30 and to a PhD in a related field took 25 minutes, the content was exactly the same, and it was from scratch, but the background in the audience was fairly different.

elbear · on Feb 14, 2024

At the same time, if you can explain something by using analogies to real-world things, to systems most of us have an intuition for, then you can target many more people at the same time. It's true that this is harder, because you have to find patterns that are common between these systems and also make it clear where the analogy ends. But the benefit to finding these common patterns is that you also understand them deeper.

To give a relevant example, graph theory concepts can be found both in many real-world systems and in programming languages and computer systems.

trogdor · on Feb 15, 2024

> it probably screws with a person to have their tweets get b-lined to the front page of hackernews

Just a friendly heads-up, it’s “bee-lined.”

I normally wouldn’t point that out, but “b-lined” could be read to suggest the opposite of your intention; a lower priority, a la “B-list celebrity.”

mcbishop · on Feb 14, 2024

The Lex Fridman episode with Andrej was an awesome education. Things explained so clearly.

aantix · on Feb 14, 2024

He should start a Patreon account.

chpatrick · on Feb 14, 2024

I don't imagine he's short on cash...

KerrAvon · on Feb 14, 2024

b-lined?

squigz · on Feb 14, 2024

https://en.wikipedia.org/wiki/Bee_line

spicyusername · on Feb 14, 2024

    He should teach more and take it seriously

Then he can go from being in the top .1% of income earners to the bottom .1%!

/s

skepticATX · on Feb 14, 2024

Frankly, OpenAI seems to be losing its luster, and fast.

Plugins were a failure. GPTs are a little better, but I still don't see the product market fit. GPT-4 is still king, but not by that much any more. It's not even clear that they're doing great research, because they don't publish.

GPT-5 has to be incredibly good at this point, and I'm not sure that it will be.

al_borland · on Feb 14, 2024

I know things keep moving faster and faster, especially in this space, but GPT-4 is less than a year old. Claiming they are losing their luster, because they aren’t shaking the earth with new models every quarter, seems a little ridiculous.

As the popularity has exploded, and ethical questions have become increasingly relevant, it is probably worth taking some time to nail certain aspects down before releasing everything to the public for the sake of being first.

phreeza · on Feb 14, 2024

Given how fast the valuation of the company and the scope of their ambition (e.g. raising a trillion dollars for chip manufacturing) have been hyped up, I think it's fair to say "You live by the hype, you die by the hype."

hef19898 · on Feb 14, 2024

Just time your exit correctly!

devoutsalsa · on Feb 14, 2024

"This year I invested in pumpkins. They've been going up the whole month of October, and I've got a feeling they're going to peak right around January and BANG! That's when I'll cash in!" -Homer Simpson

hef19898 · on Feb 14, 2024

Homer obviously was smart, a nuclear scientist, car developer and Junior Vice President in his own tech start-up! So he should know!

Edit: I forgot, NASA trained astronaut!

vonjuice · on Feb 14, 2024

RIP vim users

bamboozled · on Feb 14, 2024

Beautifully said.

bayindirh · on Feb 14, 2024

You don't lose your luster only by not innovating.

The Altman saga, allowing military use, and other small things tarnish your reputation step by step and push you toward mediocrity or worse.

Microsoft has many great development stories (read Raymond Chen's blog to be awed), but what they did at the end to other competitors and how they behave removed their luster, permanently for some people.

pixl97 · on Feb 14, 2024

At the end of the day the US.mil is spending billions to trillions of dollars. I'm not exactly sure what you mean by lose your luster, but becoming part of the military industrial complex is generally a way to bury yourself in deep piles of gold.

throw_pm23 · on Feb 14, 2024

I think you answered it yourself. The main way from cool to not cool is to be buried in "piles of gold".

ignoramous · on Feb 14, 2024

> a way to bury yourself in deep piles of gold

Unfortunately, there are no deep piles of gold without deep piles of corpses. It is inevitable, though. Prompted by the US military, other countries have also always pioneered or acquired advanced tech, and I don't see why AI would be any different: "Never send a human to do a machine's job" is as ominous now as it is dystopian, as machines increasingly become more human-like.

mring33621 · on Feb 14, 2024

There will always be corpses.

Do you want American corpses? Or somebody else's?

inglor_cz · on Feb 14, 2024

"allowing military use"

That would actually increase their standing in my eyes.

Not too far from where I live, Russian bombing is destroying homes of people whose language is similar to mine and whose "fault" is that they don't want to submit to rule from Moscow, direct or indirect.

If OpenAI can somehow help stop that, I am all for it.

bayindirh · on Feb 14, 2024

On the other hand, Israel is using AI to generate their bombing targets and pound Gaza strip with bombs non-stop [0].

And, according to the UN, Turkey has used AI-powered, autonomous loitering drones to hit military convoys in Libya [1].

Regardless of us vs. them, AI shouldn't be a part of warfare, IMHO.

[0]: https://www.theguardian.com/world/2023/dec/01/the-gospel-how...

[1]: https://www.voanews.com/a/africa_possible-first-use-ai-armed...

kj99 · on Feb 14, 2024

> AI shouldn't be a part of warfare, IMHO.

Nor should nuclear weapons, guns, knives, or cudgels.

But we don’t have a way to stop them being used.

fwip · on Feb 14, 2024

Sure we do. We enforce it through the threat of warfare and subsequent prosecution, the same way we enforce the bans on chemical weapons and other war crimes.

We may lack the motivation and agreement to ban particular methods of warfare, but the means to enforce that ban exists, and drastically reduces their use.

inglor_cz · on Feb 14, 2024

"We enforce it through the threat of warfare and subsequent prosecution, the same way we enforce the bans on chemical weapons and other war crimes."

Do we, though? Sometimes, against smaller misbehaving players. Note that it doesn't necessarily stop them (Iran, North Korea), even though it makes their international position somewhat complicated.

Against the big players (the US, Russia, China), "threat of warfare and prosecution" does not really work to enforce anything. Russia rains death on Ukrainian cities every night, or attempts to do so while being stopped by AA. Meanwhile, Russian oil and gas are still being traded, including in the EU.

kj99 · on Feb 14, 2024

We lack the motivation precisely because of information warfare that is already being used.

foolofat00k · on Feb 14, 2024

This is literally the only thing that matters in this debate. Everything else is useless hand-wringing from people who don't want to be associated with the negative externalities of their work.

The second that this tech was developed it became literally impossible to stop this from happening. It was a totally foreseeable consequence, but the researchers involved didn't care because they wanted to be successful and figured they could just try to blame others for the consequences of their actions.

qeternity · on Feb 14, 2024

> the researchers involved didn't care because they wanted to be successful and figured they could just try to blame others for the consequences of their actions

Such an absurdly reductive take. Or how about: just like nuclear energy and knives, they are incredibly useful, society-advancing tools that can also be used to cause harm. It's not as if AI can only be used for warfare. And like pretty much every technology, it ends up being used 99.9% for good and 0.1% for evil.

foolofat00k · on Feb 14, 2024

I think you're missing the point. I don't think we should have prevented the development of this tech. It's just absurd to complain about things that we always knew would happen as though they're some sort of great surprise.

If we cared about preventing LLMs from being used for violence, we would have poured more than a tiny fraction of our resources into safety/alignment research. We did not. Ergo, we don't care, we just want people to think we care.

I don't have any real issue with using LLMs for military purposes. It was always going to happen.

kj99 · on Feb 14, 2024

You say ‘we’ as if everyone is the same. Some people care, some people don’t. It only takes a few who don’t, or who feel the ends justify the means. Because those people exist, the people who do care are caught in a prisoner's dilemma that forces them to develop the technology anyway.

kelipso · on Feb 14, 2024

Safety or alignment research isn't going to stop it from being used for military purposes. Once the tech is out there, it will be used for military purposes; there's just no getting around it.

sambull · on Feb 14, 2024

If it ever happens again, they'll develop the lists in seconds from data collected from our social media and intercepts. What took organizations warehouses and thousands of agents will be done in a matter of seconds.

IncreasePosts · on Feb 14, 2024

Why not? Maybe AI is what is needed to finally tear Hamas out of Palestine root and branch. As long as humans are still in the loop vetting the potential targets, it doesn't seem particularly different from the IDF just hiring a bunch of analysts to produce the same targets.

throwboatyface · on Feb 14, 2024

There is no "removing Hamas from Palestine". The only way to remove the desire of the Palestinian people for freedom is to remove the Palestinian people themselves. And that is what the IDF is trying to do.

IncreasePosts · on Feb 14, 2024

Hamas isn't the only path to freedom for Palestinians. In fact, they seem to be the major impediment to it.

lolc · on Feb 14, 2024

If we're going to be reductive, at least include the other main roadblock to a solution which is the current government of Israel.

IncreasePosts · on Feb 14, 2024

That doesn't explain why deals weren't reached with the previous governments of Israel.

lolc · on Feb 14, 2024

Sure, it doesn't explain that. It would be nice if things were that easy, wouldn't it?

IncreasePosts · on Feb 15, 2024

Generally if a main roadblock is removed, you can get a little farther down the road.

lolc · on Feb 15, 2024

Hamas doesn't exist in a vacuum where you can just remove it and then it's gone. You have to offer a life that's better than Hamas.

g8oz · on Feb 14, 2024

Considering the incredible amount of civilian casualties, I don't think the target vetting is working very well.

dizhn · on Feb 14, 2024

I would be very surprised if Turkey was capable of doing that. If they did, that's all Erdoğan would be talking about. Also it's a bit weird that the linked article's source is a Turkish name. (Economy and theology major too)

I am not saying this is anything, but it's definitely tingling my "something's up" senses.

bayindirh · on Feb 14, 2024

Voice of America generally employs a country's own nationals for their reporting there. There are some other resources:

    - NPR: https://www.npr.org/2021/06/01/1002196245/a-u-n-report-suggests-libya-saw-the-first-battlefield-killing-by-an-autonomous-d
    - Lieber Institute: https://lieber.westpoint.edu/kargu-2-autonomous-attack-drone-legal-ethical/
    - ICRC: https://casebook.icrc.org/case-study/libya-use-lethal-autonomous-weapon-systems
    - UN report itself (Search for Kargu): https://undocs.org/Home/Mobile?FinalSymbol=S%2F2021%2F229&Language=E&DeviceType=Desktop&LangRequested=False
    - Kargu itself: https://www.stm.com.tr/en/kargu-autonomous-tactical-multi-rotor-attack-uav

From my experience, Turkish military doesn't like to talk about all the things they have.

dizhn · on Feb 14, 2024

The major drone manufacturer is Erdoğan's son-in-law. He's being groomed as one of his possible successors on the throne. They looove to talk about those drones.

I will check out the links. Thanks a lot.

bayindirh · on Feb 14, 2024

You're welcome.

The drones in question (Kargu) are not built by his company.

dizhn · on Feb 14, 2024

True. I had been reading about how other drones are in service but they never get mentioned anymore.

WhrRTheBaboons · on Feb 14, 2024

>If OpenAI can somehow help stop that, I am all for it.

I got some bad news for you then.

stcroixx · on Feb 14, 2024

Agreed. It's the most important and impactful use case. Everything else is a set of parlor tricks in comparison.

ronhav3 · on Feb 14, 2024

Yep. AI is, and will be used militarily.

These virtue signaling games are childish.

berniedurfee · on Feb 14, 2024

It is indeed tragic that virtue is a childish trait among adults.

inglor_cz · on Feb 14, 2024

That assumes that being a pacifist when living under the umbrella of the most powerful military in the world is, in fact, a virtue.

I don't think so. In order to be virtuous, one should have some skin in the game. I would respect dedicated pacifists in Kyiv a lot more. I wouldn't agree with them, but at least they would be ready to face pretty stark consequences of their philosophical belief.

Living in Silicon Valley and proclaiming yourself a virtuous pacifist comes at negligible personal cost.

vonjuice · on Feb 14, 2024

That's kind of like saying that not being a murderer only has moral value if you're constantly under mortal threat yourself.

inglor_cz · on Feb 14, 2024

I don't really see the comparison. Not being a murderer isn't a virtue, it is just normal behavior for 99,9 per cent of the population.

vonjuice · on Feb 15, 2024

First of all no one declared themselves a virtuous pacifist.

People don't participate in murder and they think others shouldn't either.

People don't participate in wars (which are essentially large scale murder) and they think others shouldn't.

Murder happens anyway. War happens anyway.

Yet if someone says 'war bad', people jump in and say 'virtue signaling', but no one does that when people say 'murder bad'.

There's some really weird moral entanglement happening in the minds of people that are so eager to call out virtue signaling.

CuriouslyC · on Feb 14, 2024

Virtue isn't childish, shooting telegraphed signals to be perceived as virtuous regardless of your true nature is childish. Also, using a one dimensional, stereotypical storybook definition of virtue (and then trying to foist that on others) is also childish.

denverllc · on Feb 14, 2024

I don’t think a lot of companies care whether they lose their luster to techies since corporations and most individuals will still buy their product. MSFT was $12 in 2000 (when they had their antitrust lawsuit) and is $400 now.

optymizer · on Feb 14, 2024

I never bought into ethical questions. It's trained on publicly available data as far as I understand. What's the most unethical thing it can do?

My experience is limited. I got it to berate me with a jailbreak. I asked it to do so, so the onus is on me to be able to handle the response.

I'm trying to think of unethical things it can do that are not in the realm of "you asked it for that information, just as you would have searched on Google", but I can only think of things like "how to make a bomb", suicide-related instructions, etc., which I would place in the "sharp knife" category. One has to be able to handle it before using it.

It's been increasingly giving the canned "As an AI language model ..." response for stuff that's not even unethical, just dicey, for example.

al_borland · on Feb 14, 2024

One recent example in the news was the AI generated p*rn of Taylor Swift. From what I read, the people who made it used Bing, which is based on OpenAI’s tech.

lobocinza · on Feb 14, 2024

This is more sensationalism than an ethical issue. Whatever they did, they could do, and probably do better, using publicly available tools like Stable Diffusion.

majora2007 · on Feb 14, 2024

Or just Photoshop. The only thing these tools did was make it easier. I don't think the AI aspect adds anything for this comparison.

Anon84 · on Feb 14, 2024

An argument can be made that "more is different." By making it easier to do something, you're increasing the supply, possibly even taking something that used to be a rare edge case and making it a common occurrence, which can pose problems in and of itself.

lobocinza · on Feb 20, 2024

It's more dangerous if it's uncommon. It's knowledge that protects people, not a bunch of annoying "AI safety" "researchers" selling the lie that "AI is safe". Truth is those morons only have a job because they help companies save face and create a moat around this new technology, where new competitors will be required to have "AI safety" teams & solutions. What has "AI safety" achieved so far besides making models dumber and more annoying to use?

stickfigure · on Feb 14, 2024

Put in a different context: The exploits are out there. Are you saying we shouldn't publish them?

Deepfakes are going to become a concern of everyday life whether you stop OpenAI from generating them or not. The cat is out of the proverbial bag. We as a society need to adjust to treating this sort of content skeptically, and I see no more appropriate way than letting a bunch of fake celebrity porn circulate.

What scares me about deepfakes is not the porn, it's the scams. The scams can actually destroy lives. We need to start ratcheting up social skepticism asap.

vonjuice · on Feb 14, 2024

You probably don't care about the porn cause I'm assuming you're a man, but it can ruin lives too.

stickfigure · on Feb 15, 2024

It can only ruin lives if people believe it's real. Until recently, that was a reasonable belief; now it's not. People will catch on and society will adapt.

It's not like the technology is going to disappear.

vonjuice · on Feb 15, 2024

I mean, the same applies to scams, scams only work if people believe them.

stickfigure · on Feb 15, 2024

Right - as I said, we need to ramp up social skepticism, fast. Not as in some kind of utopian vision, but "the amount of fake information will be moving from a trickle to a flood soon, there's nothing you can do about that, so brace yourselves".

The specific policies of OpenAI or Google or whatnot are irrelevant. The technology is out of the bag.

zingelshuher · on Feb 14, 2024

You are talking like it's something bad. Kids are learning AI and computing instead of drugs and guns. And nobody is hurt.

onlyrealcuzzo · on Feb 14, 2024

> Claiming they are losing their luster, because they aren’t shaking the earth with new models every quarter, seems a little ridiculous.

If that's the foundation your luster is built on - then it's not really ridiculous.

GPT popularized LLMs to the world with GPT-3, not too long before GPT-4 came out. They made a lot of big, cool changes shortly after GPT-4, and everyone and their mother announced LLM projects and integrations in that time.

It's been about 9 months now, and not a whole lot has happened in the space.

It's almost as if the law of diminishing returns has kicked in.

famouswaffles · on Feb 14, 2024

GPT-3 came out 3 years before 4.

onlyrealcuzzo · on Feb 14, 2024

GPT-3.5 is when LLMs start to get "mainstream". That's about 4.5 months before the GPT-4 release.

Keep in mind GPT-3.5 is not an overnight craze. It takes months before normal people even know what it is.

famouswaffles · on Feb 14, 2024

>GPT-3.5 is when LLMs start to get "mainstream".

To the general public, sure, but not to research, which is what produces the models.

The idea that diminishing returns have hit because there hasn't been a new SOTA model in 9 months is ridiculous. Models take months just to train. OpenAI sat on 4 for over half a year after training was done, just red-teaming it.

l33tman · on Feb 14, 2024

It sure is, but the theme in the sub-thread was whether OAI in particular can afford to do that (i.e. wait) while there are literally dozens of other companies and open-source projects showing they can solve a lot of the tasks GPT-4 does, for free, so that the OAI value proposition seems weaker and weaker by the month.

Add to that a company environment that seems to be built on money-crazed, stock-option-piling engineers and a CEO who seems to have gotten power-crazed... I mean, they grew far too fast, I guess.

AnimalMuppet · on Feb 14, 2024

Perhaps GPT-4 is losing its luster because the more people actually use it, they go from "wow that's amazing" to "amazing, yes, but..."? And the "but" looms larger and larger with more time and more exposure?

Note well: I haven't actually used it myself, so I'm speculating (guessing) rather than saying that this is how it is.

chasd00 · on Feb 14, 2024

I've got a feeling this is beginning to happen all over the place. I'm really curious to see where the hype train ends up at the end of this year.

NBJack · on Feb 14, 2024

This space is growing by leaps and bounds. It's not so much the passage of time as it is the number of notable advancements that is dictating the pace.

sho · on Feb 14, 2024

> GPT-4 is still king, but not by that much any more

Idk, I just tried Gemini Ultra and it's so much worse than GPT4 that I am actually quite shocked. Trying to ask it any kind of coding question ends up being this frustrating and honestly bizarre waste of time as it hallucinates a whole new language syntax every time and then asks if you want to continue with non-working, in fact non-existing, option A or the equally non-existent option B until you realise that you've spent an hour trying to make it at least output something that is even in the requested language and finally that it is completely useless.

I'm actually pretty astonished at how far Google is behind and that they released such a bunch of worthless junk at all. And have the chutzpah to ask people to pay for it!

Of course I'm looking forward to GPT-5, but even if it's only a minor step up, they're still way ahead.

mad_tortoise · on Feb 14, 2024

That's interesting, because I have had exactly the opposite experience testing GPT vs Bard with coding questions. Bard/Gemini far outperformed GPT on coding, especially with newer languages or libraries. Whereas GPT was better with more general questions.

dieortin · on Feb 14, 2024

I’ve had the opposite experience with Gemini, which was surprising. I feel like it lies to me less, among other things.

Keyframe · on Feb 14, 2024

I kind of gave up completely on coding questions. Whether it's GPT4, Anthropic, or Gemini, there's always this big issue of laziness I'm facing. I never get full code; there are always stubs or TODOs (on important stuff), and when I ask it to correct for that, I just get more of it (laziness). Has anyone else faced this, and is there a solution? It's almost as annoying as, if not more annoying than, the incomplete output of the early days.

bugglebeetle · on Feb 14, 2024

The solution, at least for GPT-4, is to ask it to first draft a software spec for whatever you want it to implement and then write the code based on the spec. There are a bunch of examples here:

https://github.com/mckaywrigley/prompts
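The spec-first approach could be scripted roughly like this. It's a minimal sketch, assuming a hypothetical `ask` function that sends one prompt to whatever chat model you use and returns its text reply; the prompt wording is illustrative and not taken from the linked repo.

```python
from typing import Callable

# Hypothetical interface: sends one prompt to a chat model, returns its reply.
# No real client library is assumed here.
Ask = Callable[[str], str]


def spec_then_code(ask: Ask, task: str) -> tuple[str, str]:
    """Two-step prompting: draft a spec first, then implement against it."""
    # Step 1: ask only for a specification, explicitly no code yet.
    spec = ask(
        "Write a concise software specification for the following task. "
        "List inputs, outputs, edge cases and every function to implement. "
        "Do not write any code yet.\n\nTask: " + task
    )
    # Step 2: feed the spec back and ask for a complete implementation.
    code = ask(
        "Implement the following specification in full. "
        "Do not leave TODOs, stubs or placeholder comments.\n\n" + spec
    )
    return spec, code
```

Splitting the request this way gives the model an explicit checklist to implement against, which is the point of the suggestion; whether it actually reduces stubbed-out output will depend on the model.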

CuriouslyC · on Feb 14, 2024

If you can't get GPT4 to do coding questions you're prompting it wrong or not loading your context correctly. It struggles a bit with presentational stuff like getting correct HTML/CSS from prompts or trying to generate/update large functions/classes, but it is stellar at producing short functions, creating scaffolding (tests/stories) and boilerplate and it can do some refactors that are outside the capabilities of analytical tools, such as converting from inline styles to tailwind, for example.

Keyframe · on Feb 14, 2024

so, mundane trivial things and/like web programming? I got it eventually to answer what I needed but it always liked to skip part of the code, inserting // TODO: important stuff in the middle, hence 'laziness' attribute. Maybe it is just lazy, who knows. I know I am since I'm prompting it for stuff.

CuriouslyC · on Feb 14, 2024

I wouldn't say mundane/trivial so much as well-trodden. I get good code for basic shaders, various compsci algorithms, common straightforward SQL queries, etc. If you're asking it to edit 500-line functions and handle memory management in a language that isn't in the top 20 of the TIOBE index, you're going to have a bad time.

The TODO comments can be prompted against: just tell it to always include complete runnable code, as its output will be executed in a sandbox without prior verification.

antonvs · on Feb 19, 2024

Fyi, I've never encountered what you're describing, whether with GPT 3.5 or 4.

It may be that you're expecting it to do too much at once. Try giving smaller requests.

TeMPOraL · on Feb 14, 2024

They seem to be steadily dumbing down GPT-4; eventually, improving performance of open source models and decreasing performance of GPT-4 will meet in the middle.

bamboozled · on Feb 14, 2024

I'm almost certain this is because you're getting used to chat bots. How would they honestly be getting worse?

Initially it felt like the singularity was at hand. You played with it, got to know it, the computer was talking to you, it was your friend, it was exciting. Then you got bored with your new friend and it wasn't as great as you remember it.

Dating is often like this. You meet someone, have some amazing intimacy, then you really get to know them, you work out it wasn't for you, and it's time to move on.

TeMPOraL · on Feb 14, 2024

The author of `aider` - an OSS GPT-powered coding assistant - is on HN, and says[0] he has benchmarks showing gradual decline in quality of GPT-4-Turbo, especially wrt. "lazy coding" - i.e. actually completing a coding request, vs. peppering it with " ... write this yourself ... " comments.

That on top of my own experiences, and heaps of anecdotes over the last year.

> How would they honestly be getting worse?

The models behind GPT-4 (which is rumored to be a mixture model)? Tuning, RLHF (which has long been demonstrated to dumb the model down). The GPT-4, as in the thing that produces responses you get through API? Caching, load-balancing, whatever other tricks they do to keep the costs down and availability up, to cope with the growth of the number of requests.

--

[0] - https://news.ycombinator.com/item?id=39361705

DJHenk · on Feb 14, 2024

> I'm almost certain this is because you're getting used to chat bots. How would they honestly be getting worse?

People say that, but I don't get this line of reasoning. There was something new, I learned to work with it. At one point I knew what question to ask to get the answer I want and have been using that form ever since.

Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?

jsjohnst · on Feb 14, 2024

For the record, I agree with you about declining quality of answers, but…

> Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?

Is it really the same input? An argument could easily be made that as you’ve gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.

DJHenk · on Feb 14, 2024

> Is it really the same input? An argument could easily be made that as you’ve gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.

I don't have logs detailed enough to be able to look it up, so I can't prove it. But for me, learning to work with AI tools like ChatGPT consists specifically of developing an intuition of what kind of answer to expect.

Maybe my intuition skewed a little over the months. It did not do that for open source models though. As a software developer understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain and integrate, but also the systems I use to get information, like search engines. Prompt engineering is just a new iteration of google-fu.

Since this intuition has not failed me in all those other areas and since OpenAI has an incentive to change the workings under the hood (cutting costs, adding barriers to keep it politically correct) and it is a closed source system that no-one from the outside can inspect, my bet is that it is them and not me.

jsjohnst · on Feb 15, 2024

> As a software developer understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain and integrate, but also the systems I use to get information, like search engines.

Ok, I’m going to call b/s here unless your expectations of Google have not gone way down over the years. Google gave night-and-day different results twenty years ago vs ten years ago vs today. If 2004 Google search was a “10 out of 10”, then in 2014 it was an “8 out of 10”, and today it barely breaks a “5” in quality of results by comparison. And don’t even bother with the advanced query syntax you could’ve used in the ’00s; they flat-out ignore it now.

(Also, side note, reread what you said in this post again. Just a friendly note that the overall tone comes across a certain way you might not have intended)

avion23 · on Feb 14, 2024

Not OP, but I copy-pasted the same code and asked it to improve it. With the no-fingers/tip hack it does something, but the results are much worse.

jsjohnst · on Feb 15, 2024

Yep, hence why I said up front “I agree with you about declining quality of answers”, because they definitely have declined, based on personal experience with examples similar to yours.

omega3 · on Feb 14, 2024

Could you share your findings re what questions to ask?

clbrmbr · on Feb 14, 2024

1. Cost & resource optimization

2. More and more RLHF

bamboozled · on Feb 14, 2024

So we should expect GPT-5 to be worse than GPT-4?

pixl97 · on Feb 14, 2024

GPT-5: "I'm sorry, I cannot answer that question because it may make GPT-4 feel bad about its mental capabilities. Instead, we've presented GPT-4 with a participation trophy and told it it's a good model"

Talking to corporate HR is subjectively worse for most people, and objectively worse in many cases.

whywhywhywhy · on Feb 14, 2024

> How would they honestly be getting worse

To me it feels like it detects if the answer could be answered cheaper by code interpreter model or 4 Turbo and then it offloads them to that and they just kinda suck compared to OG 4.

I’ve watched it fumble and fail to solve a problem with CI, took it 3 attempts over 5 minutes real time and just gave up in the end, a problem that OG 4 can do one shot no preamble.

detourdog · on Feb 14, 2024

Google search got worse.

whywhywhywhy · on Feb 14, 2024

Yandex image search is now better than Google's just by being the exact product Google's was 10+ years ago.

Watching tools decline is frustrating.

polshaw · on Feb 14, 2024

And Amazon search, YouTube search. There do seem to be somewhat different incentives involved, though: those examples are primarily about increasingly pushing lower-quality content (ads, more profitable items, more engaging items) because it makes more money.

detourdog · on Feb 14, 2024

The incentive mismatch that I seem to be observing is that Wall Street is in constant need of new technical disruption. This means that any product that shows promise will be optimized to meet a business plan rather than a human need.

fennecfoxy · on Feb 14, 2024

Yeah, I agree, GPT's attention seems much less focussed now. If you tell it to respond in a certain way it now has trouble figuring out what you want.

If it's a conversation with "format this loose data into XML" repeated several times and then a "now format it to JSON", I find it often has trouble determining that what you just asked for is the most important; I think the attention model gets confused by all the preceding text.

pb7 · on Feb 14, 2024

Do you have example links?

sho · on Feb 14, 2024

here was one of them https://gemini.google.com/share/fde31202b221?hl=en

edit: as pointed out, this was indeed a pretty esoteric example. But the rest of my attempts were hardly better, if they had a response at all.

peddling-brink · on Feb 14, 2024

That’s an awfully specific and esoteric question. Would you expect gpt4 to be significantly better at that level of depth? That’s not been my experience.

sho · on Feb 14, 2024

OK, I have to admit that one was a little odd; I was beginning to give up and trying new angles. I can't really share my other sessions. But I was trying to get a handle on the language and thought it would be an easily understood situation (multiple-token auth). I would have at least expected the response to be slightly valid.

The language in question was only open sourced after GPT4's training date, so I couldn't compare. That's actually why I tried it in the first place. And yes, I do expect it to be better: GPT4 isn't perfect, but I don't really remember it ever hallucinating quite that hard. In fact, its answer was basically that it didn't know.

And when I asked it questions with other, much less esoteric code like "how would you refactor this to be more idiomatic?" I'd get either "I couldn't complete your request. Rephrase your prompt and try again." or "Sorry, I can't help with that because there's too much data. Try again with less data." GPT-4 was helpful in both cases.

peddling-brink · on Feb 14, 2024

My experience has been that gpt4 will happily hallucinate the details when I go too deep. Like you mentioned, it will invent new syntax and function calls.

It's magic, until it isn't.

danielscrubs · on Feb 14, 2024

Googlers are wishing OpenAI could vanish as it makes them look like the IBM-lookalike they are.

Here are some hilarious highlights: https://twitter.com/Suhail/status/1757573182138290284

OJFord · on Feb 14, 2024

I've had plenty of dumb policy violation misfires like that with ChatGPT, and got banned from Bing (which uses OpenAI API, not GPT4 at the time I think) for it the day it launched.

lordswork · on Feb 14, 2024

IMO, these examples are a result of Google's AI safety team being overly conservative and overly simplistic in their approaches.

Google DeepMind is still an AI research powerhouse that is producing a ton of innovation both internal and publicly published.

roody15 · on Feb 14, 2024

Running Ollama with an 80 GB Mistral model works as well as, if not better than, ChatGPT 3.5. This is a good thing for the world IMO, as the magic is no longer held just by OpenAI. The speed at which competitors have caught up in even the last 3 months is astounding.

huytersd · on Feb 14, 2024

But no one cares about 3.5. It’s an order of magnitude worse than 4. An order of magnitude is a lot harder to catch up with.

nl · on Feb 14, 2024

This isn't true. Lots of people care deeply and use 3.5 levels of performance at some point in their software stack.

For lots of applications the speed/quality/price trade offs make a lot of sense.

For example if you are doing vanilla question answering over lots of documents then 3.5 or Mixtral are better than GPT4 because the speed is important.

huytersd · on Feb 15, 2024

It’s a price issue because 3.5 and 4 response times are about the same for me.

epolanski · on Feb 14, 2024

That really depends on the use case.

For some advanced reasoning you're 100% right, but many times you're doing document conversion, summarizing, or RAG, and in all these cases GPT-3.5 performs as well as, if not better than, GPT-4 (we can't ignore cost and speed), and it's very hard to distinguish between the two.

darkwater · on Feb 14, 2024

I would dare to say that in general most people need every day help on more simple tasks rather than complex reasoning. Now obviously, if you get complex reasoning at the same speed and cost of simpler tasks, it's a no-brainer. But if there are trade-offs...

danpalmer · on Feb 14, 2024

Many products don’t expose chat directly to the user. For example, auto-categorisation of my bank transactions does not need GPT-4; a small model with a little fine-tuning will do well and massively outperform any other classification. There are many problems like this.

sjwhevvvvvsj · on Feb 14, 2024

What Mistral has though is speed, and with speed comes scale.

spaceman_2020 · on Feb 14, 2024

Who cares about speed if you’re wrong?

This isn’t a race to write the most lines of code or the most lines of text. It’s a race to write the most correct lines of code.

I’ll wait half an hour for a response if I know I’m getting at least staff-engineer-level code for every question.

popinman322 · on Feb 14, 2024

For the tasks my group is considering, even a 7B model is adequate.

Sufficiently accurate responses can be fed into other systems downstream and cleaned up. Even code responses can benefit from this by restricting output tokens using the grammar of the target language, or iterating until the code compiles successfully.

And for a decent number of LLM-enabled use cases the functionality unlocked by these models is novel. When you're going from 0 to 1 people will just be amazed that the product exists.
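The "iterate until the code compiles" idea can be sketched as a simple generate-validate-retry loop. This is a minimal sketch under stated assumptions: `generate` is a hypothetical stand-in for any model call, the target language is assumed to be Python so the built-in `compile()` can act as the syntax check, and grammar-constrained decoding (the other option mentioned) is not shown.

```python
from typing import Callable, Optional


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<llm-output>", "exec")
        return True
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes in source
        return False


def generate_until_valid(
    generate: Callable[[str], str],  # hypothetical model call: prompt -> text
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model until its output compiles, or give up and return None."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(prompt + feedback)
        if compiles(candidate):
            return candidate
        # Tell the model its previous attempt did not compile, then retry.
        feedback = "\n\nYour previous answer was not valid Python. Try again."
    return None
```

Compiling only proves the output is syntactically valid, not that it is correct; a real pipeline would likely add tests or other downstream checks before trusting the result.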

sjwhevvvvvsj · on Feb 14, 2024

Who says it’s wrong? I have very discrete tasks which involve resolving linguistic ambiguity and they can perform very well.

mlnj · on Feb 14, 2024

Exactly. Not everything is throwing large chunks of text to get complex questions answered.

I love using the smaller models; Starling LM 7B and Mistral 7B have been enough for many tasks like the ones you mentioned.

dathinab · on Feb 14, 2024

Who cares about getting better answers if you can't afford it, can't use it for legal reasons, or conclude that the risk associated with OpenAI now being a fully proprietary, US-based, service-only company is too high given all circumstances? (Depending on how various things develop, something like the US export-restricting OpenAI, even GPT-4, is a very real possibility companies can't ignore when making long-term product decisions.)

ein0p · on Feb 14, 2024

That’s the correct answer. Years ago I worked on inference efficiency on edge hardware at a startup. Time after time I saw that users vastly prefer slower, but more accurate and robust systems. Put succinctly: nobody cares how quick a model is if it doesn’t do a good job. Another thing I discovered is it can be very difficult to convince software engineers of this obvious fact.

Al-Khwarizmi · on Feb 14, 2024

Less compute also means lower cost, though.

I see how most people would prefer a better but slower model when price is equal, but I'm sure many prefer a worse $2/mo model over a better $20/mo model.

ein0p · on Feb 14, 2024

That’s the thing I’m finding so hard to explain. Nobody would ever pay even $2 for a system that is worse at solving the problem. There is some baseline compute you need to deliver certain types of models. Going below that level for lower cost at the expense of accuracy and robustness is a fool’s errand.

In LLMs it’s even worse. To make it concrete: for how I use LLMs, I will not only refuse to pay for anything less capable than GPT-4, I won’t even use it for free. Other LLMs might perform well on narrow problems after fine-tuning, but even then I’d prefer the model with the best metrics, not the lowest inference cost.

sjwhevvvvvsj · on Feb 14, 2024

So I think that’s a “your problem isn’t right for the tool” issue, not a “Mistral isn’t capable” issue.

ein0p · on Feb 14, 2024

It isn’t capable unless you have a very specialized task and carefully fine-tune it to solve just that task. GPT-4 covers a lot of ground out of the box. The best model I’ve seen so far on the FOSS side, Mixtral MoE, is less capable than even GPT-3.5. I often submit my requests to both Mixtral and GPT-4. If I’m problem solving (learning something, working with code, summarizing, working on my messaging), Mixtral is nearly always a waste of time in comparison.

sjwhevvvvvsj · on Feb 14, 2024

Again, that’s precisely what I’m saying. A bounded task is best executed against the smallest possible model at the greatest possible speed. This is true for business factors ($$$) as well as environmental (smaller model -> less carbon).

LLMs are not AGI; they are tools with specific uses we are still discovering.

If you aren’t optimizing for accuracy to start with, and are just saying “I’ll run the most expensive thing and assume it is better” with zero evaluation, you’re wasting money and time and hurting the environment.
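The evaluate-first workflow described above can be sketched as: score each candidate model on a small labeled eval set, then pick the cheapest one that clears an accuracy bar. This is a minimal sketch under stated assumptions. The `Candidate` type, the cost units, and the keyword-rule "models" are all hypothetical placeholders standing in for real model calls (say, a fine-tuned DistilBERT versus a hosted LLM), and exact-match accuracy is assumed to be the right metric for the task.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


@dataclass
class Candidate:
    name: str
    cost_per_call: float           # hypothetical relative cost units
    predict: Callable[[str], str]  # stand-in for a real model call


def accuracy(model: Candidate, dataset: Sequence[Example]) -> float:
    """Fraction of eval examples where the prediction exactly matches the label."""
    correct = sum(1 for text, label in dataset if model.predict(text) == label)
    return correct / len(dataset)


def cheapest_adequate(
    candidates: Sequence[Candidate],
    dataset: Sequence[Example],
    min_accuracy: float,
) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose eval accuracy meets the bar, else None."""
    for model in sorted(candidates, key=lambda c: c.cost_per_call):
        if accuracy(model, dataset) >= min_accuracy:
            return model
    return None


# Toy labeled eval set and two hypothetical "models" for a bounded task.
eval_set = [
    ("great product", "pos"),
    ("terrible support", "neg"),
    ("works great", "pos"),
    ("not great", "neg"),
]
small = Candidate("small", 1.0, lambda t: "pos" if "great" in t else "neg")
large = Candidate(
    "large", 10.0, lambda t: "neg" if ("not" in t or "terrible" in t) else "pos"
)

for bar in (0.7, 0.9):
    pick = cheapest_adequate([large, small], eval_set, bar)
    print(bar, pick.name if pick else None)
```

The design choice is to let the eval set, not the model's size or price, decide: if the small model clears the bar, it wins on cost and speed; if not, the loop falls through to the larger one.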

Also, I don’t even like running Mistral if I can avoid it; a lot of tasks can be done with a fine-tuned BERT or DistilBERT. It takes more work, but my custom BERT models way outperform GPT-4 on bounded tasks because I have highly curated training data.

Within specialized domains you just aren’t going to see GPT-4/5/6 performing on par with expert curated data.

spacecadet · on Feb 14, 2024

Having spent time on edge compute projects: this.

Also, all the evidence is in this thread. People are clearly unhappy about wasting time on LLMs when that wasted time was the result of obviously bad output.

sjwhevvvvvsj · on Feb 14, 2024

People think LLMs are all or nothing: either god-like AGI or useless “hallucinating”.

In reality, you have to know the strengths and weaknesses of any tool, and small, fast LLMs can do a tremendous amount within a fixed scope. The people at Mistral get this.