On the whole, GPT-4 to GPT-5 is clearly the smallest increase in lucidity/intelligence. They had pre-training figured out much better than post-training at that point, though (“as an AI model” was a problem of their own making).
I imagine the GPT-4 base model might hold up pretty well on output quality if you’d post-train it with today’s data & techniques (without the architectural changes of 4o/5). Context size & price/performance are maybe another story, though.
All the same, they choose to highlight basic prose (and internal knowledge, for that matter) in their marketing material.
They’ve achieved a lot in making recent models more reliable as a building block & more capable of things like math, but for LLMs, saturating prose is to a degree equivalent to saturating usefulness.
Why? It sounds like you're using "I believe it's rapidly getting smarter" as evidence for "so it's getting smarter in ways we don't understand", but I'd expect the causality to go the other way around.
Simply because of what we know about our ability to judge capabilities and systems. It's much harder to judge solutions to hard problems. You can demonstrate that you can add 2+2, and anyone can be the judge of that ability, but if you try to convince anyone of a mathematical proof you came up with, that would be a much harder thing to do, regardless of how capable you are or how hard the proof was to write.
The more complicated and/or complex things become, the less likely it is that a human can act as a reliable judge. At some point no human can.
So while it could definitely be the case that AI progress is slowing down (AI labs seem to think not, but alas), what is absolutely certain is that our ability to appreciate any such progress is already diminishing, because we know this holds in general: the harder the problem, the less reliably we can judge its solution.
This thread shows that. People are saying GPT-1 was the best at writing poetry; I wonder how good they are at judging poetry themselves. I saw a blind study where people thought a story written by GPT-5 was better than an actual human bestseller. I assume they were actual experts, but I would need to check that.
I did not mean "become" in the sense of "evolve" but as in "later on an imagined continuum containing all things, one that goes from simple/easy to complex/complicated" (but I can see how that was ambiguous).
MCP should just have been stateless HTTP to begin with. There is no good reason for almost any of the servers I have seen to be stateful at the request/session level: either the server carries the state globally, or it works fine with a session identifier of some sort.
I think some of the advanced features around sampling from the calling LLM could theoretically benefit from a bidirectional stream.
In practice, nobody uses those parts of the protocol (it was overdesigned and hardly any clients support it). The key thing MCP brings right now is a standardized way to discover & invoke tools. This would’ve worked equally well as a plain HTTP-based protocol (certainly for a v1) and it would have been 10x easier to implement.
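Roughly what I mean, as a hypothetical sketch (the endpoints, tool name, and field names here are made up for illustration, not taken from any spec):

```python
# Hypothetical sketch of tool discovery/invocation over plain stateless HTTP.
# Endpoint paths, tool names, and field names are invented, not from the MCP spec.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

TOOLS = {
    "get_weather": {
        "description": "Return the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

class CallRequest(BaseModel):
    arguments: dict

@app.get("/tools")
def list_tools():
    # Discovery: the client fetches names, descriptions and input schemas.
    return [{"name": name, **meta} for name, meta in TOOLS.items()]

@app.post("/tools/{name}")
def call_tool(name: str, req: CallRequest):
    # Invocation: one plain, stateless request per tool call.
    if name == "get_weather":
        return {"content": f"It is sunny in {req.arguments.get('city', 'nowhere')}"}
    return {"error": f"unknown tool {name}"}
```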
Sampling is, to my eyes, a very promising aspect of the protocol. Maybe its implementation is lagging behind because it's too far from the previous mental model of tool use. I'm also fine with the burden being on the client side if it enables a good DX on the server side. In practice, there would be many more servers than clients.
> This would’ve worked equally well as a plain HTTP-based protocol
With plain HTTP you can quite easily "stream" both the request's and the response's body: that's an HTTP/1.1 feature called chunked transfer encoding (the message body is not sent as one byte array; it's split into "chunks" that are received in sequence). I really don't get why people think you need WS (or ffs SSE) for "streaming". I've implemented a chat using just good old HTTP/1.1 with chunking. It's actually a perfect use case, so it suits LLMs quite well.
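A minimal sketch of the idea, using nothing but Python's standard library (the streamed tokens are obviously made up):

```python
# Minimal HTTP/1.1 server that streams a response body via chunked transfer encoding.
import http.server
import time

class ChunkedHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for token in ["Hello", " from", " a", " chunked", " stream"]:
            data = token.encode()
            # Each chunk is framed as: hex length, CRLF, payload, CRLF.
            self.wfile.write(f"{len(data):X}\r\n".encode() + data + b"\r\n")
            self.wfile.flush()
            time.sleep(0.2)  # simulate tokens arriving over time
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the body

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), ChunkedHandler).serve_forever()
```

With `curl -N http://localhost:8000/` you'll see the tokens arrive incrementally, one chunk at a time, over a single ordinary HTTP/1.1 response.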
Well, the point is to provide context, and it's easier to do that if the server has state.
For example, you have an MCP client (let's say it's the Amazon Q CLI), and you have an MCP server for executing commands over SSH. If a connection is maintained between the MCP client and server, then the MCP server can keep the SSH connection alive.
Replace the SSH server with anything else that has state: a browser, for example (now your AI assistant can also have 500 open tabs).
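A rough sketch of the kind of thing I mean (paramiko, the host, and the username are just illustrative; this isn't actual MCP server code):

```python
# Illustrative only: a long-lived server process that keeps one SSH connection
# alive per connected client, so later tool calls can reuse it.
import paramiko

class SshSession:
    def __init__(self, host: str, user: str):
        self.client = paramiko.SSHClient()
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(host, username=user)  # opened once, reused afterwards

    def run(self, command: str) -> str:
        _, stdout, _ = self.client.exec_command(command)
        return stdout.read().decode()

# Kept around for as long as the MCP client stays connected (host/user made up).
session = SshSession("example.com", "deploy")
print(session.run("uptime"))
print(session.run("df -h"))  # second call reuses the same live connection
```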
I don't claim to have a lot of experience with this, but my intuition tells me that a connection that ends after the request needs to be reopened for the next request. Which is more efficient, keeping the session open or closing it, depends on the usage pattern, how much memory the session consumes, etc.
This is no different from a web app, though; there’s no obvious need to reinvent the wheel. We know how to do this very, very well: the underlying TCP connection remains active, we multiplex requests, and cookies bridge the gap for multi-request context. Every language has great client & server support for that.
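As a sketch of that boring approach: per-session state lives server-side, keyed by an identifier the client sends along on each otherwise stateless request (FastAPI, the endpoint, and the header name are placeholders here, not a real MCP transport; a cookie would serve the same purpose as the header):

```python
# Illustrative only: per-session state behind stateless HTTP requests.
from fastapi import FastAPI, Header

app = FastAPI()
SESSIONS: dict[str, dict] = {}  # in a real deployment this might live in Redis

@app.post("/call")
def call(tool: str, x_session_id: str = Header(...)):
    # Look up (or create) this client's state by its session id.
    state = SESSIONS.setdefault(x_session_id, {"open_tabs": []})
    if tool == "open_tab":
        state["open_tabs"].append("https://example.com")
    # Each request stands alone; the session id is all the "connection" we need.
    return {"session": x_session_id, "open_tabs": state["open_tabs"]}
```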
Instead we ended up with a protocol that fights with load balancers and in most cases can’t just be chucked into, say, an existing Express/FastAPI app.
That makes everything harder (& cynically, it creates room for providers like Cloudflare to create black box tooling & advertise it as _the_ way to deploy a remote MCP server)
Their patchy JSON schema support for tool calls & structured generation is also very annoying… things like unions that you’d think are table stakes (and in fact work fine with both OpenAI and Anthropic) get rejected & you have to go reengineer your entire setup to accommodate it.
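To illustrate, here's the kind of perfectly ordinary union schema that gets rejected, plus the usual workaround of flattening it (written as Python dicts; the parameter names are made up):

```python
# A tool-parameter schema using a union (anyOf). Both forms below describe the
# same "look up by name or by numeric id" parameter; names are illustrative.
schema_with_union = {
    "type": "object",
    "properties": {
        "target": {
            "anyOf": [                  # the union some providers reject
                {"type": "string"},     # look up by name
                {"type": "integer"},    # look up by numeric id
            ]
        }
    },
    "required": ["target"],
}

# Typical workaround: flatten the union into separate optional fields and
# enforce "exactly one of these" in application code rather than in the schema.
schema_flattened = {
    "type": "object",
    "properties": {
        "target_name": {"type": "string"},
        "target_id": {"type": "integer"},
    },
}
```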
Funnily enough, I recently noticed that the X app on iOS started doing this for me… on ads. If I place my finger on an ad while scrolling down, without fail it opens the ad overlay sheet. I guess that’s one way to increase CPC revenue.
To be fair, in my experience ticket checking in Luxembourg was (at least on the city buses) already so incredibly lax that not a lot of revenue will be lost by this. All the same, it's a nice policy.
I think that this is at least partially attributable to the average understanding of English in those countries. When you are watching a movie with subtitles but have absolutely no idea what is being said, because the spoken language is unknown to you, it can be very distracting: you are missing a certain amount of context. I believe an above-average share of Dutch people are at least capable of comprehending basic English speech, which puts the movie's subtitles into context. In countries where English isn't as widespread, this is more of a problem, which is why dubbing is more effective there. Similarly, even in the Netherlands, movies that target family audiences are often shown dubbed by default in movie theatres, because the subtitles are just not as effective, even if the children watching the movie can read them.
I'd guess it's the opposite: average understanding of English is lower _because_ the content is all translated.
Also, I do not know about Germany/France/Spain, but Italian movies have basically always been dubbed, in the sense that Italian actors were dubbed by other Italian actors.
And every single time I see an article about Jobs accompanied by a set of comments from a tech-savvy audience, I see this same comment about Ritchie (and sometimes McCarthy) resurfacing. You can continue feeling sour about it, but not everyone deserving attention receives as much as they should, and not everyone receiving attention deserves as much as they get.
Ritchie's and McCarthy's personalities and accomplishments just aren't as interesting to the general public as Jobs'. That has little to do with our definition of success and more with the fact that, to be able to obsess over someone, we need them and their work to speak to our imagination, which is a lot harder when their accomplishments aren't trivial to understand without any knowledge of their field of study.
The general public's tastes and interests can be shifted. Many years ago, Chinese people valued hard work and intelligence. Now they admire the same things as Americans: smartness and fame.
While that may be true, it most certainly hasn't happened in the last 3 years. Maybe over time people will stop valuing Jobs' work more highly than that of Ritchie and McCarthy, but I think it is likely to stay like this for the foreseeable future, just as it has been for quite a while.