On the whole, GPT-4 to GPT-5 is clearly the smallest increase in lucidity/intelligence. They had pre-training figured out much better than post-training at that point, though (“as an AI model” was a problem of their own making).
I imagine the GPT-4 base model might hold up pretty well on output quality if you’d post-train it with today’s data & techniques (without the architectural changes of 4o/5). Context size & price/performance are maybe another story, though.
All the same, they choose to highlight basic prose (and internal knowledge, for that matter) in their marketing material.
They’ve achieved a lot in making recent models more reliable as a building block & more capable of things like math, but for LLMs, saturating prose is to a degree equivalent to saturating usefulness.
Why? It sounds like you're using "I believe it's rapidly getting smarter" as evidence for "so it's getting smarter in ways we don't understand", but I'd expect the causality to go the other way around.
Simply because of what we know about our ability to judge capabilities and systems. It's much harder to judge solutions to hard problems. You can demonstrate that you can add 2+2, and anyone can be the judge of that ability, but if you try to convince anyone of a mathematical proof you came up with, that would be a much harder thing to do, regardless of how capable you are or how hard the proof was to write.
The more complicated and/or complex things become, the less likely it is that a human can act as a reliable judge. At some point no human can.
So while it could definitely be the case that AI progress is slowing down (AI labs seem to think not, but alas), what is absolutely certain is that our ability to appreciate any such progress is already diminishing, because we know this holds in general: the harder the problem, the less reliably we can judge its solution.
This thread shows that. People are saying GPT-1 was the best at writing poetry; I wonder how good they are at judging poetry themselves. I saw a blind study where people thought a story written by GPT-5 was better than an actual human bestseller. I assume they were actual experts, but I would need to check that.
I did not mean "become" in the sense of "evolve" but as in "later on an imagined continuum containing all things, one that goes from simple/easy to complex/complicated" (but I can see how that was ambiguous).
MCP should just have been stateless HTTP to begin with. There is no good reason for almost any of the servers I have seen to be stateful at the request/session level: either the server carries the state globally, or it works fine with a session identifier of some sort.
I think some of the advanced features around sampling from the calling LLM could theoretically benefit from a bidirectional stream.
In practice, nobody uses those parts of the protocol (it was overdesigned and hardly any clients support it). The key thing MCP brings right now is a standardized way to discover & invoke tools. This would’ve worked equally well as a plain HTTP-based protocol (certainly for a v1) and it would have been 10x easier to implement.
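Roughly what I mean, as a hypothetical sketch (the endpoints, tool name, and field names here are made up for illustration, not taken from any spec):

```python
# Hypothetical sketch of tool discovery/invocation over plain stateless HTTP.
# Endpoint paths, tool names, and field names are invented, not from the MCP spec.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

TOOLS = {
    "get_weather": {
        "description": "Return the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

class CallRequest(BaseModel):
    arguments: dict

@app.get("/tools")
def list_tools():
    # Discovery: the client fetches names, descriptions and input schemas.
    return [{"name": name, **meta} for name, meta in TOOLS.items()]

@app.post("/tools/{name}")
def call_tool(name: str, req: CallRequest):
    # Invocation: one plain, stateless request per tool call.
    if name == "get_weather":
        return {"content": f"It is sunny in {req.arguments.get('city', 'nowhere')}"}
    return {"error": f"unknown tool {name}"}
```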
Sampling is, to my eyes, a very promising aspect of the protocol. Maybe its implementation is lagging behind because it's too far from the previous mental model of tool use. I'm also fine with the burden being on the client side if it enables a good DX on the server side. In practice, there would be many more servers than clients.
> This would’ve worked equally well as a plain HTTP-based protocol
With plain HTTP you can quite easily "stream" both the request's and the response's body: that's an HTTP/1.1 feature called chunked transfer encoding (the message body is not sent as one byte array; it's split into "chunks" that are received in sequence). I really don't get why people think you need WS (or ffs SSE) for "streaming". I've implemented a chat using just good old HTTP/1.1 with chunking. It's actually a perfect use case, so it suits LLMs quite well.
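A minimal sketch of the idea, using nothing but Python's standard library (the streamed tokens are obviously made up):

```python
# Minimal HTTP/1.1 server that streams a response body via chunked transfer encoding.
import http.server
import time

class ChunkedHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for token in ["Hello", " from", " a", " chunked", " stream"]:
            data = token.encode()
            # Each chunk is framed as: hex length, CRLF, payload, CRLF.
            self.wfile.write(f"{len(data):X}\r\n".encode() + data + b"\r\n")
            self.wfile.flush()
            time.sleep(0.2)  # simulate tokens arriving over time
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the body

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), ChunkedHandler).serve_forever()
```

With `curl -N http://localhost:8000/` you'll see the tokens arrive incrementally, one chunk at a time, over a single ordinary HTTP/1.1 response.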
Well, the point is to provide context, and it's easier to do that if the server has state.
For example, you have an MCP client (let's say it's the Amazon Q CLI), and you have an MCP server for executing commands over SSH. If a connection is maintained between the MCP client and server, then the MCP server can keep the SSH connection alive.
Replace the SSH server with anything else that has state: a browser, for example (now your AI assistant can also have 500 open tabs).
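A rough sketch of the kind of thing I mean (paramiko, the host, and the username are just illustrative; this isn't actual MCP server code):

```python
# Illustrative only: a long-lived server process that keeps one SSH connection
# alive per connected client, so later tool calls can reuse it.
import paramiko

class SshSession:
    def __init__(self, host: str, user: str):
        self.client = paramiko.SSHClient()
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(host, username=user)  # opened once, reused afterwards

    def run(self, command: str) -> str:
        _, stdout, _ = self.client.exec_command(command)
        return stdout.read().decode()

# Kept around for as long as the MCP client stays connected (host/user made up).
session = SshSession("example.com", "deploy")
print(session.run("uptime"))
print(session.run("df -h"))  # second call reuses the same live connection
```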
I don't claim to have a lot of experience with this, but my intuition tells me that a connection that ends after the request needs to be reopened for the next request. Which is more efficient, keeping the session open or closing it, depends on the usage pattern, how much memory the session consumes, etc.
This is no different from a web app, though; there’s no obvious need to reinvent the wheel. We know how to do this very, very well: the underlying TCP connection remains active, we multiplex requests, and cookies bridge the gap for multi-request context. Every language has great client & server support for that.
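As a sketch of that boring approach: per-session state lives server-side, keyed by an identifier the client sends along on each otherwise stateless request (FastAPI, the endpoint, and the header name are placeholders here, not a real MCP transport; a cookie would serve the same purpose as the header):

```python
# Illustrative only: per-session state behind stateless HTTP requests.
from fastapi import FastAPI, Header

app = FastAPI()
SESSIONS: dict[str, dict] = {}  # in a real deployment this might live in Redis

@app.post("/call")
def call(tool: str, x_session_id: str = Header(...)):
    # Look up (or create) this client's state by its session id.
    state = SESSIONS.setdefault(x_session_id, {"open_tabs": []})
    if tool == "open_tab":
        state["open_tabs"].append("https://example.com")
    # Each request stands alone; the session id is all the "connection" we need.
    return {"session": x_session_id, "open_tabs": state["open_tabs"]}
```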
Instead we ended up with a protocol that fights with load balancers and in most cases can’t just be chucked into, say, an existing Express/FastAPI app.
That makes everything harder (& cynically, it creates room for providers like Cloudflare to create black box tooling & advertise it as _the_ way to deploy a remote MCP server)
Their patchy JSON schema support for tool calls & structured generation is also very annoying… things like unions that you’d think are table stakes (and in fact work fine with both OpenAI and Anthropic) get rejected & you have to go reengineer your entire setup to accommodate it.
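To illustrate, here's the kind of perfectly ordinary union schema that gets rejected, plus the usual workaround of flattening it (written as Python dicts; the parameter names are made up):

```python
# A tool-parameter schema using a union (anyOf). Both forms below describe the
# same "look up by name or by numeric id" parameter; names are illustrative.
schema_with_union = {
    "type": "object",
    "properties": {
        "target": {
            "anyOf": [                  # the union some providers reject
                {"type": "string"},     # look up by name
                {"type": "integer"},    # look up by numeric id
            ]
        }
    },
    "required": ["target"],
}

# Typical workaround: flatten the union into separate optional fields and
# enforce "exactly one of these" in application code rather than in the schema.
schema_flattened = {
    "type": "object",
    "properties": {
        "target_name": {"type": "string"},
        "target_id": {"type": "integer"},
    },
}
```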
Funnily enough, I recently noticed that the X app on iOS started doing this for me… on ads. If I place my finger on an ad while scrolling down, without fail it opens the ad overlay sheet. I guess that’s one way to increase CPC revenue.
To be fair, in my experience ticket checking in Luxembourg was (at least on the city buses) already so incredibly lax that not a lot of revenue will be lost by this. All the same, it's a nice policy.
I think that this is at least partially attributable to the average understanding of English in those countries. When you are watching a movie with subtitles but have absolutely no idea what is being said, because the spoken language is unknown to you, it can be very distracting: you are missing a certain amount of context. I believe an above-average share of Dutch people are at least capable of comprehending basic English speech, which puts the movie's subtitles into context. In countries where English isn't as widespread, this is more of a problem, which is why dubbing is more effective there. Similarly, even in the Netherlands, movies that target family audiences are often shown dubbed by default in movie theatres, because the subtitles are just not as effective, even if the children watching the movie can read them.
I'd guess it's the opposite: average understanding of English is lower _because_ the content is all translated.
Also, I do not know about Germany/France/Spain, but Italian movies have basically always been dubbed, in the sense that Italian actors were dubbed by other Italian actors.
And every single time I see an article about Jobs accompanied by a set of comments from a tech-savvy audience, I see this same comment about Ritchie (and sometimes McCarthy) resurfacing. You can continue feeling sour about it, but not everyone deserving attention receives as much as they should, and not everyone receiving attention deserves as much as they get.
Ritchie's and McCarthy's personalities and accomplishments just aren't as interesting to the general public as Jobs'. That has little to do with our definition of success and more with the fact that, to be able to obsess over someone, we need them and their work to speak to our imagination, which is a lot harder when their accomplishments aren't trivial to understand without any knowledge of their field of study.
The general public's tastes and interests can be shifted. Many years ago, Chinese people valued hard work and intelligence. Now they admire the same things as Americans: smartness and fame.
While that may be true, it most certainly hasn't happened in the last 3 years. Maybe over time people will stop valuing Jobs' work more highly than that of Ritchie and McCarthy, but I think it is likely to stay like this for the foreseeable future, just as it has been for quite a while.