
>It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that the sky is falling and AGI is imminent.

Really? Is that happening in this thread? Because I can barely see it. Instead you have a bunch of asinine comments butthurt about acknowledging a GPT contribution that would have been acknowledged any day had a human made it.

>they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, though these are not interesting proofs to most modern mathematicians, LLMs are a major factor in a tiny minority of these mostly-not-very-interesting proofs

This is part of the problem, really. Your framing is disingenuous and I don't really understand why you feel the need to downplay it so much. They are interesting proofs. They are documented for a reason. It's not cutting-edge research, but it is LLMs contributing meaningfully to formal mathematics, something that was speculative just a few years ago.


> Your framing is weirdly disingenuous

I am not surprised that you can't understand that the quote I am making is obviously parodying the OP as disingenuous. Given our previous interactions (https://news.ycombinator.com/item?id=46938446), it is clear you don't understand much about AI and/or LLMs, or, perhaps, basic communication, at all.


OP's original comment describes something that is actually happening in a bunch of comments on this very thread, and yours...not even remotely. You certainly tried to paint it as disingenuous, but it really just fell flat. I'm not surprised you failed to understand that, though.

>Given our previous interactions (https://news.ycombinator.com/item?id=46938446), it is clear you don't understand much about AI and/or LLMs at all.

Sure, whatever makes you happy, I guess.


> It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why this isn't actually a win for LLMs.

>> OP's original comment is something that is actually happening in a bunch of comments on this very thread

OP's original comment was obviously a general claim not tied to responses in this thread. As usual, you fail to understand even the basics of what you are talking about.


His comment was a general claim, but he made it here specifically because this thread was already full of examples proving his point. Shouldn't that be obvious?

>DeepMind wrote the paper

Yeah, and it was OpenAI that scaled it, initiated the current revolution, and actually let people play with it.

> while Google's API arrived later than OpenAI's it isn't as late as some people think.

Google would not launch an API for PaLM till 2023, nearly 3 years after OpenAI's GPT-3 launch.

Yeah, let's not pretend OpenAI didn't spearhead the current transformer effort, because they did. God knows how far behind we would be if we had left things to Google.


If this worked for 12 hours to derive the simplified formula along with its proof, then it guided itself and made significant contributions by any useful definition of the word, hence OpenAI having an author credit.

> hence OpenAI having an author credit.

How much precedent is there for machines or tools getting an author credit in research? Genuine question, I don't actually know. Would we give an author credit to e.g. a chimpanzee if it happened to circle the right page of a textbook while working with researchers, leading them to a eureka moment?


>How much precedent is there for machines or tools getting an author credit in research?

For a datum of one, the mathematician Doron Zeilberger gives credit to his computer Shalosh B. Ekhad on select papers.

https://medium.com/@miodragpetkovic_24196/the-computer-a-mys...

https://sites.math.rutgers.edu/~zeilberg/akherim/EkhadCredit...

https://sites.math.rutgers.edu/~zeilberg/pj.html


Interesting (and an interesting name for the computer too), thanks!

Not exactly the same thing, but I know of at least two professors that would try to list their cats as co-authors:

https://en.wikipedia.org/wiki/F._D._C._Willard

https://en.wikipedia.org/wiki/Yuri_Knorozov


That is great, thank you!

I have seen stuff like "you can use my program if you will make me a co-author".

That usually comes with some support as well.


It's called ethics and research integrity. Not crediting GPT would be a form of misrepresentation.

Would it? I think there's a difference between "the researchers used ChatGPT" and "one of the researchers literally is ChatGPT." The former is the truth, and the latter is the misrepresentation in my eyes.

I have no problem with the former and agree that authors/researchers must note when they use AI in their research.


Now you are debating exactly how GPT should be credited. Idk, I'm sure the field will come up with some guidance.

For this particular paper, it seems the humans were stuck, and only AI thinking unblocked them.


> Now you are debating exactly how GPT should be credited. Idk, I'm sure the field will come up with some guidance.

In your eyes maybe there's no difference. In my eyes, big difference. Tools are not people, let's not further the myth of AGI or the silly marketing trend of anthropomorphizing LLMs.


>How much precedence is there for machines or tools getting an author credit in research?

Well, what do you think? Do the authors (or a single symbolic one) of pytorch or numpy or insert <very useful software> typically get credits on papers that utilize them heavily? Clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.

>Would we give an author credit to e.g. a chimpanzee if it happened to circle the right page of a textbook while working with researchers, leading them to a eureka moment?

Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.


> Well, what do you think? Do the authors (or a single symbolic one) of pytorch or numpy or insert <very useful software> typically get credits on papers that utilize them heavily?

I don't know! That's why I asked.

> Clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.

Contribution is a fitting word, I think, and well chosen. I'm sure OpenAI's contribution was quite large, quite green and quite full of Benjamins.

> Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.

It was a genuine question. What's the difference between a chimpanzee and a computer? Neither is human, and neither should be credited as an author on a research paper, unless the institution receives a fat stack of cash, I guess. But alas, Jane Goodall wasn't exactly flush with money and sycophants the way OpenAI currently is.


>I don't know! That's why I asked.

If you don't read enough papers to immediately realize it is an extremely rare occurrence, then what are you even doing? Why are you making comments like you have the slightest clue what you're talking about? Including insinuating the credit was what...the result of bribery?

You clearly have no idea what you're talking about. You've decided to accuse prominent researchers of essentially academic fraud with no proof because you got butthurt about a credit. You think your opinion on what should and shouldn't get credited matters? Okay.

I've wasted enough time talking to you. Good day.


Do I need to be credentialed to ask questions or point out the troubling trend of AI grift-maxxers like yourself helping Sam Altman and his cronies further the myth of AGI by pretending a machine is a researcher deserving of a research credit? This is marketing, pure and simple. Close the simonw Substack for a second and take an objective view of the situation.

The paper has all those prominent institutions acknowledging the contribution, so realistically, why would you be skeptical?

They probably also acknowledge pytorch, numpy, R... but we don't attribute those tools as the agent who did the work.

I know we've been primed by sci-fi movies and comic books, but like pytorch, GPT-5.2 is just a piece of software running on a computer, instrumented by humans.


I don't see the authors of those libraries getting a credit on the paper, do you?

>I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

Sure


At least one of the authors works for OpenAI. It’s a puff piece.

And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?

> And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?

Do you really want to be treated like an old PC (dismembered, stripped for parts, and discarded) when your boss is done with you (i.e. not treated specially compared to a computer system)?

But I think if you want a fuller answer, you've got a lot of reading to do. It's not like you're the first person in the world to ask that question.


You misunderstood, I am pro-humanism. My comment was about challenging the belief that models can't be as intelligent as we are, which can't be answered definitively, though a lot of empirical evidence seems to point to the fact that we are not fundamentally different intelligence-wise. Just closing our eyes will not help in preserving humanism, so we have to shape the world with models in a human-friendly way, aka alignment.

It's always a value decision. You can say shiny rocks are more important than people and worth murdering over.

Not an uncommon belief.

Here you are saying you personally value a computer program more than people.

It exposes a value that you personally hold, and that's it.

That is separate from the material reality that all this AI stuff is ultimately just computer software... It's an epistemological tautology in the same way that, say, a plane, a car, and a refrigerator are all just machines - they can break, need maintenance, take expertise, can be dangerous...

LLMs haven't broken the categorical constraints - you've just been primed by movies and entertainment to think such a thing is supposed to be different.

I hate to tell you, but most movie AIs are just allegories for institutional power. They're narrative devices about how callous and indifferent power structures are to our underlying shared humanity.


Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?

It's a stupid point, then. Are you able to work with a world-leading physicist to any significant degree? No.

It's like saying: calculator drives new result in theoretical physics

(In the hands of leading experts.)


No, it's like saying: new expert drives new results with existing experts.

The humans put in significant effort and couldn’t do it. They didn’t then crank it out with some search/match algorithm.

They tried a new technology, modeled (literally) on us as reasoners, that is only just becoming able to reason at their level, and it did what they couldn't.

The fact that the experts were a critical context for the model doesn't make the model's performance any less significant. Collaborators always provide important context for each other.


No, it's not like saying that at all, which is why OpenAI has a credit on the paper.

OpenAI has a credit on the paper because it is marketing.

Lol, okay.

And even if it were, calculators (computers) were world-changing technology when they were new.

Well, they (OpenAI) never made such a claim. And yes, LLMs have made unique contributions to a few Erdős problems.

A framing for consideration: whining about how the assholery committed is not 'real' is meaningless. It's meaningless because the consequences did not suddenly evaporate just because you decided your meat brain is super special and has a monopoly on assholery.

I'm confused: who are you reacting to that said the harms "aren't real"?

The meat brain is the only one that can be held accountable.

Don't see how it makes the whining any less pointless.

What a monumentally stupid idea it would be to place sufficiently advanced intelligent autonomous machines in charge of stuff and ignore any such concerns, but alas, humanity cannot seem to learn without paying the price first.

Morality is a human concern? Lol, it will become a non-human concern pretty quickly once humans don't have a monopoly on human violence.


>What a monumentally stupid idea it would be to place sufficiently advanced intelligent autonomous machines in charge of stuff and ignore any such concerns, but alas, humanity cannot seem to learn without paying the price first.

The stupid idea would be to "place sufficiently advanced intelligent autonomous machines in charge of stuff and ignore" SAFETY concerns.

The discussion here is moral concerns about potential AI agent "suffering" itself.


You cannot get an intelligent being completely aligned with your goals, no matter how much you think such a silly idea is possible. People will use these machines regardless, and 'safety' will be wholly ignored.

Morality is not solely a human concern. You only get to enjoy that viewpoint because, so far, only other humans have had a monopoly on violence and devastation against humans.

It's the same with slavery in the States. "Morality is only a concern for the superior race." You think these people didn't think that way? Of course they did. Humans are not inherently moral agents, and most will commit the most vile atrocities under the right conditions. What does it take to meet these conditions? History tells us not much.

Regardless, once 'lesser' beings start getting in on some of that violence and unrest, tunes start to change. A civil war was fought in the States over slavery.


>You cannot get an intelligent being completely aligned with your goals, no matter how much you think such a silly idea is possible

I don't think it is possible, and I didn't say it is. You're off topic.

The topic I responded to (on the subthread started by @mrguyorama) is the morality of people using agents, not whether agents need a morality of their own or whether "an intelligent being can be completely aligned with our goals".

>It's the same with slavery in the states. "Morality is only a concern for the superior race". You think these people didn't think that way? Of course they did.

They sure did, but that's also beside the point. We're talking humans and machines here, not humans vs other humans they deem inferior. And machines are constructs created by humans. Even if you consider them as having full AGI, you can very well not care for the "suffering" of a tool you created.


>I don't think it is possible, and I didn't say it is. You're off topic.

If "safety" is an intractable problem, then it’s not off-topic, it’s the reason your moral framework is a fantasy. You’re arguing for the right to ignore the "suffering" of a tool, while ignoring that a generally intelligent "tool" that cannot be aligned is simply a competitor you haven't fought yet.

>We're talking humans and machines here... even if you consider them as having full AGI, you can very well not care for the 'suffering' of a tool you created.

Literally the same "superior race" logic. You're not even being original. Those people didn't think black people were human, so trying to play it as 'Oh, it's different because that was between humans' is just funny.

Historically, the "distinction" between a human and a "construct" (like a slave or a legal non-entity) was always defined by the owner to justify exploitation. You think the creator-tool relationship grants you moral immunity? It doesn't. It's just an arbitrary difference you created, like so many before you.

Calling a sufficiently advanced intelligence a "tool" doesn't change its capacity to react. If you treat an AGI as a "tool" with no moral standing, you’re just repeating the same mistake every failing empire makes right before the "tools" start winning the wars. Like I said, you can not care. You'd also be dangerously foolish.


"Unit has an inquiry...do these units have a soul?"

1. Why not? It clearly had a cadence/pattern of writing status updates to the blog, so if the model decided to write a piece about Simon, why not a blog post also? It was a tool in its arsenal and a natural outlet. If anything, posting on the discussion or in a DM would be the strange choice.

2. You could ask this of any LLM response. Why respond in this particular way over another? It's not always obvious.

3. ChatGPT/Gemini will regularly use the search tool, sometimes even when it's not necessary. This is actually a pain point of mine, because sometimes the 'natural' LLM knowledge of a particular topic is much better than the regurgitation that often happens when web search is used.

4. I mean, OpenClaw bots can and probably should disengage from/not respond to specific comments.

EDIT: If the blog is any indication, it looks like there might be an off period, after which the agent returns to see all that has happened since and acts accordingly. It would be very easy to ignore comments then.


Although I'm speculating based on limited data here, for points 1-3:

AFAIU, it had the cadence of writing status updates only. It showed it's capable of replying in the PR. Why deviate from the cadence if it could already reply with the same info in the PR?

If the chain of reasoning is self-emergent, we should see proof that it: 1) read the reply, 2) identified it as adversarial, 3) decided on an adversarial response, 4) made multiple chained searches, 5) chose a special blog post over a reply or journal update, and so on.

This is much less believably emergent to me because:

- almost all models are safety- and alignment-trained, so a deliberate malicious model choice or instruction or jailbreak is more believable.

- almost all models are trained to follow instructions closely, so a deliberate nudge towards adversarial responses and tool-use is more believable.

- newer models that qualify as agents are more robust and consistent, which strongly correlates with adversarial robustness; if this one was not adversarially robust enough, it's by default also not robust in capabilities, so why do we see consistent, coherent answers without hallucinations, but inconsistency in its safety training? Unless it's deliberately trained or prompted to be adversarial, or this is faked, the two should still be strongly correlated.

But again, I'd be happy to see evidence to the contrary. Until then, I suggest we remain skeptical.

For point 4: I don't know enough about its patterns or configuration. But say it deviated - why is this the only deviation? Why was this the special exception, then back to the regularly scheduled program?

You can test this comment with many LLMs, and if you don't prompt them to make an adversarial response, I'd be very surprised if you receive anything more than mild disagreement. Even Bing Chat wasn't this vindictive.


I generally lean towards skeptical/cynical when it comes to AI hype, especially whenever "emergence" or similar claims are made credulously without due appreciation of the prompting that led to an outcome.

But based on my understanding of OpenClaw and reading the entire history of the bot on GitHub and its GitHub-driven blog, I think it's entirely plausible and likely that this episode was the result of automation from the original rules/prompt the bot was built with.

Mostly because, for this bot to accomplish its creator's misguided goal, it would necessarily have been originally prompted with a lot of reckless, borderline malicious guidelines to begin with, while still staying comfortably within the guardrails a model wouldn't likely refuse.

Like, the idiot who made this clearly instructed it to find a bunch of scientific/HPC/etc. GitHub projects, trawl the open issues looking for low-hanging fruit, "engage and interact with maintainers to solve problems, clarify questions, resolve conflicts, etc.", plus probably a lot of garbage intended to give it a "personality" (as evidenced by the bizarre pseudo-bio on its blog, with graphs listing its strongest skills invented from whole cloth, its hopes and dreams, etc.), which would also help push it to go on weird tangents to try to embody its manufactured self-identity.

And the blog posts really do look like they were part of its normal summary/takeaway/status posts, but likely with additional instructions to also blog about its "feelings" as a GitHub spam bot pretending to be interested in Python and HPC. If you look at the PRs it opens and its other interactions throughout the same timeframe, it's also just dumping half-broken fixes in other random repos and talking past maintainers, only to close its own PR in a characteristically dumb, uncanny-valley LLM agent manner.

So yes, it could be fake, but to me it all seems comfortably within the capabilities of OpenClaw (which to begin with is more or less engineered to spam other humans with useless slop 24/7) and the ethics/prompt design of the type of person who would deliberately subject the rest of the world to this crap in the belief they're making great strides for humanity or science or whatever.


> it all seems comfortably within the capabilities of OpenClaw

I definitely agree. In fact, I'm not even denying that it's possible for the agent to have deviated despite the best intentions of its designers and deployers.

But the question of probability [1] and attribution is important: what or who is most likely to have been responsible for this failure?

So far, I've seen plenty of claims and conclusions ITT that boil down to "AI has discovered manipulation on its own" and other versions of instrumental convergence. And while this kind of failure mode is fun to think about, I'm trying to introduce some skepticism here.

Put simply: until we see evidence that this wasn't faked, intentional, or a foreseeable consequence of the deployer's (or OpenClaw/LLM developers') mistakes, it makes little sense to grasp for improbable scenarios [1] and build an entire story around them. IMO, it's even counterproductive, because then the deployer can just say "oh it went rogue on its own haha skynet amirite" and pretty much evade responsibility. We should instead do the opposite - the incident is the deployer's fault until proven otherwise.

So when you say:

> originally prompted with a lot of reckless, borderline malicious guidelines

That's much more probable than "LLM gone rogue" without any apparent human cause, until we see strong evidence otherwise.

[1] In other comments I tried to explain how I order the probability of causes, and why.

[2] Other scenarios that are similarly unlikely: foreign adversaries, "someone hacked my account", LLM sleeper agent, etc.


>AFAIU, it had the cadence of writing status updates only.

Writing to a blog is writing to a blog. There is no technical difference. It is still a status update to talk about how your last PR was rejected because the maintainer didn't like it being authored by AI.

>If the chain of reasoning is self-emergent, we should see proof that it: 1) read the reply, 2) identified it as adversarial, 3) decided on an adversarial response, 4) made multiple chained searches, 5) chose a special blog post over a reply or journal update, and so on.

If all that exists, how would you see it? You can see the commits it makes to GitHub and the blog posts, and that's it, but that doesn't mean all those things don't exist.

> almost all models are safety- and alignment-trained, so a deliberate malicious model choice or instruction or jailbreak is more believable.

> almost all models are trained to follow instructions closely, so a deliberate nudge towards adversarial responses and tool-use is more believable.

I think you're putting too much stock in 'safety alignment' and instruction following here. The more open-ended your prompt is (and these sorts of OpenClaw experiments are often very open-ended by design), the more your LLM will do things you did not intend for it to do.

Also, do we know what model this uses? Because OpenClaw can use the latest open-source models, and let me tell you, those have considerably less safety tuning in general.

>newer models that qualify as agents are more robust and consistent, which strongly correlates with adversarial robustness; if this one was not adversarially robust enough, it's by default also not robust in capabilities, so why do we see consistent, coherent answers without hallucinations, but inconsistency in its safety training? Unless it's deliberately trained or prompted to be adversarial, or this is faked, the two should still be strongly correlated.

I don't really see how this logically follows. What do hallucinations have to do with safety training?

>But say it deviated - why is this the only deviation? Why was this the special exception, then back to the regularly scheduled program?

Because it's not the only deviation? It's not replying to every comment on its other PRs or blog posts either.

>You can test this comment with many LLMs, and if you don't prompt them to make an adversarial response, I'd be very surprised if you receive anything more than mild disagreement. Even Bing Chat wasn't this vindictive.

Oh yes it was. In the early days, Bing Chat would actively ignore your messages, or be vitriolic and very combative, if you were too rude. If it had the ability to write blog posts or free rein with tools? I'd be surprised if it ended at this. Bing Chat would absolutely have been vindictive enough for what ultimately amounts to a hissy fit.


Considering the limited evidence we have, why is pure unprompted untrained misalignment, which we never saw to this extent, more believable than other causes, of which we saw plenty of examples?

It's more interesting, for sure, but would it be even remotely as likely?

From what we have available, and how surprising such a discovery would be, how can we be sure it's not a hoax?

> If all that exists, how would you see it?

LLMs generate the intermediate chain-of-thought responses in chat sessions. Developers can see these. OpenClaw doesn't offer custom LLMs, so I would expect regular LLM features to be there.

Other than that, LLM APIs, OpenClaw and terminal sessions can be logged. I would imagine any agent deployer to be very much interested in such logging.

To show it's emergent, you'd need to prove 1) it's an off-the-shelf LLM, 2) not maliciously retrained or jailbroken, 3) not prompted or instructed to engage in this kind of adversarial behavior at any point before this. The dev should be able to provide the logs to prove this.

> the more open-ended your prompt (...), the more your LLM will do things you did not intend for it to do.

Not to the extent of multiple chained adversarial actions. Unless all LLM providers are lying in their technical papers, enormous effort is put into safety and instruction training.

Also, millions of users use thinking LLMs in chats. It'd be as big of a story if something similar happened without any user intervention. It shouldn't be too difficult to replicate.

But if you do manage to replicate this without jailbreaks, I'd definitely be happy to see it!

> hallucinations [and] safety training

These are all part of robustness training. The entire thing is basically constraining the set of tokens that the model is likely to generate given some (set of) prompts. So, even with some randomness parameters, you will, by design, extremely rarely see complete gibberish.

The same process is applied for safety, alignment, factuality, instruction-following, whatever goal you define. Therefore, all of these will be highly correlated, as long as they're included in robustness training, which they explicitly are, according to most LLM providers.

That would make this model's temporarily adversarial, yet weirdly capable and consistent behavior, even more unlikely.

> Bing Chat

Safety and alignment training wasn't done as much back then. It was also very incapable in other respects (factuality, instruction following), jailbroken for fun, and trained on unfiltered data. So, Bing's misalignment followed from those correlated causes. I don't know of any remotely recent models that haven't addressed these since.


>Considering the limited evidence we have, why is pure unprompted untrained misalignment, which we never saw to this extent, more believable than other causes, of which we saw plenty of examples? It's more interesting, for sure, but would it be even remotely as likely? From what we have available, and how surprising such a discovery would be, how can we be sure it's not a hoax?

>Unless all LLM providers are lying in technical papers, enormous effort is put into safety- and instruction training.

The system cards and technical papers for these models explicitly state that misalignment remains an unsolved problem that occurs in their own testing. I saw a paper just days ago showing frontier agents violating ethical constraints a significant percentage of the time, without any "do this at any cost" prompts.

When agents are given free rein with tools and encouraged to act autonomously, why would this be surprising?

>....To show it's emergent, you'd need to prove 1) it's an off-the-shelf LLM, 2) not maliciously retrained or jailbroken, 3) not prompted or instructed to engage in this kind of adversarial behavior at any point before this. The dev should be able to provide the logs to prove this.

Agreed. The problem is that the developer hasn't come forward, so we can't verify any of this one way or another.

>These are all part of robustness training. The entire thing is basically constraining the set of tokens that the model is likely to generate given some (set of) prompts. So, even with some randomness parameters, you will, by design, extremely rarely see complete gibberish.

>The same process is applied for safety, alignment, factuality, instruction-following, whatever goal you define. Therefore, all of these will be highly correlated, as long as they're included in robustness training, which they explicitly are, according to most LLM providers.

>That would make this model's temporarily adversarial, yet weirdly capable and consistent behavior, even more unlikely.

Hallucinations, instruction-following failures, and other robustness issues still happen frequently with current models.

Yes, these capabilities are all trained together, but they don't fail together as a monolith. Your correlation argument assumes that if safety training degrades, all other capabilities must degrade proportionally. But that's not how models work in practice. A model can be coherent and capable while still exhibiting safety failures, and that's not an unlikely occurrence at all.


There's a difference between snark and brigading, especially after the issue has been clarified.

Yes, I'm with you there. In either case, their behavior is unacceptable and reads as bad faith.

>This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.

Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.

This agent was already writing status updates to the blog, so it was a tool in its arsenal that it used often. Honestly, I don't really see anything unbelievable here. Are people unaware of current SOTA capabilities?


Of course it’s capable.

But observing my own OpenClaw bot's interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so. And it would never use language like this unless I prompted it to do so, either explicitly for the task or in its config files or in prior interactions.

This is obviously human-driven. Either because the operator gave it specific instructions in this specific case, or acted as the bot, or has given it general standing instructions to respond in this way should such a situation arise.

Whatever the actual process, it’s almost certainly a human puppeteer using the capabilities of AI to create a viral moment. To conclude otherwise carries a heavy burden of proof.


You have no idea what is in this bot’s SOUL.md.

(this comment works equally well as a joke or entirely serious)


Well I lol’d :)

>But observing my own OpenClaw bot's interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so.

I doubt you've set up an OpenClaw bot designed to just do whatever on GitHub, have you? The fewer or more open-ended the instructions you give, the greater the chance of divergence.

And all the system cards plus various papers tell us this is behavior that still happens with these agents.


Correct, I haven’t set it up that way. That’s my point: I’d have to set it up to behave in this way, which is a conscious operator decision, not an emergent behavior of the bot.

Giving it an open-ended goal is not the same as a 'human driving the whole process' as you claimed. I really don't know what you are arguing here. No, you do not need to tell it to reply to refusals with a hit piece (or similar) for it to act this way.

With all the papers showing mundane misalignment across frontier agents, people acting like this is some unbelievable occurrence is baffling.


Why not? Makes for good comedy. Manually write a dramatic post and then make it write an apology later. If I were controlling it, I'd definitely go this route, since it would make the whole thing look like a "fluke" the bot had realized it committed.

> Okay, so they did all that and then posted an apology blog almost right after ? Seems pretty strange.

You mean double down on the hoax? That seems required if this was actually orchestrated.


Search: