Three consecutive months of decline starts to look more like a trend. Unless you think there's a transient issue causing the decline, something fundamental has changed.
Again: compare early 2024. And that's not the only thing. The second chart shows a possible flattening, but it's by no means certain yet, especially when taken with the clear March–April jump; and the first chart shows no dwindling in the 1–4 bucket and a clear recovery in 250+. That easily puts the lie to the claim the article makes:
> Data from the Census Bureau and Ramp shows that AI adoption rates are starting to flatten out across all firm sizes, see charts below.
It’s flat-out nonsense, and anyone with any experience in this kind of statistics can see it.
From the chart, the percentage of companies using AI has been going down over the past couple of months.
That's a massive deal, because the AI companies today are valued on the assumption that they'll 10x their revenue over the next couple of years. If their revenue growth starts to slow down, their valuations will change to reflect that.
This bubble phase will play out just as previous ones have in tech: consolidation, with most of the value creation going to a small group of companies. Most will die; some will thrive.
Companies like Anthropic will not survive as independents. They won't come close to having enough revenue and profit to sustain their operating costs (they're Lyft to Google or OpenAI's Uber; Anthropic will never reach the scale needed to roll over to significant profit generation). Its fair value is a tenth or less of what it's being valued at currently (yes, because I say so). Anthropic's valuation will implode to reconcile that, as the market for AI does. Some larger company will scoop them up during the pain phase, once they get desperate enough to sell. Once the implosion of the speculative hype is done, the real value creation will begin. Over the following two or three decades, a radical amount of value will be generated by AI collectively, far beyond anything seen during this hype phase. A lot of lesser AI companies will follow the same path as Anthropic.
To be fair to OpenAI, their privacy policy[0] does provide some detail. They don't mention Mixpanel explicitly, but OpenAI does mention they share your information with third-party web analytics services:
> To assist us in meeting business operations needs and to perform certain services and functions, we may disclose Personal Data to vendors and service providers, including providers of ... web analytics services ...
OpenAI likely provides this disclosure to comply with US state privacy laws, but it's inaccurate to say they didn't disclose that they share your information.
> Gemini exfiltrates the data via the browser subagent: Gemini invokes a browser subagent per the prompt injection, instructing the subagent to open the dangerous URL that contains the user's credentials.
This fulfills the requirements for being able to change external state.
I disagree. No state "owned" by the LLM changed; it only sent a request to the internet like any other.
EDIT: In other words, the LLM didn't change any state it has access to.
To stretch this further: clicking on search results changes the internal state of Google. Would you consider this ability of the LLM to be state-changing? Where would you draw the line?
> Also, ARC AGI reported they've been unable to independently replicate OpenAI's claimed breakthrough score from December
Can you elaborate on this? Where did ARC AGI report that? From ARC AGI[0]:
> ARC Prize Foundation was invited by OpenAI to join their “12 Days Of OpenAI.” Here, we shared the results of their first o3 model, o3-preview, on ARC-AGI. It set a new high-water mark for test-time compute, applying near-max resources to the ARC-AGI benchmark.
> We announced that o3-preview (low compute) scored 76% on ARC-AGI-1 Semi Private Eval set and was eligible for our public leaderboard. When we lifted the compute limits, o3-preview (high compute) scored 88%. This was a clear demonstration of what the model could do with unrestricted test-time resources. Both scores were verified to be state of the art.
That makes it sound like ARC AGI were the ones running the original test with o3.
What they say they haven't been able to reproduce is o3-preview's performance with the production versions of o3. They attribute this to the production versions being given less compute than the versions they ran in the test.
I've been using a math puzzle as a way to benchmark the different models. The math puzzle took me ~3 days to solve with a computer. A math major I know took about a day to solve it by hand.
Gemini 2.5 is the first model I tested that was able to solve it and it one-shotted it. I think it's not an exaggeration to say LLMs are now better than 95+% of the population at mathematical reasoning.
For those curious, the riddle is: There are three people in a circle. Each person has a positive integer floating above their head, such that each person can see the other two numbers but not his own. The sum of two of the numbers is equal to the third. The first person is asked for his number, and he says that he doesn't know. The second person is asked for his number, and he says that he doesn't know. The third person is asked for his number, and he says that he doesn't know. Then, the first person is asked for his number again, and he says: 65. What is the product of the three numbers?
That's a non-sequitur; they would be stupid to run an expensive _L_LM for every search query. This post is not about Google Search being replaced by Gemini 2.5 and/or a chatbot.
Bing doesn't list any reddit posts (that Google-exclusive deal), so I'll assume no stackexchange-related sites have an appropriate answer (or Bing is only looking for hat-related answers for some reason).
I might have phrased that poorly. With _L_ (or L as intended), I meant their state-of-the-art model, which I presume Gemini 2.5 is (haven't gotten around to TFA yet). Not sure if this question is just about model size.
I'm eagerly awaiting an article about RAG caching strategies though!
There are 3 toddlers on the floor. You ask them a hard mathematical question. One of the toddlers plays around with pieces of paper on the ground and happens to raise one that has the right answer written on it.
- This kid is a genius! - you yell
- But wait, the kid just picked an answer up from the ground; it didn't actually come up with it...
- But the other toddlers could have done it too and didn't!
Other models aren't able to solve it, so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65, and Gemini is still able to properly reason through the problem.
I'm sure you're right that it's more than just it being in the training data, but that it's in the training data means that you can't draw any conclusions about general mathematical ability using just this as a benchmark, even if you substitute numbers.
There are lots of possible mechanisms by which this particular problem would become more prominent in the weights in a given round of training even if the model itself hasn't actually gotten any better at general reasoning. Here are a few:
* Random chance (these are still statistical machines after all)
* The problem resurfaced recently and shows up more often than it used to.
* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.
Sure, but you can't cite this puzzle as proof that this model is "better than 95+% of the population at mathematical reasoning" when the method of solving (the "answer") it is online, and the model has surely seen it.
Thanks. I wanted to do exactly that: find the answer online. It is amazing that people (even on HN) think that an LLM can reason. It just regurgitates the input.
I think it can reason, at least if it can work in a loop ("thinking"). It's just that this reasoning is far inferior to human reasoning, despite what some people hastily claim.
I would say maybe about 80%, certainly not 99.99%. But I saw in college that some people could only solve problems that were pretty much the same as ones they'd already seen, while some could easily come up with solutions to complex problems they had not seen before. I'm of the opinion that no human at age 20 can have had the amount of input an LLM has today, and still 20-year-olds come up with very new ideas pretty often (new in the sense that they have not seen that, or anything like it, before). Of course there are more and less creative/intelligent people...
Is there a reason for the downvotes here? We can see that having the answer in the training data doesn't help. If it's in there, what's that supposed to show?
It's entirely unclear what you're trying to get across, at least to me.
Generally speaking, posting output from an LLM without explaining exactly what you think it illustrates, and why, is frowned upon here. I don't think your comment does a great job of the latter.
>> So it’s likely that it’s part of the training data by now.
> I don't think this means what you think it means.
> I did some interacting with the Tencent model that showed up here a couple days ago [...]
> This is a question that obviously was in the training data. How do you get the answer back out of the training data?
What do I think the conversation illustrates? Probably that having the answer in the training data doesn't get it into the output.
How does the conversation illustrate that? It isn't subtle. You can see it without reading any of the Chinese. If you want to read the Chinese, Google Translate is more than good enough for this purpose; that's what I used.
Your intentions are good, but your execution is poor.
I cannot figure out what the comment is trying to get across either. It's easy for you because you already know what you are trying to say. You know what the pasted output shows. The poor execution is in not spending enough time thinking about how someone coming in totally blind would interpret the comment.
I have translated the Chinese. I still have no idea what point you're trying to make. You ask it questions about some kind of band, and it answers. Are you saying the answers are wrong?
I didn't downvote you, but like (probably) most people here, I can't read Chinese; I can't derive whatever point you're trying to make just from the text you provided.
This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university). This is far from a difficult problem. The "95%+" in math reasoning is a meaningless standard; it's like saying a model is better than 99.9% of the world population at the Albanian language, since less than 0.1% bother to learn Albanian.
Even ignoring the fact that this or a similar problem may have appeared in the training data, it's something careful brute-force logic should solve. It's neither difficult, nor interesting, nor useful. Yes, it may suggest a slight improvement in basic logic, but no more so than a million other benchmarks people quote.
This goes to show that evaluating models is not a trivial problem. In fact, it's a hard problem (in particular, far, far harder than this math puzzle).
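To make the brute-force point concrete, here is a minimal Python sketch (mine, not from the thread). It assumes a cap on the integers, which the puzzle itself doesn't impose, and simulates the rounds of "I don't know" as common-knowledge filtering; the SAFE cutoff guards against edge effects of the cap:

    # Enumerate triples of positive integers where one number is the sum
    # of the other two (up to an assumed cap), then simulate the rounds.
    from collections import defaultdict

    MAX = 500   # assumed cap; the real puzzle has no bound
    SAFE = 71   # results with entries below this are unaffected by the cap

    triples = set()
    for x in range(1, MAX + 1):
        for y in range(1, MAX + 1 - x):
            triples |= {(x + y, x, y), (x, x + y, y), (x, y, x + y)}

    def dont_know(possible, i):
        # Person i sees the other two numbers. They would know their own
        # number exactly when all still-possible triples with that view
        # agree on position i; saying "I don't know" removes those triples.
        views = defaultdict(set)
        for t in possible:
            views[t[:i] + t[i+1:]].add(t[i])
        return {t for t in possible if len(views[t[:i] + t[i+1:]]) > 1}

    possible = triples
    for i in (0, 1, 2):          # persons 1, 2, 3 each pass once
        possible = dont_know(possible, i)

    # Person 1 now *does* know, and announces 65.
    views = defaultdict(set)
    for t in possible:
        views[t[1:]].add(t[0])
    for t in sorted(possible):
        if t[0] == 65 and len(views[t[1:]]) == 1 and max(t) <= SAFE:
            print(t, "product:", t[0] * t[1] * t[2])

It should print (65, 26, 39) and (65, 39, 26), i.e. a product of 65,910, matching the answer discussed downthread.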
The "random person" you picked is likely very, very intelligent and not at all a good random sample. I'm not saying this is difficult to the extent that it merits academic focus, but it is NOT a simple problem and I suspect less than 1% of the population could solve this in half an hour "with no special math skills." You have to be either exceedingly clever or trained in a certain type of reasoning or both.
I agree with your general point that this "random person" is probably not representative of anything close to an average person off the street, but I think the phrasing "very very intelligent" and "exceedingly clever" is kinda misleading.
In my experience, the difference between someone who solves this type of logic puzzle and someone who doesn't has more to do with persistence and the ability to maintain focus than with "intelligence" in terms of problem-solving ability per se. I've worked with college students helping them learn to solve these kinds of problems (e.g. as part of pre-interview test prep), and in most cases, those who solve them and those who don't make the same rate of progress towards the solution as long as they're actively working. The difference comes in how quickly they get frustrated (at themselves mostly), decide they're not capable of solving it, and give up.
I mention this because this frustration itself comes from a belief that the ability to solve these belongs to some "exceedingly clever" people only, and not to someone like them. So this kind of thinking ends up being a vicious cycle that keeps them from working on their actual issues.
I solved it in less than 15 minutes while walking my dog, no pen or paper. But I wouldn't claim to be a random person without math skills. And my very first guess was correct.
It was a fun puzzle though and I'm surprised I didn't know it already. Thanks for sharing.
So in the three hours between the puzzle being posted in the parent comment and your reply, you stopped what you were doing, managed to get some other "random" person to stop what they were doing, and had them spend half an hour of their time on a maths puzzle that, at that point, prior experience suggested could take a day? All within three hours?
That's not to say that you didn't, or that you're recalling a previous encounter that happens to be with this exact puzzle (despite there being scant prior references to it, which is precisely the reason for using it). But you can see how some might find that not entirely credible.
Best guess: this random person is someone who really likes puzzles, is presumably good at them, and is very, very far from being representative to the extent your argument would require.
> This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university).
I randomly answered this post and can't solve it in half an hour. Is the point leetcode, but for AI? I'd rather it solve real problems than "elite problems".
Side note: I couldn't even find a pen and paper in half an hour.
This is a great riddle. Unfortunately, I was easily able to find the exact question with a solution (albeit with a different number) online, thus it will have been in the training set.
What makes this interesting is that while the question is online (on reddit, from 10 years ago), other models don't get the answer right. Gemini also shows its work, and it seems to do a few orders of magnitude more calculation than the elegant answer given on reddit.
Granted, this is all way over my head, but the solution Gemini comes to matches the one given on reddit (and now here, in future training runs).
>Gemini also shows its work, and it seems to do a few orders of magnitude more calculation than the elegant answer given on reddit.
I don't think Gemini does an unnecessary amount of computation; it's just more verbose. This is typical of reasoning models: almost every step is necessary, but many would not be written down by a human.
Everyone with limited bandwidth has been trying to limit site access to robots. The latest generation of AI web scrapers is brutal and does not respect robots.txt.
There are websites you can only register for in person, with two existing members vouching for you. That can probably still be gamed, but it sounds like a great barrier to entry for robots (for now).
Admins will see unusual traffic from that account and then take action. Of course it will not be perfect, as there could be a way to mimic human traffic and slowly scrape the data anyway; that's why there is an element of trust (two existing members to vouch).
Yeah, don't get me wrong, I believe raising the burden of extraction is an effective strategy; I just think it's been solved at scale, i.e. voting rings and astroturfing operations on Reddit. And at the nation-state level, I'd just bribe or extort the mods and admins directly (or the IT person, to dump the database).
I have bad news for you if you think non-paywalled / non-phone#-required Discord communities are immune to AI scraping, especially as it costs less than hammering traditional websites: the push-on-change event is done for you in real-time chat contexts.
Especially as the company archives all those chats (not sure for how long) and is small enough that a billion-dollar "data sharing" agreement would be a very enticing offer.
If there isn't a significant barrier to access, it's being scraped. And if that barrier is money, it's being scraped but less often.
I'm not sure what you mean but I'm trying to say our current LLMs are not artificially intelligent and calling them "AI" has confused a lot of the lay public.
Why is this a great riddle? It sounds like incomplete nonsense to me:
It doesn't say anything about the skill levels of the participants, whether their answers are just guesses, or why they aren't just guessing the sum of the other two people's numbers each time they're asked to provide more information.
It doesn't say the guy saying 65 is even correct.
How could three statements of "no new information" give information to the first guy, who didn't know the first time he was asked?
2 and 3 saying they don't know eliminates some of the uncertainty 1 had about his own number (any combination where the other two would see numbers that could tell them their own). After those possibilities are eliminated, the 1st person has narrowed it down enough to actually know, based on the numbers shown above the other 2. The puzzle could instead have been done in the order 2, 3, 1, and 1 would not have needed to go twice.
I guess really the only missing information is that they each have the exact same information you do, plus the numbers above their friends' heads.
You'd have better results if you had prompted it with the actual answer and asked how the first person came to the conclusion. Giving a number in the training set is very easy.
i.e. You observe three people in a magical room. The first person is standing underneath a 65, the second person is standing underneath a 26, and the third person is standing underneath a 39. They can see the others' numbers but not the one they are directly under. You tell them one of the three numbers is the sum of the other two and all numbers are positive integers. You ask the first person for their number; they respond that they don't know. You ask the second person for their number; they respond that they don't know. You ask the third person; they respond that they don't know. You ask the first person again, and they respond with the correct value. How did they know?
In general I find commentary here too negative on AI, but I'm a bit squeamish about maximalist claims re: AI mathematical reasoning vs. human population based off this, even setting aside lottery-ticket-hypothesis-like concerns.
Same here: my problem of choice is the 100 prisoners problem [1]. I used to ask simple reasoning questions in the style of "what is the day three days before the day after tomorrow", but nowadays when I ask such questions, I can almost feel the NN giggling at the naivety of its human operator.
Deepseek R1 got the right answer after a whopping ~10 minutes of thinking. I'm impressed, and I feel kind of dirty; I suspect my electricity use from this could have been put to better use baking a frozen pizza.
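For anyone who wants to check the benchmark answer itself: the known optimal strategy is the cycle-following one, and its ~31% success rate is easy to confirm with a short simulation (my sketch, not part of the comment above):

    # Each prisoner opens the box with their own number and follows the
    # chain of numbers found inside; all 100 succeed exactly when the
    # random permutation has no cycle longer than 50.
    import random

    def all_succeed(n=100, tries=50):
        boxes = list(range(n))
        random.shuffle(boxes)
        for prisoner in range(n):
            drawer = prisoner
            for _ in range(tries):
                if boxes[drawer] == prisoner:
                    break
                drawer = boxes[drawer]
            else:
                return False   # this prisoner ran out of attempts
        return True

    trials = 2_000
    print(sum(all_succeed() for _ in range(trials)) / trials)  # ~0.31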
You can also put the AI in the first person's shoes.
Prompt:
You are standing in a circle; there are 2 other people in the circle with you. Everyone in the circle has a positive integer above their head; no one knows what the number above their own head is, but everyone can see the numbers above the heads of the other people. You see that the person in front of you on the left has 26 above their head. The person on the right has 39 above their head. You are told that the sum of two of the numbers is the third number. You are asked what the number above your head is; the options are the sum, 65, or 13, as 26 + 13 = 39. You don't know which one it is, and you say so. The second person is asked the number above their head. They also say they don't know; the third person also says they don't know. What is your number?
Gemini 2.5 and Claude 3.7 thinking get it right; o3-mini and 4o get it wrong.
I just asked it this twice and it gave me 65×65×130=549250. Both times. The first time I made it about ducks instead of people and mentioned that there was a thunderstorm. The second time I c/p your exact text and it gave me the same answer.
Again we find that the failure state of LLMs is a problem – yeah, when you know the answer already and it gets it right, that's impressive! When it fails, it still acts the same exact way and someone who doesn't already know the answer is now a lil stupider.
I use an algorithmic question that I'd been working on for years and that I'm finally writing up the answer to.
It's basically: given a sequence of heap operations (insert element, delete minimum element), can you predict the leftover elements (those still in the heap at the end) in linear time in the comparison model?
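For concreteness, the trivial baseline is to just run the heap, which costs O(n log n) comparisons; the open question is whether the leftover set can be predicted in O(n). A sketch of that baseline (mine, with a made-up encoding of the operations):

    # Simulate the operation sequence directly with a binary heap.
    import heapq

    def leftovers(ops):
        """ops is a list of ('insert', x) or ('delete-min',) tuples."""
        heap = []
        for op in ops:
            if op[0] == 'insert':
                heapq.heappush(heap, op[1])
            else:
                heapq.heappop(heap)
        return sorted(heap)

    print(leftovers([('insert', 5), ('insert', 2), ('delete-min',),
                     ('insert', 7), ('delete-min',), ('insert', 1)]))
    # -> [1, 7]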
A Prolog program for swipl (it takes less than a second to solve your puzzle).
N is the number of turns of "don't know" answers.
The bad predicate means that the person can know their number at turn N.
% bad(X, Y, Z, N): within N turns of "don't know" answers, the relevant
% person can come to know their number.
bad(_,_,_,-1) :- !, false.
% Base cases: two equal numbers mean the holder of the third can know
% immediately (it must be their sum, since 0 is excluded).
bad(_,A,A,0) :- !.
bad(A,_,A,0) :- !.
bad(A,A,_,0) :- !.
% Recursive cases: replace one number with the difference of the other
% two and check whether that hypothetical is knowable a turn earlier.
bad(B,C,A,N) :- D is abs(B-A), D < C, N1 is N-1, bad(B,D,A,N1), !.
bad(C,A,B,N) :- D is abs(B-A), D < C, N1 is N-1, bad(D,A,B,N1), !.
bad(A,B,C,N) :- D is abs(B-A), D < C, N1 is N-1, bad(A,B,D,N1), !.
% solve(X, Y, Z): enumerate splits Y + Z = X and keep those for which
% bad/4 succeeds for some N in 0..2.
solve(X,Y,Z) :- Y1 is X-1, between(1,Y1,Y),
    between(0,2,N), Z is X-Y, bad(X,Y,Z,N).
?- solve(65,X,Y).
X = 26,
Y = 39 ;
X = 39,
Y = 26 .
Call the three numbers a, b, and c. This means c = a + b, but we still don’t know to which person each number belongs.
When person 1 (p1) is asked what his number is, he has no way to know whether he has a, b, or c, so he says he doesn't know. Same goes for p2 and p3. Clearly p1 somehow gains information by p2 and p3 passing. Either he realizes that he must be a or b, and thus his number is the difference between p2 and p3's numbers, or he realizes that he must be c, and so his number is the sum of p2 and p3's numbers.
That’s all I have so far. Anyone have other ideas?
P1 knows that P2 and P3 are not equal. So they know that the set isn't [2A, A, A].
P2 knows that P1 and P3 are not equal. So they know that the set isn't [A, 2A, A]. They also know that if P1 doesn't know, then they were able to make the same deduction. So they now know that both [2A, A, A] and [A, 2A, A] aren't correct. Since they know that [2A, A, A] isn't correct, they can also know that [2A, 3A, A] isn't correct either. Because they'd be able to see if P1 = 2A and P3 = A, and if that were true and P1 doesn't know their number, it would have to be because P2 isn't A. And if P2 isn't A, they'd have to be 3A.
P3 knows that P1 and P2 aren't equal. Eliminates [A, A, 2A]. Knows that [2A, A, A], [A, 2A, A], and [2A, 3A, A], are eliminated. Using the same process as P2, they can eliminate [2A, A, 3A], [A, 2A, 3A], and also [2A, 3A, 5A]. Because they can see the numbers and they know if P1 is 2A and P2 is 3A.
Now we're back at P1. Who now knows.
So P2 and P3 are in the eliminated sets. Which means we're one of these
[2A, A, A]; [3A, 2A, A]; [4A, 3A, A]; [3A, A, 2A]; [4A, A, 3A]; [5A, 2A, 3A]; [8A, 3A, 5A]
We know his number is 65. To find the set, we can factor 65: 5 * 13. We can check the other numbers: 2(13) = 26 and 3(13) = 39. And technically, you don't need to find the other numbers. The final answer is 5A * 2A * 3A, or 30A^3; with A = 13, that's 30 * 2197 = 65,910.
"Which means we're one of these [2A, A, A]; [3A, 2A, A]; [4A, 3A, A]; [3A, A, 2A]; [4A, A, 3A]; [5A, 2A, 3A]; [8A, 3A, 5A]"
Why? Couldn't it be any of an infinite number of 3-element arrays in terms of A where two elements sum to the third, like [24A, 13A, 11A]? How did we deduce this set of arrays?
EDIT: Solved from another reddit comment. Tuples without a common factor like the one above are considered as a=1.
"They're not eliminated; they correspond to a = 1."
I think that answer was poorly phrased, because those possibilities are eliminated in a sense. There is a better answer further down the thread that explains: "If the solution was not one of the flipped triplets, then the first player would not have worked out the solution." Thus, if it were one of your other infinite triplets (e.g. 65, 12, 53), then in round 2 player 1 would still have answered "I don't know". Since they did respond with a definitive answer, it had to be one of the formula solutions, since those were the only ones they could prove. And since the only formula with a factor of 65 is the one with 5, the correct formula must be [5A, 2A, 3A], and thus [65, 26, 39].
You should be able to generate an infinite number of these problems just by multiplying the first formula factor by a prime number. The same question where the person answers "52" restricts you to either [4a, 3a, a] or [4a, a, 3a]. Since the question asks for the product of all the terms, the answer is 52 * 39 * 13 = 26,364.
Look at it this way: Person 1 sees the numbers 26 and 39, and has to guess his own number. It must be one of only 2 possibilities: 13 or 65. All he has to do is eliminate one of those possibilities.
I think it has something to do with applying the lower bound of 1.
If p1 KNOWS that he's the largest, then he has to have gained some other piece of information. Say the numbers he sees are 32 and 33. His number would have to be either 1 or 65. If p1 were 1, then the other two would have known p1 couldn't be the sum of the other two.
One of the cases has to be true, not all 3 (as you show, they're mutually exclusive for positive integers); i.e. "either" is important in the parent comment.
Which is why I indicated that it would be a misreading of the problem.
The original problem is a little ambiguously worded. You could say "one of their numbers is the sum of the other two" and it would be a little clearer.
> The original problem is a little ambiguously worded.
No it isn't. If it said "the sum of any two of the numbers is equal to the third", that would be a contradiction. What it says is "the sum of two of the numbers is equal to the third".
There's a certain mind that either doesn't realize they're sidestepping the problem and turning it into an editing review, or realizes it and doesn't understand why it seems off-topic/trivial to others.
What's especially strange here is that they repeatedly demonstrate that, under their interpretation, the problem is obviously, trivially unsolvable, in a way that a beginner in algebra could intuit (roughly 12 years old; at least, we started touching algebra in 7th grade).
I really don't get it.
When I've seen this sort of thing play out this way, the talking-down is usually for the benefit of demonstrating something to an observer (i.e. I am smart look at this thing I figured out; I can hold my own when the haters chirp; look they say $INTERLOCUTOR is a thinker but they can't even understand me!), but ~0 of that would apply here, at least traditionally.
Loved that puzzle, thanks for sharing it. I’ve solved a lot of math problems in the past but this one had a unique flavor of interleaving logical reasoning, partial information and a little bit of arithmetic.
We have three people in a circle. Let’s call them:
• Person A (first person)
• Person B (second person)
• Person C (third person)
Each has a positive integer above their head. Each sees the other two numbers but not their own. The numbers satisfy:
A + B = C, or B + C = A, or C + A = B
Given the puzzle scenario, we need to find out the exact numbers given the clues, and ultimately find the product of the three numbers.
⸻
Step 2: Logic of the Puzzle (Step-by-Step Reasoning)
Initial conditions:
The numbers are positive integers. The puzzle implies a situation of ambiguity at first, but after multiple rounds of “I don’t know,” Person A finally knows their number is 65.
Let’s denote clearly the conditions again:
• Exactly one of the following three equations is true:
• A + B = C, or
• B + C = A, or
• C + A = B.
Person A sees B and C, but not A. Similarly for Persons B and C.
⸻
Step 3: Logical deductions from the repeated “I don’t know”
First round deductions:
• Person A initially doesn’t know their number. This means, from Person A’s perspective, both scenarios A = B + C and A + B = C (or A + C = B) are still possible. If Person A had seen identical numbers, for instance, Person A could have instantly known their number. Thus, Person A sees two different numbers.
• Similarly, Person B doesn’t know either, implying that Person B also sees ambiguity in their own number. Thus, Person B also sees two numbers that could result in at least two possible solutions.
• Person C also doesn’t know, same logic as above.
Thus, after one full round, we understand that initially, all three numbers must create ambiguity. No trivial case (like two identical numbers) is present.
Second round deductions:
After hearing that no one knows their number, each person gains information: they now know the scenario is one in which none of the other two people could have immediately solved it.
Crucially, in the second round:
• Person A now knows their number is 65 after considering the previous statements. This implies that Person A sees numbers B and C, and from these numbers plus the statements of uncertainty, Person A deduces that their own number can only be 65.
⸻
Step 4: Deriving the Numbers (Critical deduction)
Let’s denote clearly again: Person A’s number is now known to be 65.
We have three possible equations. Let’s test each carefully given A = 65:
1. Case 1: A + B = C
Then we have:
• 65 + B = C. Person A sees numbers B and C, so they would see B and 65 + B. Person A sees numbers B and 65+B. If Person A sees these two numbers, they might initially consider their number could be either the sum or difference of these two.
• However, if Person A sees two numbers, and one is exactly 2× the other, Person A could immediately guess their number is the difference (since if you see numbers X and 2X, your number is clearly X). So this scenario of B and 65+B is okay only if these two numbers don’t exhibit an immediate trivial scenario.
• Let’s hold this scenario and continue examining others first.
2. Case 2: B + C = A (So, B + C = 65)
• Then Person A sees numbers B and C, adding exactly to 65.
• Initially, Person A sees two numbers summing exactly to 65. Could Person A know their number immediately at first glance? No, since their number could either be the sum (65) or the difference of the two numbers. Ambiguity here initially makes sense.
• Initially, Person B and Person C also see ambiguous scenarios. They both don’t know their numbers either at first round. This scenario is plausible.
3. Case 3: C + A = B (So, C + 65 = B)
• Then Person A sees numbers B = C + 65 and C.
• Person A sees C and C+65. Similar reasoning as Case 1.
⸻
Step 5: Identifying the Correct Scenario
The puzzle’s logic typically aligns best with Case 2: B + C = A = 65.
Here’s why clearly stated:
• Initially, Person A sees two numbers adding to exactly 65. At first glance, Person A’s number could either be 65 (sum) or the difference between the two numbers.
• Person B also initially sees two numbers and cannot distinguish immediately their own number, nor can Person C.
• After everyone states “I don’t know,” Person A gains critical insight: if the difference between the two numbers (that A sees) allowed Person B or Person C to know their number immediately, they’d have known right away. The fact that B and C didn’t know means the difference is not immediately obvious to B or C. Thus, the difference cannot be trivial (such as identical numbers or easy multiples).
Therefore, the two numbers must be such that their difference doesn’t immediately clue in the other persons. This occurs most clearly with pairs of numbers which have multiple factorization possibilities or aren’t trivial multiples.
But we must carefully test quickly to find a suitable, plausible integer pair whose sum is 65 and whose ambiguity would logically survive through multiple rounds. The puzzle is a classic logic puzzle with a known structure:
A common scenario in these classic puzzles is the use of Fibonacci-like or similar “golden” sets. Let’s quickly check the following well-known ambiguous triple:
• Numbers: 65, 39, 26
• Check quickly: indeed, 39 + 26 = 65.
Checking logic quickly:
• Person A sees 39 and 26. Initially ambiguous (number could be 65 or 13).
• Person B sees 65 and 26. Could be either 39 or 91 at first glance. Initially ambiguous.
• Person C sees 65 and 39. Could be either 26 or 104 initially. Initially ambiguous.
After each person says they don’t know:
• Now, Person A considers: If my number was 13, then Person B would see numbers 13 and 26. Person B would then immediately know their number is 39 (13+26=39), no ambiguity. But Person B didn’t know! So Person A’s number cannot be 13. Hence, Person A confidently concludes their number is 65.
Thus, the numbers 65, 39, and 26 satisfy the puzzle conditions perfectly!
⸻
Step 6: Final Answer (Product of the Numbers)
The three numbers identified clearly are:
• A = 65
• B = 39
• C = 26
Then, the product is:
65 × 39 × 26 = 65 × 1014 = 65,910
If you do the math, 1% seems fair. The typical YC company raises a seed round at a valuation of around $20M. 1% of that is $200k, which with standard four-year vesting equates to $50k/yr.
If typical founding-engineer equity were 5%, that would equate to $250k/yr, which would mean most startups would have greater total comp than Google.
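Spelled out (a trivial sketch; the $20M seed valuation and four-year vest are the assumptions stated above):

    # Annualized value of an equity grant under a straight-line vest.
    def equity_per_year(valuation, pct, vest_years=4):
        return valuation * pct / vest_years

    print(equity_per_year(20_000_000, 0.01))  # 1% -> $50,000/yr
    print(equity_per_year(20_000_000, 0.05))  # 5% -> $250,000/yr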
As a founding engineer, I do almost the same amount of work as the founder (e.g. 90%), and get only 5% or less of the reward.
If the founder is the main source of capital, I can understand. But if all the founder does is build the product and raise money, how different is (s)he from you?
> As a founding engineer, I do almost the same amount of work as the founder (e.g. 90%), and get only 5% or less of the reward.
If you believe you're doing 90% of the work of a founder and getting paid 5%, then you should be an actual founder and get paid 20x as much as you would be as a founding engineer.
What % of startups fail before they even get to the stage of being able to hire a founding engineer? You can either make the choice to be a founder and start before this selection filter or be a founding engineer and start after the selection filter.
Of course, the odds are not static, and some people genuinely do have a better RAROC by being a founder, but most people overestimate their founder abilities relative to the odds and feel like they're not fairly compensated at 1%. Which is fine; most people shouldn't be founding engineers either.
But there's a reason it has equilibrated around the 1% mark: early equity compensation is about risk, not effort.
The $50k/yr value of the 1% equity already prices in future dilution. Just because the equity is going to be less than 1% in the future doesn't mean it's not worth $50k/yr today.
Why shouldn't a startup engineer earn more than an engineer at Google? Think about it: they take much, much more risk. Their comp can go to zero. It's not liquid like a Google engineer's. Also, they are creating something net-new that can benefit society in the long run. We want, as a society, to reward people for taking those kinds of risks!
In other words, we need more startup engineers and fewer Google engineers.
I need to do some math, but I wonder if there's a better strategy than Kelly betting. An assumption made for Kelly betting is that the bets are independent of each other. That's not the case in the problem given.
After making a bet, you gain information about the contents of the rest of the deck of cards. I could see it being possible to do better by pricing that information into your bet.
That seems to be exactly what this strategy is doing: at every step, you account for the probability of the red or black card coming up, and bet accordingly (both the sum and the colour).
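Concretely, one strategy with that flavour (a sketch of mine, assuming a standard 26-red/26-black deck and even-odds bets): with r red and b black cards remaining, bet the fraction |r - b| / (r + b) of the bankroll on the majority colour. A neat property is that the final bankroll comes out the same for every shuffle:

    # Adaptive betting on the colour of the next card.
    import random
    from math import comb

    def play(deck):
        bankroll = 1.0
        r, b = deck.count('R'), deck.count('B')
        for card in deck:
            bet = bankroll * abs(r - b) / (r + b)
            majority = 'R' if r >= b else 'B'
            bankroll += bet if card == majority else -bet
            if card == 'R':
                r -= 1
            else:
                b -= 1
        return bankroll

    deck = list('R' * 26 + 'B' * 26)
    random.shuffle(deck)
    print(play(deck))            # identical for every shuffle:
    print(2**52 / comb(52, 26))  # ~9.0813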
It's interesting to note that none of the email tools I'm most familiar with are mentioned by the author. Clearly the author is in a different demographic from me, given they said they want to stay under $200/mo. Some of the tools I hear companies use the most are:
Paying $100/month for less than most mail providers offer for free, or $1,000/month for standard features, is only for startups trying to burn money before they get off the runway.
This is pretty neat! I was toying around with the problem and it appears you can use generating functions to derive the same sequence of operations. If you start with:
G(x) = 1 + x + x^2 + ... = 1/(1-x)
The coefficients of this series are the sequence (0^0, 1^0, 2^0, ...).
If you take the derivative of G(x) and multiply by x you get:
x * G'(x) = x + 2*x^2 + 3*x^3 + ... = x * d/dx 1/(1-x) = x/(1-x)^2
The coefficients of this series are the sequence (0^1, 1^1, 2^1, ...). If you repeat this step, you get a series whose coefficients are (0^2, 1^2, 2^2, ...), and if you do this operation N times, you get a closed form for the series whose coefficients are (0^N, 1^N, 2^N, ...).
The infinite sum converges for -1 < x < 1. If you set x=1/c, you get the infinite sum
0^N/c^0 + 1^N/c^1 + 2^N/c^2 + ...
which is exactly the sum we are trying to solve for. This means you can solve any infinite sum of the given form by taking the derivative of 1/(1-x) N times, multiplying by x each time, and then plugging in x = 1/c at the end.
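A quick way to sanity-check the derivation is to let sympy grind through the operator x * d/dx (a sketch of mine, assuming sympy is available):

    # Apply x * d/dx to 1/(1-x) N times, then substitute x = 1/c to get
    # a closed form for the sum of n^N / c^n over n >= 0.
    import sympy as sp

    x = sp.symbols('x')
    c = sp.symbols('c', positive=True)

    def closed_form(N):
        g = 1 / (1 - x)             # coefficients n^0
        for _ in range(N):
            g = x * sp.diff(g, x)   # each pass raises the power of n by one
        return sp.simplify(g.subs(x, 1/c))

    print(closed_form(2))           # c*(c + 1)/(c - 1)**3, up to rearrangement
    # Numeric spot check at c = 3: both values should be ~1.5
    print(float(closed_form(2).subs(c, 3)), sum(n**2 / 3**n for n in range(60)))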
True, but the generating functions make it easier to prove that this works, rather than relying on the properties of a particular function (properties that you can most easily prove by reverting to the generating functions).