Three consecutive months of decline starts to look more like a trend. Unless you think there's a transient issue causing the decline, something fundamental has changed.
Again: compare early 2024. And that's not the only thing. The second chart shows a possible flattening, but it's by no means certain yet, especially when taken with the clear March–April jump; and the first chart shows no dwindling in the 1–4 bucket and a clear recovery in 250+. That easily puts the lie to the claim the article makes:
> Data from the Census Bureau and Ramp shows that AI adoption rates are starting to flatten out across all firm sizes, see charts below.
It’s flat-out nonsense, and anyone with any experience in this kind of statistics can see it.
From the chart, the percentage of companies using AI has been going down over the past couple of months.
That's a massive deal, because the AI companies today are valued on the assumption that they'll 10x their revenue over the next couple of years. If their revenue growth starts to slow down, their valuations will change to reflect that.
This bubble phase will play out just as previous ones have in tech: consolidation, with most of the value creation going to a small group of companies. Most will die; some will thrive.
Companies like Anthropic will not survive as independents. They won't come close to having enough revenue and profit to sustain their operating costs (they're Lyft to Google or OpenAI's Uber; Anthropic will never reach the scale needed to roll over to significant profit generation). Its fair value is a tenth or less of what it's being valued at currently (yes, because I say so). Anthropic's valuation will implode to reconcile that, as the market for AI does. Some larger company will scoop them up during the pain phase, once they get desperate enough to sell. Once the implosion of the speculative hype is done, the real value creation will begin. Over the following two or three decades, a radical amount of value will be generated by AI collectively, far beyond anything seen during this hype phase. A lot of lesser AI companies will follow the same path as Anthropic.
To be fair to OpenAI, their privacy policy[0] does provide some detail. They don't mention Mixpanel explicitly, but OpenAI does mention they share your information with third-party web analytics services:
> To assist us in meeting business operations needs and to perform certain services and functions, we may disclose Personal Data to vendors and service providers, including providers of ... web analytics services ...
OpenAI likely provides this disclosure to comply with US state privacy laws, but it's inaccurate to say they didn't disclose that they share your information.
> Gemini exfiltrates the data via the browser subagent: Gemini invokes a browser subagent per the prompt injection, instructing the subagent to open the dangerous URL that contains the user's credentials.
This fulfills the requirements for being able to change external state.
I disagree. No state "owned" by the LLM changed; it only sent a request to the internet like any other.
EDIT: In other words, the LLM didn't change any state it has access to.
To stretch this further: clicking on search results changes the internal state of Google. Would you consider this ability of the LLM to be state-changing? Where would you draw the line?
> Also, ARC AGI reported they've been unable to independently replicate OpenAI's claimed breakthrough score from December
Can you elaborate on this? Where did ARC AGI report that? From ARC AGI[0]:
> ARC Prize Foundation was invited by OpenAI to join their “12 Days Of OpenAI.” Here, we shared the results of their first o3 model, o3-preview, on ARC-AGI. It set a new high-water mark for test-time compute, applying near-max resources to the ARC-AGI benchmark.
> We announced that o3-preview (low compute) scored 76% on ARC-AGI-1 Semi Private Eval set and was eligible for our public leaderboard. When we lifted the compute limits, o3-preview (high compute) scored 88%. This was a clear demonstration of what the model could do with unrestricted test-time resources. Both scores were verified to be state of the art.
That makes it sound like ARC AGI were the ones running the original test with o3.
What they say they haven't been able to reproduce is o3-preview's performance with the production versions of o3. They attribute this to the production versions being given less compute than the versions they ran in the test.
I've been using a math puzzle as a way to benchmark the different models. The math puzzle took me ~3 days to solve with a computer. A math major I know took about a day to solve it by hand.
Gemini 2.5 is the first model I tested that was able to solve it and it one-shotted it. I think it's not an exaggeration to say LLMs are now better than 95+% of the population at mathematical reasoning.
For those curious, the riddle is: There are three people in a circle. Each person has a positive integer floating above their head, such that each person can see the other two numbers but not his own. The sum of two of the numbers is equal to the third. The first person is asked for his number, and he says that he doesn't know. The second person is asked for his number, and he says that he doesn't know. The third person is asked for his number, and he says that he doesn't know. Then, the first person is asked for his number again, and he says: 65. What is the product of the three numbers?
That's a non-sequitur; they would be stupid to run an expensive _L_LM for every search query. This post is not about Google Search being replaced by Gemini 2.5 and/or a chatbot.
Bing doesn't list any reddit posts (that Google-exclusive deal), so I'll assume no stackexchange-related sites have an appropriate answer (or Bing is only looking for hat-related answers for some reason).
I might have phrased that poorly. With _L_ (or L as intended), I meant their state-of-the-art model, which I presume Gemini 2.5 is (haven't gotten around to TFA yet). Not sure if this question is just about model size.
I'm eagerly awaiting an article about RAG caching strategies though!
There are 3 toddlers on the floor. You ask them a hard mathematical question. One of the toddlers plays around with pieces of paper on the ground and happens to raise one that has the right answer written on it.
- This kid is a genius! - you yell
- But wait, the kid just picked an answer up from the ground; it didn't actually come up with it...
- But the other toddlers could have done it too and didn't!
Other models aren't able to solve it, so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65, and Gemini is still able to properly reason through the problem.
I'm sure you're right that it's more than just it being in the training data, but that it's in the training data means that you can't draw any conclusions about general mathematical ability using just this as a benchmark, even if you substitute numbers.
There are lots of possible mechanisms by which this particular problem would become more prominent in the weights in a given round of training even if the model itself hasn't actually gotten any better at general reasoning. Here are a few:
* Random chance (these are still statistical machines after all)
* The problem resurfaced recently and shows up more often than it used to.
* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.
Sure, but you can't cite this puzzle as proof that this model is "better than 95+% of the population at mathematical reasoning" when the method of solving (the "answer") it is online, and the model has surely seen it.
Thanks. I wanted to do exactly that: find the answer online. It is amazing that people (even on HN) think that an LLM can reason. It just regurgitates the input.
I think it can reason, at least if it can work in a loop ("thinking"). It's just that this reasoning is far inferior to human reasoning, despite what some people hastily claim.
I would say maybe about 80%, certainly not 99.99%. But I saw in college that some people could only solve problems that were pretty much the same as ones they'd already seen, while some could easily come up with solutions to complex problems they had not seen before. I'm of the opinion that no human at age 20 can have had the amount of input an LLM has today, and still 20-year-olds come up with very new ideas pretty often (new in the sense that they have not seen that, or anything like it, before). Of course there are more and less creative/intelligent people...
Is there a reason for the downvotes here? We can see that having the answer in the training data doesn't help. If it's in there, what's that supposed to show?
It's entirely unclear what you're trying to get across, at least to me.
Generally speaking, posting output from an LLM without explaining exactly what you think it illustrates, and why, is frowned upon here. I don't think your comment does a great job of the latter.
>> So it’s likely that it’s part of the training data by now.
> I don't think this means what you think it means.
> I did some interacting with the Tencent model that showed up here a couple days ago [...]
> This is a question that obviously was in the training data. How do you get the answer back out of the training data?
What do I think the conversation illustrates? Probably that having the answer in the training data doesn't get it into the output.
How does the conversation illustrate that? It isn't subtle. You can see it without reading any of the Chinese. If you want to read the Chinese, Google Translate is more than good enough for this purpose; that's what I used.
Your intentions are good, but your execution is poor.
I cannot figure out what the comment is trying to get across either. It's easy for you because you already know what you are trying to say. You know what the pasted output shows. The poor execution is in not spending enough time thinking about how someone coming in totally blind would interpret the comment.
I have translated the Chinese. I still have no idea what point you're trying to make. You ask it questions about some kind of band, and it answers. Are you saying the answers are wrong?
I didn't downvote you, but like (probably) most people here, I can't read Chinese; I can't derive whatever point you're trying to make just from the text you provided.
This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university). This is far from a difficult problem. The "95%+" in math reasoning is a meaningless standard; it's like saying a model is better than 99.9% of the world population at the Albanian language, since less than 0.1% bother to learn Albanian.
Even ignoring the fact that this or a similar problem may have appeared in the training data, it's something careful brute-force logic should solve. It's neither difficult, nor interesting, nor useful. Yes, it may suggest a slight improvement in basic logic, but no more so than a million other benchmarks people quote.
This goes to show that evaluating models is not a trivial problem. In fact, it's a hard problem (in particular, far, far harder than this math puzzle).
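To make the brute-force point concrete, here is a minimal Python sketch (mine, not from the thread). It assumes a cap on the integers, which the puzzle itself doesn't impose, and simulates the rounds of "I don't know" as common-knowledge filtering; the SAFE cutoff guards against edge effects of the cap:

    # Enumerate triples of positive integers where one number is the sum
    # of the other two (up to an assumed cap), then simulate the rounds.
    from collections import defaultdict

    MAX = 500   # assumed cap; the real puzzle has no bound
    SAFE = 71   # results with entries below this are unaffected by the cap

    triples = set()
    for x in range(1, MAX + 1):
        for y in range(1, MAX + 1 - x):
            triples |= {(x + y, x, y), (x, x + y, y), (x, y, x + y)}

    def dont_know(possible, i):
        # Person i sees the other two numbers. They would know their own
        # number exactly when all still-possible triples with that view
        # agree on position i; saying "I don't know" removes those triples.
        views = defaultdict(set)
        for t in possible:
            views[t[:i] + t[i+1:]].add(t[i])
        return {t for t in possible if len(views[t[:i] + t[i+1:]]) > 1}

    possible = triples
    for i in (0, 1, 2):          # persons 1, 2, 3 each pass once
        possible = dont_know(possible, i)

    # Person 1 now *does* know, and announces 65.
    views = defaultdict(set)
    for t in possible:
        views[t[1:]].add(t[0])
    for t in sorted(possible):
        if t[0] == 65 and len(views[t[1:]]) == 1 and max(t) <= SAFE:
            print(t, "product:", t[0] * t[1] * t[2])

It should print (65, 26, 39) and (65, 39, 26), i.e. a product of 65,910, matching the answer discussed downthread.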
The "random person" you picked is likely very, very intelligent and not at all a good random sample. I'm not saying this is difficult to the extent that it merits academic focus, but it is NOT a simple problem and I suspect less than 1% of the population could solve this in half an hour "with no special math skills." You have to be either exceedingly clever or trained in a certain type of reasoning or both.
I agree with your general point that this "random person" is probably not representative of anything close to an average person off the street, but I think the phrasing "very very intelligent" and "exceedingly clever" is kinda misleading.
In my experience, the difference between someone who solves this type of logic puzzle and someone who doesn't has more to do with persistence and the ability to maintain focus than with "intelligence" in terms of problem-solving ability per se. I've worked with college students helping them learn to solve these kinds of problems (e.g. as part of pre-interview test prep), and in most cases, those who solve them and those who don't make the same rate of progress towards the solution as long as they're actively working. The difference comes in how quickly they get frustrated (at themselves mostly), decide they're not capable of solving it, and give up.
I mention this because this frustration itself comes from a belief that the ability to solve these belongs to some "exceedingly clever" people only, and not to someone like them. So this kind of thinking ends up being a vicious cycle that keeps them from working on their actual issues.
I solved it in less than 15 minutes while walking my dog, no pen or paper. But I wouldn't claim to be a random person without math skills. And my very first guess was correct.
It was a fun puzzle though and I'm surprised I didn't know it already. Thanks for sharing.
So in the three hours between the puzzle being posted in the parent comment and your reply, you stopped what you were doing, managed to get some other "random" person to stop what they were doing, and had them spend half an hour of their time on a maths puzzle that, at that point, prior experience suggested could take a day? All within three hours?
That's not to say that you didn't, or that you're recalling a previous encounter that happens to be with this exact puzzle (despite there being scant prior references to it, which is precisely the reason for using it). But you can see how some might find that not entirely credible.
Best guess: this random person is someone who really likes puzzles, is presumably good at them, and is very, very far from being representative to the extent your argument would require.
> This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university).
I randomly answered this post and can't solve it in half an hour. Is the point leetcode, but for AI? I'd rather it solve real problems than "elite problems".
Side note: I couldn't even find a pen and paper in half an hour.
This is a great riddle. Unfortunately, I was easily able to find the exact question with a solution (albeit with a different number) online, thus it will have been in the training set.
What makes this interesting is that while the question is online (on reddit, from 10 years ago), other models don't get the answer right. Gemini also shows its work, and it seems to do a few orders of magnitude more calculation than the elegant answer given on reddit.
Granted, this is all way over my head, but the solution Gemini comes to matches the one given on reddit (and now here, in future training runs).
>Gemini also shows its work, and it seems to do a few orders of magnitude more calculation than the elegant answer given on reddit.
I don't think Gemini does an unnecessary amount of computation; it's just more verbose. This is typical of reasoning models: almost every step is necessary, but many would not be written down by a human.
Everyone with limited bandwidth has been trying to limit site access to robots. The latest generation of AI web scrapers is brutal and does not respect robots.txt.
There are websites you can only register for in person, with two existing members vouching for you. That can probably still be gamed, but it sounds like a great barrier to entry for robots (for now).
Admins will see unusual traffic from that account and then take action. Of course it will not be perfect, as there could be a way to mimic human traffic and slowly scrape the data anyway; that's why there is an element of trust (two existing members to vouch).
Yeah, don't get me wrong, I believe raising the burden of extraction is an effective strategy; I just think it's been solved at scale, i.e. voting rings and astroturfing operations on Reddit. And at the nation-state level, I'd just bribe or extort the mods and admins directly (or the IT person, to dump the database).
I have bad news for you if you think non-paywalled / non-phone#-required Discord communities are immune to AI scraping, especially as it costs less than hammering traditional websites: the push-on-change event is done for you in real-time chat contexts.
Especially as the company archives all those chats (not sure for how long) and is small enough that a billion-dollar "data sharing" agreement would be a very enticing offer.
If there isn't a significant barrier to access, it's being scraped. And if that barrier is money, it's being scraped but less often.
I'm not sure what you mean but I'm trying to say our current LLMs are not artificially intelligent and calling them "AI" has confused a lot of the lay public.
Why is this a great riddle? It sounds like incomplete nonsense to me:
It doesn't say anything about the skill levels of the participants, whether their answers are just guesses, or why they aren't just guessing the sum of the other two people's numbers each time they're asked to provide more information.
It doesn't say the guy saying 65 is even correct.
How could three statements of "no new information" give information to the first guy, who didn't know the first time he was asked?
2 and 3 saying they don't know eliminates some of the uncertainty 1 had about his own number (any combination where the other two would see numbers that could tell them their own). After those possibilities are eliminated, the 1st person has narrowed it down enough to actually know, based on the numbers shown above the other 2. The puzzle could instead have been done in the order 2, 3, 1, and 1 would not have needed to go twice.
I guess really the only missing information is that they each have the exact same information you do, plus the numbers above their friends' heads.
You'd have better results if you had prompted it with the actual answer and asked how the first person came to the conclusion. Giving a number in the training set is very easy.
i.e. You observe three people in a magical room. The first person is standing underneath a 65, the second person is standing underneath a 26, and the third person is standing underneath a 39. They can see the others' numbers but not the one they are directly under. You tell them one of the three numbers is the sum of the other two and all numbers are positive integers. You ask the first person for their number; they respond that they don't know. You ask the second person for their number; they respond that they don't know. You ask the third person; they respond that they don't know. You ask the first person again, and they respond with the correct value. How did they know?
In general I find commentary here too negative on AI, but I'm a bit squeamish about maximalist claims re: AI mathematical reasoning vs. human population based off this, even setting aside lottery-ticket-hypothesis-like concerns.
Same here: my problem of choice is the 100 prisoners problem [1]. I used to ask simple reasoning questions in the style of "what is the day three days before the day after tomorrow", but nowadays when I ask such questions, I can almost feel the NN giggling at the naivety of its human operator.
Deepseek R1 got the right answer after a whopping ~10 minutes of thinking. I'm impressed, and I feel kind of dirty; I suspect my electricity use from this could have been put to better use baking a frozen pizza.
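For anyone who wants to check the benchmark answer itself: the known optimal strategy is the cycle-following one, and its ~31% success rate is easy to confirm with a short simulation (my sketch, not part of the comment above):

    # Each prisoner opens the box with their own number and follows the
    # chain of numbers found inside; all 100 succeed exactly when the
    # random permutation has no cycle longer than 50.
    import random

    def all_succeed(n=100, tries=50):
        boxes = list(range(n))
        random.shuffle(boxes)
        for prisoner in range(n):
            drawer = prisoner
            for _ in range(tries):
                if boxes[drawer] == prisoner:
                    break
                drawer = boxes[drawer]
            else:
                return False   # this prisoner ran out of attempts
        return True

    trials = 2_000
    print(sum(all_succeed() for _ in range(trials)) / trials)  # ~0.31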
You can also put the AI in the first person's shoes.
Prompt:
You are standing in a circle; there are 2 other people in the circle with you. Everyone in the circle has a positive integer above their head; no one knows what the number above their own head is, but everyone can see the numbers above the heads of the other people. You see that the person in front of you on the left has 26 above their head. The person on the right has 39 above their head. You are told that the sum of two of the numbers is the third number. You are asked what the number above your head is; the options are the sum, 65, or 13, as 26 + 13 = 39. You don't know which one it is, and you say so. The second person is asked the number above their head. They also say they don't know; the third person also says they don't know. What is your number?
Gemini 2.5 and Claude 3.7 thinking get it right; o3-mini and 4o get it wrong.
I just asked it this twice and it gave me 65×65×130=549250. Both times. The first time I made it about ducks instead of people and mentioned that there was a thunderstorm. The second time I c/p your exact text and it gave me the same answer.
Again we find that the failure state of LLMs is a problem – yeah, when you know the answer already and it gets it right, that's impressive! When it fails, it still acts the same exact way and someone who doesn't already know the answer is now a lil stupider.
I use an algorithmic question that I'd been working on for years and that I'm finally writing up the answer to.
It's basically: given a sequence of heap operations (insert element, delete minimum element), can you predict the leftover elements (those still in the heap at the end) in linear time in the comparison model?
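For concreteness, the trivial baseline is to just run the heap, which costs O(n log n) comparisons; the open question is whether the leftover set can be predicted in O(n). A sketch of that baseline (mine, with a made-up encoding of the operations):

    # Simulate the operation sequence directly with a binary heap.
    import heapq

    def leftovers(ops):
        """ops is a list of ('insert', x) or ('delete-min',) tuples."""
        heap = []
        for op in ops:
            if op[0] == 'insert':
                heapq.heappush(heap, op[1])
            else:
                heapq.heappop(heap)
        return sorted(heap)

    print(leftovers([('insert', 5), ('insert', 2), ('delete-min',),
                     ('insert', 7), ('delete-min',), ('insert', 1)]))
    # -> [1, 7]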
A Prolog program for swipl (it takes less than a second to solve your puzzle).
N is the number of turns of "don't know" answers.
The bad predicate means that the person can know their number at turn N.
% bad(X, Y, Z, N): within N turns of "don't know" answers, the relevant
% person can come to know their number.
bad(_,_,_,-1) :- !, false.
% Base cases: two equal numbers mean the holder of the third can know
% immediately (it must be their sum, since 0 is excluded).
bad(_,A,A,0) :- !.
bad(A,_,A,0) :- !.
bad(A,A,_,0) :- !.
% Recursive cases: replace one number with the difference of the other
% two and check whether that hypothetical is knowable a turn earlier.
bad(B,C,A,N) :- D is abs(B-A), D < C, N1 is N-1, bad(B,D,A,N1), !.
bad(C,A,B,N) :- D is abs(B-A), D < C, N1 is N-1, bad(D,A,B,N1), !.
bad(A,B,C,N) :- D is abs(B-A), D < C, N1 is N-1, bad(A,B,D,N1), !.
% solve(X, Y, Z): enumerate splits Y + Z = X and keep those for which
% bad/4 succeeds for some N in 0..2.
solve(X,Y,Z) :- Y1 is X-1, between(1,Y1,Y),
    between(0,2,N), Z is X-Y, bad(X,Y,Z,N).
?- solve(65,X,Y).
X = 26,
Y = 39 ;
X = 39,
Y = 26 .
Call the three numbers a, b, and c. This means c = a + b, but we still don’t know to which person each number belongs.
When person 1 (p1) is asked what his number is, he has no way to know whether he has a, b, or c, so he says he doesn't know. Same goes for p2 and p3. Clearly p1 somehow gains information by p2 and p3 passing. Either he realizes that he must be a or b, and thus his number is the difference between p2 and p3's numbers, or he realizes that he must be c, and so his number is the sum of p2 and p3's numbers.
That’s all I have so far. Anyone have other ideas?
P1 knows that P2 and P3 are not equal. So they know that the set isn't [2A, A, A].
P2 knows that P1 and P3 are not equal. So they know that the set isn't [A, 2A, A]. They also know that if P1 doesn't know, then they were able to make the same deduction. So they now know that both [2A, A, A] and [A, 2A, A] aren't correct. Since they know that [2A, A, A] isn't correct, they can also know that [2A, 3A, A] isn't correct either. Because they'd be able to see if P1 = 2A and P3 = A, and if that were true and P1 doesn't know their number, it would have to be because P2 isn't A. And if P2 isn't A, they'd have to be 3A.
P3 knows that P1 and P2 aren't equal. Eliminates [A, A, 2A]. Knows that [2A, A, A], [A, 2A, A], and [2A, 3A, A], are eliminated. Using the same process as P2, they can eliminate [2A, A, 3A], [A, 2A, 3A], and also [2A, 3A, 5A]. Because they can see the numbers and they know if P1 is 2A and P2 is 3A.
Now we're back at P1. Who now knows.
So P2 and P3 are in the eliminated sets. Which means we're one of these
[2A, A, A]; [3A, 2A, A]; [4A, 3A, A]; [3A, A, 2A]; [4A, A, 3A]; [5A, 2A, 3A]; [8A, 3A, 5A]
We know his number is 65. To find the set, we can factor 65: 5 * 13. We can check the other numbers: 2(13) = 26 and 3(13) = 39. And technically, you don't need to find the other numbers. The final answer is 5A * 2A * 3A, or 30A^3; with A = 13, that's 30 * 2197 = 65,910.
"Which means we're one of these [2A, A, A]; [3A, 2A, A]; [4A, 3A, A]; [3A, A, 2A]; [4A, A, 3A]; [5A, 2A, 3A]; [8A, 3A, 5A]"
Why? Couldn't it be any of an infinite number of 3-element arrays in terms of A where two elements sum to the third, like [24A, 13A, 11A]? How did we deduce this set of arrays?
EDIT: Solved from another reddit comment. Tuples without a common factor like the one above are considered as a=1.
"They're not eliminated; they correspond to a = 1."
I think that answer was poorly phrased, because those possibilities are eliminated in a sense. There is a better answer further down the thread that explains: "If the solution was not one of the flipped triplets, then the first player would not have worked out the solution." Thus, if it were one of your other infinite triplets (e.g. 65, 12, 53), then in round 2 player 1 would still have answered "I don't know". Since they did respond with a definitive answer, it had to be one of the formula solutions, since those were the only ones they could prove. And since the only formula with a factor of 65 is the one with 5, the correct formula must be [5A, 2A, 3A], and thus [65, 26, 39].
You should be able to generate an infinite number of these problems just by multiplying the first formula factor by a prime number. The same question where the person answers "52" restricts you to either [4a, 3a, a] or [4a, a, 3a]. Since the question asks for the product of all the terms, the answer is 52 * 39 * 13 = 26,364.
Look at it this way: Person 1 sees the numbers 26 and 39, and has to guess his own number. It must be one of only 2 possibilities: 13 or 65. All he has to do is eliminate one of those possibilities.
I think it has something to do with applying the lower bound of 1.
If p1 KNOWS that he's the largest, then he has to have gained some other piece of information. Say the numbers he sees are 32 and 33. His number would have to be either 1 or 65. If p1 were 1, then the other two would have known p1 couldn't be the sum of the other two.
One of the cases has to be true, not all 3 (as you show, they're mutually exclusive for positive integers); i.e. "either" is important in the parent comment.
Which is why I indicated that it would be a misreading of the problem.
The original problem is a little ambiguously worded. You could say "one of their numbers is the sum of the other two" and it would be a little clearer.
> The original problem is a little ambiguously worded.
No it isn't. If it said "the sum of any two of the numbers is equal to the third", that would be a contradiction. What it says is "the sum of two of the numbers is equal to the third".
There's a certain mind that either doesn't realize they're sidestepping the problem and turning it into an editing review, or realizes it and doesn't understand why it seems off-topic/trivial to others.
What's especially strange here is that they repeatedly demonstrate that, under their interpretation, the problem is obviously, trivially unsolvable, in a way that a beginner in algebra could intuit (roughly 12 years old; at least, we started touching algebra in 7th grade).
I really don't get it.
When I've seen this sort of thing play out this way, the talking-down is usually for the benefit of demonstrating something to an observer (i.e. I am smart look at this thing I figured out; I can hold my own when the haters chirp; look they say $INTERLOCUTOR is a thinker but they can't even understand me!), but ~0 of that would apply here, at least traditionally.
Loved that puzzle, thanks for sharing it. I’ve solved a lot of math problems in the past but this one had a unique flavor of interleaving logical reasoning, partial information and a little bit of arithmetic.
We have three people in a circle. Let’s call them:
• Person A (first person)
• Person B (second person)
• Person C (third person)
Each has a positive integer above their head. Each sees the other two numbers but not their own. The numbers satisfy:
A + B = C, or B + C = A, or C + A = B
Given the puzzle scenario, we need to find out the exact numbers given the clues, and ultimately find the product of the three numbers.
⸻
Step 2: Logic of the Puzzle (Step-by-Step Reasoning)
Initial conditions:
The numbers are positive integers. The puzzle implies a situation of ambiguity at first, but after multiple rounds of “I don’t know,” Person A finally knows their number is 65.
Let’s denote clearly the conditions again:
• Exactly one of the following three equations is true:
• A + B = C, or
• B + C = A, or
• C + A = B.
Person A sees B and C, but not A. Similarly for Persons B and C.
⸻
Step 3: Logical deductions from the repeated “I don’t know”
First round deductions:
• Person A initially doesn’t know their number. This means, from Person A’s perspective, both scenarios A = B + C and A + B = C (or A + C = B) are still possible. If Person A had seen identical numbers, for instance, Person A could have instantly known their number. Thus, Person A sees two different numbers.
• Similarly, Person B doesn’t know either, implying that Person B also sees ambiguity in their own number. Thus, Person B also sees two numbers that could result in at least two possible solutions.
• Person C also doesn’t know, same logic as above.
Thus, after one full round, we understand that initially, all three numbers must create ambiguity. No trivial case (like two identical numbers) is present.
Second round deductions:
After hearing that no one knows their number, each person gains information: they now know the scenario is one in which none of the other two people could have immediately solved it.
Crucially, in the second round:
• Person A now knows their number is 65 after considering the previous statements. This implies that Person A sees numbers B and C, and from these numbers plus the statements of uncertainty, Person A deduces that their own number can only be 65.
⸻
Step 4: Deriving the Numbers (Critical deduction)
Let’s denote clearly again: Person A’s number is now known to be 65.
We have three possible equations. Let’s test each carefully given A = 65:
1. Case 1: A + B = C
Then we have:
• 65 + B = C. Person A sees numbers B and C, so they would see B and 65 + B. Person A sees numbers B and 65+B. If Person A sees these two numbers, they might initially consider their number could be either the sum or difference of these two.
• However, if Person A sees two numbers, and one is exactly 2× the other, Person A could immediately guess their number is the difference (since if you see numbers X and 2X, your number is clearly X). So this scenario of B and 65+B is okay only if these two numbers don’t exhibit an immediate trivial scenario.
• Let’s hold this scenario and continue examining others first.
2. Case 2: B + C = A (So, B + C = 65)
• Then Person A sees numbers B and C, adding exactly to 65.
• Initially, Person A sees two numbers summing exactly to 65. Could Person A know their number immediately at first glance? No, since their number could either be the sum (65) or the difference of the two numbers. Ambiguity here initially makes sense.
• Initially, Person B and Person C also see ambiguous scenarios. They both don’t know their numbers either at first round. This scenario is plausible.
3. Case 3: C + A = B (So, C + 65 = B)
• Then Person A sees numbers B = C + 65 and C.
• Person A sees C and C+65. Similar reasoning as Case 1.
⸻
Step 5: Identifying the Correct Scenario
The puzzle’s logic typically aligns best with Case 2: B + C = A = 65.
Here’s why clearly stated:
• Initially, Person A sees two numbers adding to exactly 65. At first glance, Person A’s number could either be 65 (sum) or the difference between the two numbers.
• Person B also initially sees two numbers and cannot distinguish immediately their own number, nor can Person C.
• After everyone states “I don’t know,” Person A gains critical insight: if the difference between the two numbers (that A sees) allowed Person B or Person C to know their number immediately, they’d have known right away. The fact that B and C didn’t know means the difference is not immediately obvious to B or C. Thus, the difference cannot be trivial (such as identical numbers or easy multiples).
Therefore, the two numbers must be such that their difference doesn’t immediately clue in the other persons. This occurs most clearly with pairs of numbers which have multiple factorization possibilities or aren’t trivial multiples.
But we must carefully test quickly to find a suitable, plausible integer pair whose sum is 65 and whose ambiguity would logically survive through multiple rounds. The puzzle is a classic logic puzzle with a known structure:
A common scenario in these classic puzzles is the use of Fibonacci-like or similar “golden” sets. Let’s quickly check the following well-known ambiguous triple:
• Numbers: 65, 39, 26
• Check quickly: indeed, 39 + 26 = 65.
Checking logic quickly:
• Person A sees 39 and 26. Initially ambiguous (number could be 65 or 13).
• Person B sees 65 and 26. Could be either 39 or 91 at first glance. Initially ambiguous.
• Person C sees 65 and 39. Could be either 26 or 104 initially. Initially ambiguous.
After each person says they don’t know:
• Now, Person A considers: If my number was 13, then Person B would see numbers 13 and 26. Person B would then immediately know their number is 39 (13+26=39), no ambiguity. But Person B didn’t know! So Person A’s number cannot be 13. Hence, Person A confidently concludes their number is 65.
Thus, the numbers 65, 39, and 26 satisfy the puzzle conditions perfectly!
⸻
Step 6: Final Answer (Product of the Numbers)
The three numbers identified clearly are:
• A = 65
• B = 39
• C = 26
Then, the product is:
65 × 39 × 26 = 65 × 1014 = 65,910
If you do the math, 1% seems fair. The typical YC company raises a seed round at a valuation of around $20M. 1% of that is $200k, which with standard four-year vesting equates to $50k/yr.
If typical founding-engineer equity were 5%, that would equate to $250k/yr, which would mean most startups would have greater total comp than Google.
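Spelled out (a trivial sketch; the $20M seed valuation and four-year vest are the assumptions stated above):

    # Annualized value of an equity grant under a straight-line vest.
    def equity_per_year(valuation, pct, vest_years=4):
        return valuation * pct / vest_years

    print(equity_per_year(20_000_000, 0.01))  # 1% -> $50,000/yr
    print(equity_per_year(20_000_000, 0.05))  # 5% -> $250,000/yr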
As a founding engineer, I do almost the same amount of work as the founder (e.g. 90%), and get only 5% or less of the reward.
If the founder is the main source of capital, I can understand. But if all the founder does is build the product and raise money, how different is (s)he from you?
> As a founding engineer, I do almost the same amount of work as the founder (e.g. 90%), and get only 5% or less of the reward.
If you believe you're doing 90% of the work of a founder and getting paid 5%, then you should be an actual founder and get paid 20x as much as you would be as a founding engineer.
What % of startups fail before they even get to the stage of being able to hire a founding engineer? You can either make the choice to be a founder and start before this selection filter or be a founding engineer and start after the selection filter.
Of course, the odds are not static, and some people genuinely do have a better RAROC by being a founder, but most people overestimate their founder abilities relative to the odds and feel like they're not fairly compensated at 1%. Which is fine; most people shouldn't be founding engineers either.
But there's a reason it has equilibrated around the 1% mark: early equity compensation is about risk, not effort.
The $50k/yr value of the 1% equity already prices in future dilution. Just because the equity is going to be less than 1% in the future doesn't mean it's not worth $50k/yr today.
Why shouldn't a startup engineer earn more than an engineer at Google? Think about it: they take much, much more risk. Their comp can go to zero. It's not liquid like a Google engineer's. Also, they are creating something net-new that can benefit society in the long run. We want, as a society, to reward people for taking those kinds of risks!
In other words, we need more startup engineers and fewer Google engineers.
I need to do some math, but I wonder if there's a better strategy than Kelly betting. An assumption made for Kelly betting is that the bets are independent of each other. That's not the case in the problem given.
After making a bet, you gain information about the contents of the rest of the deck of cards. I could see it being possible to do better by pricing that information into your bet.
That seems to be exactly what this strategy is doing: at every step, you account for the probability of the red or black card coming up, and bet accordingly (both the sum and the colour).
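Concretely, one strategy with that flavour (a sketch of mine, assuming a standard 26-red/26-black deck and even-odds bets): with r red and b black cards remaining, bet the fraction |r - b| / (r + b) of the bankroll on the majority colour. A neat property is that the final bankroll comes out the same for every shuffle:

    # Adaptive betting on the colour of the next card.
    import random
    from math import comb

    def play(deck):
        bankroll = 1.0
        r, b = deck.count('R'), deck.count('B')
        for card in deck:
            bet = bankroll * abs(r - b) / (r + b)
            majority = 'R' if r >= b else 'B'
            bankroll += bet if card == majority else -bet
            if card == 'R':
                r -= 1
            else:
                b -= 1
        return bankroll

    deck = list('R' * 26 + 'B' * 26)
    random.shuffle(deck)
    print(play(deck))            # identical for every shuffle:
    print(2**52 / comb(52, 26))  # ~9.0813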
It's interesting to note that none of the email tools I'm most familiar with are mentioned by the author. Clearly the author is in a different demographic from me, given they said they want to stay under $200/mo. Some of the tools I hear companies use the most are:
Paying $100/month for less than most mail providers offer for free, or $1,000/month for standard features, is only for startups trying to burn money before they get off the runway.
This is pretty neat! I was toying around with the problem and it appears you can use generating functions to derive the same sequence of operations. If you start with:
G(x) = 1 + x + x^2 + ... = 1/(1-x)
The coefficients of this series are the sequence (0^0, 1^0, 2^0, ...).
If you take the derivative of G(x) and multiply by x you get:
x * G'(x) = x + 2*x^2 + 3*x^3 + ... = x * d/dx 1/(1-x) = x/(1-x)^2
The coefficients of this series are the sequence (0^1, 1^1, 2^1, ...). If you repeat this step, you get a series whose coefficients are (0^2, 1^2, 2^2, ...), and if you do this operation N times, you get a closed form for the series whose coefficients are (0^N, 1^N, 2^N, ...).
The infinite sum converges for -1 < x < 1. If you set x=1/c, you get the infinite sum
0^N/c^0 + 1^N/c^1 + 2^N/c^2 + ...
which is exactly the sum we are trying to solve for. This means you can solve any infinite sum of the given form by taking the derivative of 1/(1-x) N times, multiplying by x each time, and then plugging in x = 1/c at the end.
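A quick way to sanity-check the derivation is to let sympy grind through the operator x * d/dx (a sketch of mine, assuming sympy is available):

    # Apply x * d/dx to 1/(1-x) N times, then substitute x = 1/c to get
    # a closed form for the sum of n^N / c^n over n >= 0.
    import sympy as sp

    x = sp.symbols('x')
    c = sp.symbols('c', positive=True)

    def closed_form(N):
        g = 1 / (1 - x)             # coefficients n^0
        for _ in range(N):
            g = x * sp.diff(g, x)   # each pass raises the power of n by one
        return sp.simplify(g.subs(x, 1/c))

    print(closed_form(2))           # c*(c + 1)/(c - 1)**3, up to rearrangement
    # Numeric spot check at c = 3: both values should be ~1.5
    print(float(closed_form(2).subs(c, 3)), sum(n**2 / 3**n for n in range(60)))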
True, but the generating functions make it easier to prove that this works, rather than relying on the properties of a particular function (properties that you can most easily prove by reverting to the generating functions).