NDAs don't generally have an expiration date. (As opposed to non-competes and non-solicitation agreements, which generally do.) An NDA typically ends only if the information in question becomes public, and then only for that particular information.


Note that very often, the solution process for a more difficult puzzle involves going through nontrivial logic to prove that a particular square must be water. It's important to be able to record non-obvious "must be water" squares.
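(For what it's worth, a minimal sketch of what that implies for a solver or puzzle app's data model, with hypothetical names: cells need three states rather than two, so a deduced "must be water" square can be recorded before its whole region is resolved.)

    from enum import Enum

    class Cell(Enum):
        UNKNOWN = "?"
        ISLAND = "#"
        WATER = "~"

    # A hypothetical 5x5 grid, all cells initially undetermined.
    grid = [[Cell.UNKNOWN] * 5 for _ in range(5)]

    # Record a square proven to be water by a chain of deductions,
    # even though nothing about it is obvious at a glance.
    grid[2][3] = Cell.WATER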


Often the connection between the load balancer and app backend also uses TLS. I've operated a large / complex service on AWS and all internal communications at each level were encrypted.

Of course, in principle, a cloud provider could tap in anywhere you're using their services – ELB (load balancer), S3, etc. I presume they could even provide backdoors into EC2 instances if they were willing to take the reputational risk. But even if you assume the NSA or whoever is able to tap into internal network links within a data center, that alone wouldn't necessarily accomplish much (depending on the target).
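To make the first point concrete, here's a minimal sketch of the backend side of that hop using Python's standard library. The cert/key paths are hypothetical; in practice they'd come from an internal CA or secrets manager, and the listener would be whatever your app framework provides.

    import ssl
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # Hypothetical paths; real deployments pull these from an internal CA
    # or secrets manager rather than the local filesystem.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile="backend.crt", keyfile="backend.key")

    # The load balancer terminates client TLS, then re-encrypts to this
    # backend listener, so the internal hop is never plaintext.
    server = HTTPServer(("0.0.0.0", 8443), SimpleHTTPRequestHandler)
    server.socket = ctx.wrap_socket(server.socket, server_side=True)
    server.serve_forever()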


Why do you say these are problems that are already solved? Sure, they're often variations on existing themes, but the same is true for chess positions and, honestly, almost everything else in any field of human endeavor.

Agreed that the absolute upper tier of chess players have trained longer and harder than most or all IMO contestants. Though I do wonder which (top-tier chess or the IMO) draws on a larger talent pool. To my understanding, a significant fraction of all high school students on Earth take some form of qualifying exam which can channel them into an IMO training program.

And as far as being amenable to brute force goes (relative difficulty for humans vs. computers): it seems that chess was comparatively easier for computers, IMO problems are comparatively easier for humans, and the game of Go is somewhere in between.


These problems are literally already solved? Of course the IMO problem designers make sure the problems have solutions before they use them. That's very different from math research, where it's not known in advance what the answer is, or even that there is a good answer.


I'm saying they weren't solved until the problem composer (created and) solved them. They're not, in general, problems for which solutions have been lying around. So "these are problems that are already solved" isn't introducing anything interesting or useful into the discussion. The post I was replying to was trying to draw a contrast with chess moves, presumably on the grounds that (after the opening) each position in a chess game is novel, but IMO problems are equally novel.

It's true that IMO problems are vetted as being solvable, but that still doesn't really shed any light on how the difficulty of an IMO problem compares to the difficulty of chess play.


Contestants get 4.5 hours for each of the two days of competition. They have to solve three problems in that time, so on average you can spend 1.5 hours per problem (if you're aiming to finish all three).

That said, the gap from "can't do it at all" to "can do it in 60 hours" is probably quite a bit larger than the gap from 60 hours to 1.5 hours.


Timing something that can be run faster by throwing better hardware at it honestly feels conceptually irrelevant, as long as the complexity is actually tractable.


When tackling IMO problems, the hard part is coming up with a good approach to the proof. Verifying your proof (and rejecting your false attempts) is much easier. You'll know which one to submit.

(Source: I am a two-time IMO silver medalist.)


You are a human, not an AI. You know whether your idea seems related to the solution. The AI has thousands of ideas and doesn't know which are better. Graders shouldn't accept a thousand guesses grubbing for 1 point.

If verifying a good idea is easy, then the evidence shows that the AI didn't have good ideas for the other 2 problems.


We are talking about Lean proofs. Given a formal statement and a proof, Lean can verify whether the proof is correct. It's like generating computer programs to solve a problem: the difficulty lies in generating useful solutions/sub-solutions so that the search is effective. They achieve this by using Gemini as a Lean proof generator, i.e. a world-model LLM fine-tuned to generate Lean proofs more effectively.
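For a toy illustration of what "Lean can verify it" means (nothing like an IMO problem, just the mechanism): the checker either accepts a proof or rejects it, with no partial credit.

    -- A trivial formal statement and its proof; Lean checks this mechanically.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b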

Humans are even better at this, as you mention, but effectively the approach is similar: come up with a lot of ideas and see which ones lead to a proof.


I don't see why we should take your word for it, as opposed to just asking AlphaProof to comment instead.


Well, he does have twice the amount of silver medals... And can speak the English language... Although, an AI attempting to speak with the human race entirely through an esoteric math-proofing language would be an interesting take on the whole, "humans make ET contact, interact through universal language.... pause for half of the movie, military hothead wanting to blow it out of the sky, until pretty lady mathematician runs into the oval office waving a sheet of paper... OF MATH!" trope.... But now, it's a race between you and me, to see who can write the screenplay first!


In many ways it doesn't cancel out. On Mars, things weigh less but their mass is the same. So, for example, if you are walking on Mars at a normal Earth walking speed, and you bump your shin on something, the inertia of your leg as it hits the obstacle is just as great as the inertia of your leg on Earth. In general, a lot of the little physical interactions that you rely on bone strength to get through scale according to mass / inertia (and muscle strength), not weight.
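A quick back-of-the-envelope sketch with made-up numbers (the leg mass and walking speed below are illustrative, not measured):

    # Illustrative, hypothetical figures.
    m_leg = 10.0                  # kg: lower-leg mass, identical on both planets
    v_walk = 1.4                  # m/s: an ordinary Earth walking speed
    g_earth, g_mars = 9.81, 3.71  # m/s^2

    # Weight scales with gravity...
    print(m_leg * g_earth)   # ~98 N on Earth
    print(m_leg * g_mars)    # ~37 N on Mars

    # ...but the energy your shin carries into an obstacle scales with
    # mass, so it's the same ~10 J on either planet.
    print(0.5 * m_leg * v_walk ** 2)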


> What Boeing really needs is a complete change in management culture, as that was the real root cause for the MAX disasters, but that is impossible to enforce, you can't even really verify that it has happened.

Serious question: why not? Why couldn't we require Boeing to change its culture, and then verify and enforce that change? Verification might involve periodically interviewing employees across all levels of the organization, performing spot checks and audits to make sure that procedures are being followed (no more failing to enter a work item into the system), and so forth.

(I don't mean to imply that this would be feasible under current regulatory law – I have no idea whether that is the case or not. I'm just saying that you could imagine a world where this could be done.)


Presumably Anthropic / Claude, OpenAI / GPT, Google DeepMind / Gemini.


Thank you. Yes, I meant that.


This seems clear and authoritative. The one thing I can't wrap my head around: why didn't he say this when the story of Sam getting fired first circulated? Why didn't Sam ask him to clarify the record sooner? It would have been easy to tweet this out at any time.

Edit to add: obviously pg isn't going to respond to every rumor, but this one has had significant attention and he was particularly well placed to address it.

