Hacker News

I also do OR-adjacent work, but I've had much less luck using 4o for formulating MIPs. It tends to deliver correct-looking answers with handwavy explanations of the math, but the equations don't work and the reasoning doesn't add up.

It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.

I had a similar experience yesterday using o1 to check whether a simple path exists from s to t through v using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key.)
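For what it's worth, here's a sketch of what I take the v->{s,t} trick to be (my reading of the comment, not the commenter's actual code): split every vertex into in/out copies with unit capacity so paths are vertex-disjoint, give v capacity 2, attach s and t to a super-sink, and ask for a flow of value 2 out of v. Two vertex-disjoint v->s and v->t paths glue together into one simple s-t path through v. The function and variable names are mine.

```python
from collections import defaultdict, deque

def simple_path_through_v_exists(edges, s, t, v):
    """Does an undirected graph have a simple s-t path through v?

    Reduction: node-split for unit vertex capacities (capacity 2 at v),
    add a super-sink fed by s and t, and check for a flow of value 2
    from v. Value 2 <=> vertex-disjoint v->s and v->t paths exist,
    which concatenate into a simple s-t path through v.
    """
    cap = defaultdict(lambda: defaultdict(int))
    node_in = lambda x: ('in', x)
    node_out = lambda x: ('out', x)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        # undirected edge: unit capacity in both directions
        cap[node_out(a)][node_in(b)] += 1
        cap[node_out(b)][node_in(a)] += 1
    for x in nodes:
        # vertex capacity 1 everywhere, except 2 at v
        cap[node_in(x)][node_out(x)] = 2 if x == v else 1
    SINK = ('sink', None)
    cap[node_out(s)][SINK] = 1
    cap[node_out(t)][SINK] = 1
    src = node_out(v)

    def augment_once():
        # one BFS augmenting path (Edmonds-Karp step), unit capacities
        parent = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            if u == SINK:
                while parent[u] is not None:
                    p = parent[u]
                    cap[p][u] -= 1   # push one unit forward
                    cap[u][p] += 1   # residual edge back
                    u = p
                return True
            for w, c in cap[u].items():
                if c > 0 and w not in parent:
                    parent[w] = u
                    q.append(w)
        return False

    flow = 0
    while flow < 2 and augment_once():
        flow += 1
    return flow == 2
```

The node-splitting is what the s->t formulation misses: without unit vertex capacities the two paths out of v can reuse an intermediate vertex, and the concatenation is no longer a simple path.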

It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.



Yip. We need research into how long it takes experts to repair faulty answers, vs. generating correct ones on their own.

Benchmarking 10,000 attempts on an IQ test is irrelevant if on most of those attempts the time taken to repair an answer is longer than the time to complete the test yourself.

I find it's useful for generating exemplars in areas you're roughly familiar with but want some elaboration on, or a refresher. You can stitch it all together to get further, but when it comes time to actually build/etc. something -- you need to start from scratch.

The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.



