Hacker News | shipilovya's comments

Last week OpenAI released HealthBench, the most comprehensive set of evals for health to date. The top 3 scoring models all spiked on different things:

- GPT-4.1 is best when you need a straight answer
- o3 is best for complex cases
- Grok is best at clarifying important info (“truthseeking”)

Made this prototype mostly to understand HealthBench more deeply. I will probably use it in future products I make.


These 41 samples came from 24 unique products


Interesting, do you feel like your / your founder’s qualities and relationship had any impact on the success or failure of your startup? Would love to hear how things went.


Yep! I discovered it unexpectedly and found it works really well, at least so far. In hindsight it's obvious it would have worked in my case.


I sent it to OpenAI because I felt comfortable doing it with my entries, but my guess is that the same can be done with an open model.

I want to see if it’s possible to have it extract more structured information about my beliefs, values, and thought patterns, and then reference it to non-intrusively comment on my writing.

Let me know if you’re interested in this, I saw your post on journaling and found it thoughtful.


> I saw your post on journaling and found it thoughtful

Thanks!

> Let me know if you’re interested in this

That'd be really great, I'd love to do something like this with my journal. My email is in my profile, or just drop me a message on LinkedIn or something. Whatever's easiest.

