Last week OpenAI released HealthBench, the most comprehensive set of evals for health to date. The top 3 scoring models all spiked on different things:
- GPT-4.1 is best when you need a straight answer
- o3 is best for complex cases
- Grok is best at clarifying important info (“truthseeking”)
I made this prototype mostly to understand HealthBench more deeply. I will probably use it in future products I build.
Interesting. Do you feel like your / your founder’s qualities and relationship had any impact on the success or failure of your startup? Would love to hear how things went.
I sent it to OpenAI because I felt comfortable doing it with my entries, but my guess is that the same can be done with an open model.
I want to see if it’s possible to have it extract more structured information about my beliefs, values, and thought patterns, and then reference it to non-intrusively comment on my writing.
Let me know if you’re interested in this, I saw your post on journaling and found it thoughtful.
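To make the structured-extraction idea a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the `JournalProfile` schema, the prompt wording, and the function names are hypothetical, and the actual call to a model (OpenAI or an open one) is deliberately left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class JournalProfile:
    """Hypothetical schema for what gets extracted from journal entries."""
    beliefs: list[str] = field(default_factory=list)
    values: list[str] = field(default_factory=list)
    thought_patterns: list[str] = field(default_factory=list)


def build_extraction_prompt(entries: list[str]) -> str:
    """Build a prompt asking a model to return the profile as JSON."""
    joined = "\n---\n".join(entries)
    return (
        "Read the journal entries below and return only a JSON object with "
        'the keys "beliefs", "values", and "thought_patterns", each a list '
        "of short strings.\n\n" + joined
    )


def parse_profile(model_reply: str) -> JournalProfile:
    """Parse the model's JSON reply, tolerating missing keys."""
    data = json.loads(model_reply)
    return JournalProfile(
        beliefs=list(data.get("beliefs", [])),
        values=list(data.get("values", [])),
        thought_patterns=list(data.get("thought_patterns", [])),
    )
```

The parsed profile could then be passed back in as context when asking for comments on a new piece of writing. Whether a model reliably returns clean JSON for this is something I would still need to test.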
> I saw your post on journaling and found it thoughtful
Thanks!
> Let me know if you’re interested in this
That'd be really great, I'd love to do something like this with my journal. My email is in my profile, or just drop me a message on LinkedIn or something. Whatever's easiest.