I'm very skeptical of using AI in this way. I've given Claude access to calendars and travel plans and asked it to do similar analytical tasks, cross-referencing documents in ways that would take me days to do manually. Since it was about my own plans and life, which I knew well, I could spot subtle errors that looked correct on the surface but weren't the conclusions I would have drawn myself. I've attempted these types of tasks 10-20 times with similar experiences each time. In the end, it's made me very skeptical, like your wife. I don't trust any AI output without a thorough review. Hallucinations are still a frequent problem.
.. and I think there is already evidence that it tends to affect people who have had regular lymphatic inflammations throughout their lives (on a less serious note: like yours truly's.. the neck/throat ones.. and I am already forgetting things and blanking out, and I haven't even touched 40 :/).
This has been my experience with almost everything I've tried to create with generative AI, from apps and websites, to photos and videos, to text and even simple sentences. At first glance, it looks impressive, but as soon as you look closer, you start to notice that everything is actually just sloppy copy.
That being said, sloppy copy can make doing actual work a lot faster if you treat it with the right amount of skepticism and hand-holding.
Its first attempt at the Space Jam site was close enough that it probably could have been manually fixed by an experienced developer in less time than it takes to write the next prompt.
But my experience has also been that with every new model they require less hand-holding and the code is less sloppy. If I'm careful with my prompts, GPT Codex 5.1 has recently been producing a lot of TypeScript for me that is basically production-ready, in a way it couldn't even 2 months ago.
In layman's terms, this seems to mean that given a certain unedited LLM output, plus complete information about the LLM, they can determine what prompt was used to create the output. Except that in practice this almost never works. Am I understanding correctly?
No, it's about the distribution being injective, not a single sampled response. So you need a lot of outputs from the same prompt, and you need to know the LLM; then, in theory, you should be able to reconstruct the original prompt.
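Here's a toy sketch of the idea in Python (the "model", prompts, and vocabulary are all made up for illustration; a real version would operate on an LLM's actual output distributions): collect many samples produced by the unknown prompt, then pick the candidate prompt whose known distribution best explains them. Injectivity is what guarantees the best match is unique.

    import math
    import random
    from collections import Counter

    VOCAB = ["yes", "no", "maybe"]

    # Hypothetical white-box model: each prompt induces a known output
    # distribution. Injectivity = no two prompts share a distribution.
    MODEL = {
        "is the sky blue?":  [0.80, 0.15, 0.05],
        "will it rain?":     [0.30, 0.30, 0.40],
        "is P equal to NP?": [0.05, 0.60, 0.35],
    }

    def sample_outputs(prompt, n, rng):
        # Draw n single-token "responses" from the prompt's distribution.
        return rng.choices(VOCAB, weights=MODEL[prompt], k=n)

    def log_likelihood(prompt, samples):
        # Log-probability of the observed samples under a candidate prompt.
        counts = Counter(samples)
        probs = dict(zip(VOCAB, MODEL[prompt]))
        return sum(counts[t] * math.log(probs[t]) for t in VOCAB if counts[t])

    rng = random.Random(0)
    samples = sample_outputs("will it rain?", 1000, rng)  # many outputs, same prompt

    # Reconstruction: the candidate whose distribution best explains the samples.
    guess = max(MODEL, key=lambda p: log_likelihood(p, samples))
    print(guess)  # -> "will it rain?"

With a single sampled output this would fail, since distinct prompts can easily produce the same text; the injectivity claim only bites at the level of whole distributions.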