> One thing that has been frustrating me is that people rarely share their workflows after making big claims
Good luck ever getting that. I've asked that about a dozen times on here of people making these claims and have never received a response. And I'm genuinely curious as well, so I will continue asking.
People share this stuff all the time. Kenton Varda published a whole walkthrough[1], prompts and all. Stories about people's personal LLM workflows have been on the front page here repeatedly over the last few months.
What people aren't doing is proving to you that their workflows work as well as they say they do. You want proof, you can DM people for their rate card and see what that costs.
Thanks for sharing, and that is interesting to read through. But it's still just a demo, not live production code. From the readme:
> As of March, 2025, this library is very new, prerelease software.
I'm not looking for personal proof that their workflows work as well as they say they do.
I just want an example of a project in production with active users depending on the service for business functions that was written 1.5/2/5/10/whatever x faster than it otherwise would have been without AI.
Anyone can vibe code a side project with 10 users or a demo meant to generate hype/sales interest. But I want someone to actually have put their money where their mouth is and give an example of a project that would have legal, security, or monetary consequences if bad code was put in production. Because those are the types of projects that matter to me when trying to evaluate people's claims (since those are what my paycheck actually depends on).
That code tptacek linked you to? It's part of our (Cloudflare's) MCP framework. Which means all of the companies mentioned in this blog post are using this code in production today: https://blog.cloudflare.com/mcp-demo-day/
There you go. This is what you are looking for. Why are you refusing to believe it?
(OK fine. I guess I should probably update the readme to remove that "prerelease" line.)
See, I just shared Kenton Varda describing his entire workflow, and you came back asking that I please show you a workflow that you would find more credible. Do you want to learn about people's workflows, or do you want to argue with them that their workflows don't work? Nobody is interested in doing the latter with you.
I don't think you understood me at all. I don't care about the actual workflow. I just want an example of a project that:
1. Would have legal, security, or monetary consequences if bad code was put in production
2. Was developed using an AI/LLM/Agent/etc that made the development many times faster than it otherwise would have (as so many people claim)
I would love to hear an example like "I used Claude to develop this hosting/ecommerce/analytics/inventory management service that is used in production by 50 paying companies. Using an LLM we deployed the project in 4 weeks when it would normally take us 4 months." Or "We updated an out-of-date code base for a client in half the time it would normally take and have not seen any issues since launch."
At the end of the day I code to get paid. And it would really help to be able to point to actual cases where both money and negative consequences of failure are on the line.
So if you have any examples please share. But the more people deflect the more skeptical I get about their claims.
Seems like I understand you pretty well! If you wanted to talk about workflows in a curious and open way, your best bet would have been finishing that comment with something other than "the more people deflect the more skeptical I get". Stay skeptical! You do you.
Sorry if I came off as prickly, but it wasn't exactly like your parent comment was much kinder.
I mean it's pretty simple - there are a lot of big claims that I read but very few tangible examples that people share where the project has consequences for failure. Someone else replied with some helpful examples in another thread. If you want to add another one feel free, if not that's cool too.
It almost feels like sealioning. People say nobody shares their workflow, so I share it. They say well that's not production code, so I point to PRs in active projects I'm using, and they say well that doesn't demonstrate your interactive flow. I point out the design documents and prompts and they say yes but what kind of setup do you do, which MCP servers are you running, and I point them at my MCP repo.
At some point you have to accept that no amount of proof will convince someone who refuses to be swayed. It's very frustrating because, while these are wonderful tools already, it's clear that the biggest thing that makes a positive difference is people using and improving them. They're still in relative infancy.
I want to have the kind of conversations we had back at the beginning of web development, when people were delighted at what was possible despite everything being relatively awful.
I don't care about your workflow; that can be figured out from the 10,000 blog posts all describing the same thing. My issue is with people claiming this huge boost in productivity, only to find out that they are working on code bases that have no real consequence if something fails, breaks, or doesn't work as intended.
Since my day job is creating systems that need to be operational and predictable for paying clients, examples of front-end mockups, demos, apps with no users, etc. don't really matter that much at the end of the day. It's like the difference between being a great speaker in a group of 3 friends vs. standing up in front of a 30-person audience with your job on the line.
If you have some examples, I'd love to hear about them because I am genuinely curious.
Sure, I'm working on a database proxy in Rust at the moment; if you hop on GitHub, same username. It's not pure AI in the PRs, but I know approximately no Rust, so AI support has been absolutely critical. I added support for parsing binary timestamps from PG's wire format, as an example.
I spent probably a day building prompts and tests and getting an example of the failing behavior in Python, and then I wrote pseudocode and had it implement and write comprehensive unit tests in Rust. About three passes and manual review of every line. I also have an MCP that calls out to O3 as a second-opinion code review and passes the feedback back in.
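For anyone curious what that timestamp parsing actually involves, here's a minimal sketch of the idea (my own illustration, not the code from the repo): PostgreSQL's binary wire format sends a `timestamp` as an 8-byte big-endian integer counting microseconds since 2000-01-01, so decoding is basically a byte-order conversion plus an epoch shift.

    // Minimal sketch: decode a PostgreSQL binary `timestamp` field.
    // PG sends it as an 8-byte big-endian i64 counting microseconds
    // since 2000-01-01 00:00:00 (the PostgreSQL epoch).

    /// Microseconds between the Unix epoch (1970) and the PG epoch (2000).
    const PG_EPOCH_OFFSET_MICROS: i64 = 946_684_800 * 1_000_000;

    /// Returns microseconds since the Unix epoch, or None for a short
    /// buffer or the special +/- infinity sentinels.
    fn parse_pg_timestamp(buf: &[u8]) -> Option<i64> {
        let bytes: [u8; 8] = buf.get(..8)?.try_into().ok()?;
        let micros = i64::from_be_bytes(bytes);
        if micros == i64::MAX || micros == i64::MIN {
            return None; // 'infinity' / '-infinity'
        }
        micros.checked_add(PG_EPOCH_OFFSET_MICROS)
    }

    fn main() {
        // 1_000_000 microseconds past the PG epoch == 2000-01-01T00:00:01Z.
        let wire = 1_000_000i64.to_be_bytes();
        assert_eq!(parse_pg_timestamp(&wire), Some(946_684_801_000_000));
    }

The subtle parts are exactly the ones a model will happily gloss over if you don't test them: the epoch offset, signedness, and the infinity sentinels, which is why the failing-example-first approach mattered.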
I use agentic flows writing code that deals with millions of pieces of financial data every day.
I rolled out a PR that was a one shot change to our fundamental storage layer on our hot path yesterday. This was part of a large codebase and that file has existed for four years. It hadn’t been touched in 2. I literally didn’t touch a text editor on that change.
I have first hand experience watching devs do this with payment processing code that handles over a billion dollars on a given day.
I reviewed that PR in the GitHub web GUI and in our CI/CD GUI. It was one of several PRs that I was reviewing at the time, some by agents, some by people, and some by a mix.
Because I was the instigator of that change a second code owner was required to approve the PR as well. That PR didn't require any changes, which is uncommon but not particularly rare.
It is _common_ for me to only give feedback to the agents via the GitHub GUI, the same way I do with humans. Occasionally I have to pull the PR down locally and use the full powers of my dev environment to review, but I don't think that is any more common than with people. If anything it's less common: given the tasks the agents get, they typically either do well or I kill the PR without much review.