I've been using 5.1-codex-max with low reasoning (in Cursor fwiw) recently and it feels like a nice speed while still being effective. Might be worth a shot.
We spend ~$20,000 per month on AWS for the product I work on. On an average day we do not launch a single EC2 instance, and we do no dynamic scaling. However, there are many scenarios (especially during outages and such) where it would be critical for us to be able to launch a new instance (and/or stop/start an existing one).
I run with the dangerous option on my work computer. At first I figured I'd be fine if I just kept regular full-disk backups. But my company at least pays lip service to protecting our intellectual property, and I think it might be irresponsible to give an AI model full, unsupervised internet access.
So now I use a docker compose setup where I install Claude and run it in a container, with only the source code volumes mapped in. Its DNS goes through a second container running dnsmasq with a domain allowlist.
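Roughly, the compose file looks like this (a sketch, not my exact setup; the image tags, subnet, and paths are illustrative):

```yaml
services:
  dns:
    image: dockurr/dnsmasq            # any dnsmasq image works
    volumes:
      - ./dnsmasq.conf:/etc/dnsmasq.conf:ro
    networks:
      sandbox:
        ipv4_address: 172.28.0.53     # fixed IP so the agent container can use it as resolver

  claude:
    image: node:22                    # claude-code ships as an npm package
    command: npx @anthropic-ai/claude-code --dangerously-skip-permissions
    dns:
      - 172.28.0.53                   # force all lookups through the allowlist resolver
    volumes:
      - ./src:/workspace              # only the project source is mapped in
    working_dir: /workspace
    networks:
      - sandbox

networks:
  sandbox:
    ipam:
      config:
        - subnet: 172.28.0.0/24
```

The dnsmasq.conf is the allowlist: `address=/#/` refuses every lookup by default, then `server=/api.anthropic.com/1.1.1.1` lines forward only the allowed domains to a real resolver.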
I initially wanted to do HTTP proxying instead of DNS filtering since it would be more secure, but it was quite hard to set it up satisfactorily.
Running CLI agents with the dangerous full-permissions flag is a lot faster and more comfortable, so I'm quite satisfied.
It's pretty easy if you just use the official MCP Python library. You put a decorator on a function and there's your tool. I was able to do it, and it works great, without knowing anything about MCP. Maybe it's a different story if you actually need to understand the protocol and implement more yourself.
Yes, I am using their Python SDK. But you can't just bolt MCP onto your existing API server if it isn't ready for async Python. Most likely you'd need to deploy it as a separate server that makes server-to-server calls to your API. Making authentication work with your corporate IAM provider is a path of trial and error — not all MCP hosts implement it the same way, so you end up comparing the behaviour of multiple apps to decide whether your setup is broken or you've hit a bug in VS Code or the like. I haven't even started thinking about the server messaging back to the client to communicate with the LLM; AFAIK modern clients don't support that scenario yet, or at least don't support it well.
So yes, adding a tool is trivial; adding an MCP server to your existing application might require non-trivial work of probably unnecessary complexity.
Hi, I appreciate you sharing. I've started applying this advice with a different tool. Just FYI, this sentence kind of came out of nowhere and it wasn't clear what you meant:
> The foundational LLM models right now are what I'd estimate to be at circa 45% accuracy and require frequent steering
Do your rules count as frequent steering and lead to increased 'accuracy', or is that the 'accuracy' you're seeing with your current workflow, rules and all?