The blog says more about keeping the user data private. The remote models in that context are operating blind. I am not sure why you are nitpicking; almost nobody reading the blog would read remote code execution into that context.
The MCP aspect (for code/tool execution) is completely orthogonal to the issue of data privacy.
If you put a remote LLM in the chain then it is 100% going to inadvertently send user data up to them at some point.
e.g. if I attach a PDF to my context that contains private data, it WILL be sent to the LLM. I have no idea what "operating blind" means in this context. Connecting to a remote LLM means your outgoing requests are tied to a specific authenticated API key.
I am working on a chess analytics tool, specifically a free and open-source replacement of Chessbase in this age of LLMs that can run on all platforms. The idea is to lower the barrier of entry to using a chess improvement tool, since Chessbase can be intimidating for a casual Chess.com beginner looking to get into serious chess prep. At present, it can do basic queries like the H2H score of Magnus Carlsen vs Hikaru Nakamura, the top 10 juniors in the US, Magnus Carlsen's games with the London System opening involving a queen sacrifice, etc. Getting it to work for advanced multi-step tactical patterns, and for finding games with certain imbalances from a natural-language query, has been challenging though. DuckDB has helped a lot, along with modern LLMs for query generation given the schema, plus some preprocessing of game PGNs and piece hashes. It can also import a user's Chess.com and Lichess games given the usernames and run the same kinds of queries as on Master-level games.
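To make the pattern concrete, here is a minimal sketch of the "LLM generates SQL from a schema" flow. It uses sqlite3 in place of DuckDB so it runs with only the standard library, and the schema, sample rows, and generated SQL are all made up for illustration — this is not the tool's actual schema:

```python
import sqlite3

# Hypothetical, heavily simplified schema for a games database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        white TEXT, black TEXT,
        result TEXT,          -- '1-0', '0-1', '1/2-1/2'
        eco TEXT, opening TEXT, date TEXT
    )
""")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Carlsen, Magnus", "Nakamura, Hikaru", "1-0", "D02", "London System", "2023-01-10"),
        ("Nakamura, Hikaru", "Carlsen, Magnus", "1/2-1/2", "B27", "Hyperaccelerated Dragon", "2023-05-02"),
        ("Carlsen, Magnus", "Nakamura, Hikaru", "0-1", "A45", "London System", "2024-02-18"),
    ],
)

# The kind of SQL a remote LLM might generate from the natural-language
# question "What is the H2H score of Carlsen vs Nakamura?" given only
# the schema above (never the data itself).
h2h_sql = """
    SELECT result, COUNT(*) FROM games
    WHERE (white LIKE 'Carlsen%' AND black LIKE 'Nakamura%')
       OR (white LIKE 'Nakamura%' AND black LIKE 'Carlsen%')
    GROUP BY result
"""
h2h = dict(conn.execute(h2h_sql).fetchall())
print(h2h)  # one decisive result each way plus a draw in this toy data
```

The key point is that the LLM only ever sees the `CREATE TABLE` text and the question; the SQL it returns is executed locally against the real database.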
I also used the tool to generate an Adult Chess Improvers FIDE rank list for all federations around the world. Here are the July 2025 rankings, though it still needs major improvements in filtering - https://chess-ranking.pages.dev
------------------
Another idea that I have been working on for some time is connecting my Gmail, which is the source of truth for all my financial, travel, and personal stuff, to an LLM that can do isolated code execution to generate beautiful infographics, charts, etc. on my travels and spending patterns. The idea is to do local processing on my emails while generating the actual queries blindly using a powerful remote LLM, by providing only a schema and an email 'fingerprint' file that gives the LLM a sense of what country, region, and interests we might be talking about without actually transmitting personal data. The trade-off between the privacy of the 'fingerprint' and the quality of the queries generated is something I have been very unsure about.
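A minimal sketch of what such a 'fingerprint' could look like: local code reduces the mailbox to coarse aggregate signals (sender domains, rough categories, a date range), and only that summary plus a schema would ever be sent to the remote LLM. The sample emails, category keywords, and field names are all invented for illustration:

```python
import json
from collections import Counter

# Toy stand-in for locally parsed emails; in reality these would come
# from the Gmail API and never leave the machine.
emails = [
    {"from": "no-reply@airline.example", "subject": "Your booking to Lisbon", "date": "2024-03-02"},
    {"from": "receipts@bank.example", "subject": "Card statement March", "date": "2024-04-01"},
    {"from": "receipts@bank.example", "subject": "Card statement April", "date": "2024-05-01"},
]

CATEGORY_HINTS = {  # crude keyword buckets, purely illustrative
    "travel": ["booking", "flight", "itinerary"],
    "finance": ["statement", "invoice", "receipt"],
}

def categorize(subject):
    s = subject.lower()
    for cat, words in CATEGORY_HINTS.items():
        if any(w in s for w in words):
            return cat
    return "other"

def fingerprint(emails):
    """Aggregate signals only: no bodies, no subjects, no addresses."""
    domains = Counter(e["from"].split("@")[1] for e in emails)
    cats = Counter(categorize(e["subject"]) for e in emails)
    dates = sorted(e["date"] for e in emails)
    return {
        "sender_domains": dict(domains),
        "categories": dict(cats),
        "date_range": [dates[0], dates[-1]],
        "email_count": len(emails),
    }

fp = fingerprint(emails)
print(json.dumps(fp, indent=2))
```

Even this leaks something (sender domains can be identifying), which is exactly the privacy-vs-query-quality dial the comment describes: the coarser the buckets, the blinder the remote LLM.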
We looked at Pyodide and WASM, along with other options like Firecracker, for our need of multi-step tasks that require running LLM-generated code locally (via Ollama etc.) with some form of isolation rather than running it directly on our dev machines, and figured it would be too much work with the various external libraries we have to install. The idea was to get code generated by a powerful remote LLM for general-purpose stuff like video editing via ffmpeg, beautiful graph generation via JS + Chromium and the like, and execute it locally with all dependencies installed before execution.
We built CodeRunner (https://github.com/BandarLabs/coderunner) on top of Apple Containers recently and have been using it for some time. This works fine but still needs some improvement to work across very arbitrary prompts.
For the Gemini-cli integration, is the only difference between CodeRunner with Gemini-cli and Gemini-cli itself that you are just using Gemini-cli in a container?
No, Gemini-cli still runs on your local machine. When it generates some code based on your prompt, CodeRunner runs that code inside a container (which sits inside a new lightweight VM, courtesy of Apple, providing VM-level isolation), installs the requested libraries, executes the generated code there, and returns the result to Gemini-cli.
This is also not Gemini-cli specific and you could use the sandbox with any of the popular LLMs or even with your local ones.
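The general pattern is roughly the following. This is a generic sketch, not CodeRunner's actual API; the `container`-style command line is a stand-in modeled on docker-like CLIs, and nothing here is executed — the command is only composed:

```python
import shlex

def build_sandbox_cmd(image, deps, code, runtime="container"):
    """Compose a one-shot sandboxed run: install dependencies, then
    execute the LLM-generated code inside an ephemeral container.
    Illustrative only; flags follow the familiar docker-style CLI."""
    install = f"pip install {' '.join(deps)} && " if deps else ""
    inner = f"{install}python -c {shlex.quote(code)}"
    # --rm: throw the container away afterwards; no host mounts, so the
    # generated code cannot touch the host filesystem.
    return [runtime, "run", "--rm", image, "sh", "-c", inner]

cmd = build_sandbox_cmd(
    image="python:3.12-slim",
    deps=["matplotlib"],
    code="print('chart rendered')",
)
print(cmd)
# A host-side agent would then invoke this via subprocess.run(cmd,
# capture_output=True) and hand stdout back to whichever LLM client
# (Gemini-cli, a local Ollama model, etc.) requested the run.
```

The isolation boundary is the container/VM, so the same sandbox serves any code-generating model without the client needing to know how execution happens.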
Interestingly, this test has been in the public domain for the last seven years, since it is part of all possible chess games with 7 or fewer pieces, which is solved and published. It is a huge file, but the five-piece dataset with the FENs is less than a GB. I wonder if it even got included in the training data earlier, or if it will be.
I don't think such datasets are going into AI training. But if this exact question keeps showing up in analytics data and forum posts, it might end up in training sets.
It might be a reasonable ask for an LLM to 'remember' the endgame tablebase of solved games - which is less than a GB for all games with five or fewer pieces on the board. This puzzle specifically relies on that knowledge plus the knowledge of how the chess pieces move.
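Conceptually, a tablebase is just a map from position (FEN) to a game-theoretic result, which is exactly the kind of thing 'remembering' would mean. A toy sketch with two hand-checked positions (real probing would read Syzygy files with a library like python-chess; this two-entry dict is a made-up stand-in):

```python
# Toy "tablebase": FEN -> result, from the side to move's perspective.
# Real five-piece tablebases store this kind of mapping (plus distance
# to mate) for every legal position; these entries are hand-picked.
TOY_TABLEBASE = {
    # King and queen vs bare king: trivially winning for White.
    "8/8/8/8/8/4k3/8/4KQ2 w - - 0 1": "win",
    # Two bare kings: dead draw by insufficient material.
    "8/8/8/4k3/8/8/8/4K3 w - - 0 1": "draw",
}

def probe(fen):
    """Look up a position; None means 'not in this toy table'."""
    return TOY_TABLEBASE.get(fen)

print(probe("8/8/8/4k3/8/8/8/4K3 w - - 0 1"))  # draw
```

An LLM that had genuinely memorized the sub-GB five-piece table would in effect be answering `probe()` calls from weights, which is what the puzzle implicitly tests.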
I too started learning chess at the beginning of the pandemic, when I turned 29. While I knew how the pieces moved as a kid, I was unaware of rules such as promotion, pawns moving two squares at the start, en passant, etc. When I started I was 900 blitz on Chess.com. I moved to ~1750 on Chess.com blitz and ~2000 on Lichess blitz. I assume I would be rated higher in rapid if I played it as much, probably due to less competition in it online. I learned a single opening with White (and probably the most hated - the London System), and one with Black (the Sicilian hyperaccelerated dragon). I guess most of my improvement came from observing tactics in games and in puzzles. Watching a lot of agadmator-style videos also helped in figuring out what makes one move better than other candidate moves (I guess this is what positional chess is about). I have cut down on playing these days since it is quite addictive and takes up a lot of my free time. Also, it is quite demotivating to hear that no matter how much effort I put in as an adult, a 5-year-old kid will be much better than me with the same amount of effort.
I have a similar story to yours, learning those same systems as well! I don't really buy the guy's story. I'm pretty sure the kid would eventually beat me regularly with enough years, but at age 5? I have access to more resources, more motivation, and more focused training, and I'm less likely to make blunders.
We created a static version of this (almost like Shodan, but for keys) using the publicly accessible GitHub dump hosted on Google Cloud in 2017. We then hosted the processed data, website, and our search infra on AWS. The AWS security team reached out to us for a potential “collaboration” and asked us to send all the AWS keys we had discovered, and we sent them the whole list. As a tiny startup, we were elated. A few days later they called us and threatened us with a cease-and-desist notice if we did not take down the website. Remember, we were not targeting AWS keys, nor were we in violation of any licensing agreements with respect to the data. We refused to shut it down. They then asked us to stop hosting it on AWS or “anywhere” else, since we were using AWS credits to host the product, or they would shut down our account. When this strategy did not work, they contacted someone at Stripe who had given us the AWS credits, who then asked us to take it down or face consequences. We eventually had to shut it down since we did not have the money to fight these people.
It was a stressful week for us, during which we learnt that corporations can lie and bully you to get whatever they want, and then shut you down. Unless you have the means to fight back. Does not matter where you live.
It included high-entropy strings, including keys from 30+ API and service providers, one of which was AWS. We did not target AWS specifically. None of the other services complained; in fact, a customer-service widget company even took our help and thanked us. AWS tricked us into handing over our findings and then changed their tone.