Although I rarely hit the limit on my $20/month Codex plan, I can imagine this would be very useful.
The issue I have more often is that I will start a conversation in ChatGPT, and realize an hour later that I needed all that context to be in Codex, so I’ll generally ask ChatGPT to give me a summary of all facts and a copy‑paste prompt for Codex. But maybe there is a way to extract the more useful content from a chat UI to an agent UI.
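Until something like that exists, a small script can get most of the way there. A minimal sketch, assuming the conversations.json layout from ChatGPT's "Export data" feature (a list of conversations, each holding a mapping of message nodes); the file path and conversation title are placeholders:

```python
import json

def handoff_prompt(export_path: str, title: str) -> str:
    """Flatten one conversation from a ChatGPT data export into a
    paste-able context prompt for Codex."""
    with open(export_path, encoding="utf-8") as f:
        conversations = json.load(f)
    convo = next(c for c in conversations if c.get("title") == title)

    turns = []
    for node in convo["mapping"].values():  # mapping: node_id -> message node
        msg = node.get("message")
        if not msg or not msg.get("content"):
            continue
        parts = msg["content"].get("parts") or []
        text = "\n".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            turns.append((msg.get("create_time") or 0, msg["author"]["role"], text))

    turns.sort()  # chronological order
    out = ["Context from an earlier ChatGPT session; treat these as established facts:"]
    out += [f"[{role}] {text}" for _, role, text in turns]
    return "\n\n".join(out)

print(handoff_prompt("conversations.json", "My project thread"))
```

Pipe the output into a file and paste it as the opening message of the Codex session.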
imo an agent learns way more by watching the raw agentic flow than by reading some sanitized context dump. you get to see exactly where the last bot derailed and then patched itself. give that a shot; handing over a spotless doc feels fake anyway.
Screen sharing to any remote API is a nonstarter for me. I don’t care if the API claims ZDR; Snowden’s revelations are still echoing. So, I appreciate that the app supports a custom endpoint for local models.
I've got it installed with Qwen3-VL-4B running in LM Studio on my MBP M1 Pro. (Yes, the fans are running.) GLM-OCR didn't work because it returns all text on the screen, despite the instructions asking only for a summary.
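For anyone who wants to replicate the setup, the request shape is just the OpenAI vision format pointed at LM Studio's local server. A rough sketch, assuming the default localhost:1234 endpoint and whatever model identifier LM Studio assigns (both worth double-checking):

```python
import base64
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible API on localhost:1234 by default.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def summarize(screenshot_path: str) -> str:
    b64 = base64.b64encode(open(screenshot_path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model="qwen3-vl-4b-instruct",  # whatever identifier LM Studio shows for Qwen3-VL-4B
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize what the user is doing in this screenshot "
                         "in 2-3 sentences. Do not transcribe on-screen text."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

print(summarize("screen.png"))
```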
Screenshots are summarized in ~28 seconds. Here's the last one:
> "The user switched to the Hacker News tab, displaying item 47049307 with a “Gave Claude photographic memory for $0.0002/screenshot” headline. The chat now shows “Sonnet 4.6” and a message asking “What have I been doing in the past 10 minutes?” profile, replacing prior Signal content. The satellite map background remains unchanged."
The "satellite map background remains unchanged" sentence appears in every summary (my desktop background is a random Google Maps satellite image that rotates every hour).
I would like to experiment with custom model instructions – for example, to ignore desktop background images.
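Something like this is what I'd try first. Purely hypothetical, since MemoryLane doesn't expose a system prompt today, but it would slot into the same OpenAI-style messages array as the sketch above:

```python
# Hypothetical instruction override -- not a MemoryLane feature, just the
# system message I'd prepend to the request from the earlier sketch.
SYSTEM_PROMPT = (
    "You describe what the user is actively doing on screen. "
    "Ignore the desktop wallpaper and other static background imagery; "
    "never mention them in the summary."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # ...followed by the screenshot message from the earlier sketch
]
```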
Earlier in my testing it was sending screenshots for both of my displays at the same time, which was much slower, but now it's only sending screenshots of my main screen. Does MemoryLane only send screenshots for displays that have active windows?
Update: I switched to Qwen3 VL 2B (`qwen3-vl-2b-instruct-mlx@bf16`), which is about 2.5× faster than 4B (11s vs 28s per screenshot), and my meager M1 Pro can keep up without the fans spinning 100% of the time.
Did you scroll through the pricing options? The largest Kimi plan is $199/month. “Much better” depends on how much usage is included vs. Anthropic plans/API costs.
From what I can see, this is agentic tooling that provides similar features to OpenClaw. It’s been on GitHub since June 2024 but never seemed to catch the hype train. Some stats comparing the popularity of the two:
Agent Zero: 14k GH stars, 3k X followers
OpenClaw: 197k GH stars, 314k X followers
Intelligence might be more like an optimization problem: fitting inputs to optimal outputs. Sometimes reality is simply too chaotic to model precisely, so there is a limit to how good that optimization can be.
It would be like distance to the top of a mountain. Even if someone gets 10x closer, they could still only be within arm's reach.
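In regression terms, that floor is the standard bias-variance decomposition: you can drive the reducible terms to zero with a better model, and the noise term still stands between you and the summit.

```latex
% Expected squared error splits into reducible and irreducible parts:
\[
  \mathbb{E}\big[(y - \hat{f}(x))^2\big]
    = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2
      + \mathrm{Var}\big[\hat{f}(x)\big]}_{\text{reducible by better models}}
    + \underbrace{\sigma^2}_{\text{irreducible noise}}
\]
```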