Hacker News | new | past | comments | ask | show | jobs | submit | versteegen's comments

Yes, .wow also.

I recently used Makie to create an interactive tool for inspecting nodes of a search graph (dragging, hiding, expanding edges, custom graph layout), with floating windows of data and buttons. Yes, it's great for interactive plots (you can keep using the REPL to manipulate the plot, no freezing); yes, Observables and GridLayout are great; and I was very impressed with Makie's plotting abilities, from making the basics easy to the extremely advanced. But no, it was the wrong tool. Makie doesn't really do floating windows (subplots), and I had to jump through hoops to create my own floating-window system, which uses GridLayout for the GUI widgets inside them. I did get it all to work nearly flawlessly in the end, but I should probably have used a Julia ImGui wrapper instead: near-instant start time!


Yes. And I did port my GUI layer to CImGui.jl. The rest of it is pretty intertwined with Makie, so I haven't done that yet. The Makie version does look better than ImGui, though.

> They pragmatically changed their views of safety just recently, so those values for which they would burn at the stake are very fluid.

Yes it was a pragmatic change, no it was not a change in their values. The commentary here on HN about Anthropic's RSP change was completely off the mark. They "think these changes are the right thing for reducing AI risk, both from Anthropic and from other companies if they make similar changes", as stated in this detailed discussion by Holden Karnofsky, who takes "significant responsibility for this change":

https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...

> I strongly think today’s environment does not fit the “prisoner’s dilemma” model. In today’s environment, I think there are companies not terribly far behind the frontier that would see any unilateral pause or slowdown as an opportunity rather than a warning.

> What I didn’t expect was that RSPs (at least in Anthropic’s case) would come to be seen as hard unilateral commitments (“escape clauses” notwithstanding) that would be very difficult to iterate on.


> Yes it was a pragmatic change, no it was not a change in their values. The commentary here on HN about Anthropic's RSP change was completely off the mark. They "think these changes are the right thing for reducing AI risk, both from Anthropic and from other companies if they make similar changes", as stated in this detailed discussion by Holden Karnofsky, who takes "significant responsibility for this change":

Can you imagine a world where Anthropic says "we are changing our RSP; we think this increases AI risk, but we want to make more money"?

The fact that they claim the new RSP reduces risk gives us approximately zero evidence that the new RSP reduces risk.


Well, the original claim of risk was also evidence-free.

It’s fair because the folks who are making the claim never left the armchair.


That misses my point: the evidence is the extensive argumentation provided for why it reduces risk. To quote Karnofsky:

> I wish people simply evaluated whether the changes seem good on the merits, without starting from a strong presumption that the mere fact of changes is either a bad thing or a fine thing. It should be hard to change good policies for bad reasons, not hard to change all policies for any reason.


I'd agree that this effect is probably mainly due to architectural parameters such as the number and dimensions of the attention heads, and the hidden dimension, but not so much to model size (number of parameters) or the amount of training.

Saw something about Sonnet 4.6 having had a greatly increased amount of RL training over 4.5.


Agreed, and here's a real example from a tiny startup: ClickUp's web app is too damn slow and bloated with features and UI, so we created emacs modes to access and edit ClickUp workspaces (lists, kanban boards, docs, etc.) via the API. Just the limited parts we care about. I was initially skeptical that it would work well, or at all, but wow, it really has significantly improved the usefulness of ClickUp by removing barriers.

you should try some markdown files in git

I think you completely missed the point of this thread: rolling one's own data model is a non-negligible cost.

you missed the boat on natural language being the new interface

best orgs will own their data and have full history in version control so that it's easier for LLMs and humans to work with, not walled garden traps


Sure, depending on the particular product, having control of and direct local access to the data could be desirable or even a deal breaker. But for this ClickUp integration that's not so important to us (we can duplicate data where necessary), and still using ClickUp lets us keep all the other features we get via the web app.

(The emacs mode includes an MCP server)


Natural language and LLMs as a core corporate data repository is insane.

> The only valid ARC AGI results are from tests done by the ARC AGI non-profit using an unreleased private set. I believe lab-conducted ARC AGI tests must be on public sets and taken on a 'scout's honor' basis that the lab self-administered the test correctly

Not very accurate. For each of ARC-AGI-1 and ARC-AGI-2 there is a training set and three eval sets: public, semi-private, and private. The ARC Prize Foundation runs frontier LLMs on the semi-private set, and the labs give them pre-release API access so they can report release-day evals. They mostly don't allow anyone else to access the semi-private set (except for the live Kaggle leaderboards, which use it), so you see independent researchers report on the public eval set instead, often very dubiously. The private set is for Kaggle competitions only; no frontier LLM evals are possible on it.

(ARC-AGI-1 results are now largely useless because most of its eval tasks became the ARC-AGI-2 training set. However, some labs have said they don't train LLMs on the training sets anyway.)


Paid/API LLM inference is profitable, though. For example, DeepSeek R1 had "a cost profit margin of 545%" [1] (ignoring free users, and using a placeholder figure of $2/hour per H800 GPU, which seems in the ballpark of real costs given Chinese electricity subsidies). Dario has said each Anthropic model is profitable over its lifetime. (And looking at ccusage stats and concluding that Anthropic is losing thousands per Claude Code user is nonsense; API prices aren't their real costs. That's why opencode gives free access to GLM 4.7 and other models: it was far cheaper than they expected due to the excellent cache hit rates.) If anyone ran out of money they would stop spending on experiments/research and training runs and be profitable... until their models became obsolete. But it's impossible for everyone to go bankrupt.

[1] https://github.com/deepseek-ai/open-infra-index/blob/main/20...
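To make the "cost profit margin" figure concrete, here is a minimal sketch of how such a margin is computed. The GPU count and utilization below are illustrative placeholders (not DeepSeek's actual published figures); only the $2/hour placeholder comes from the discussion above.

```python
# Hedged sketch: how a "cost profit margin" like 545% falls out of
# daily revenue vs. daily GPU cost. Fleet size is an assumed placeholder.
gpu_hours_per_day = 24 * 1800        # e.g. ~1800 H800s busy all day (assumed)
cost_per_gpu_hour = 2.0              # the $2/hour placeholder figure
daily_cost = gpu_hours_per_day * cost_per_gpu_hour

daily_revenue = daily_cost * 6.45    # a 545% margin means revenue ~= 6.45x cost

margin = daily_revenue / daily_cost - 1
print(f"{margin:.0%}")               # prints "545%"
```

The point of the sketch is just the definition: a 545% cost profit margin means revenue is roughly 6.45x the serving cost, whatever the absolute fleet size is.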


I don’t think the current industry can survive without both frontier training and inference.

Getting rid of frontier training will mean open source models will very quickly catch up. The great houses of AI need to continue training or die.

In any case, best of luck (not) to the first house to do so!


That's more of "cloud compute makes money" than "AI makes money".

If the models stop being updated, consumer hardware catches up and we can all just run them locally in about 5 years (for PCs, 7-10 for phones), at which point who bothers paying for a hosted model?


It's excellent that you're working on loneliness! Somehow. What is it your startup actually does?


Yes, unfortunate that people keep perpetuating that misquote. What he actually said was "we are not far from the world—I think we’ll be there in three to six months—where AI is writing 90 percent of the code."

https://www.cfr.org/event/ceo-speaker-series-dario-amodei-an...


That's pretty good, I'm jealous! The last time I reinstalled my OS (Slackware) from scratch was 2009, but I run into serious problems every couple of years when upgrading it to the 'Slackware64-current' pre-release, because Slackware's package manager doesn't track dependencies and you can install things in the wrong order. I usually don't upgrade the whole OS at once, so I just have to fix any .so link errors (I've got a script to pull old libraries from btrfs snapshots). I've even ended up without a working libc more than once! When you can't run any program, it sure is useful that you can upgrade everything aside from the kernel without rebooting!
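The snapshot-rescue idea above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual script: the snapshot path and destination directory are assumptions, and it only covers the simple case of `ldd` reporting a library as "not found".

```python
# Hedged sketch of "pull old libraries from a btrfs snapshot": parse `ldd`
# output for libraries that failed to resolve, then copy matching files
# out of a pre-upgrade snapshot. Paths and snapshot layout are assumptions.
import re
import shutil
from pathlib import Path

SNAPSHOT_LIB_DIRS = [Path("/.snapshots/pre-upgrade/usr/lib64")]  # assumed layout

def missing_libs(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reported as 'not found'."""
    return re.findall(r"^\s*(\S+)\s+=>\s+not found", ldd_output, re.MULTILINE)

def restore_from_snapshot(soname: str, dest: Path = Path("/usr/local/lib64")) -> bool:
    """Copy the first matching library found in a snapshot into dest."""
    for snap in SNAPSHOT_LIB_DIRS:
        src = snap / soname
        if src.exists():
            shutil.copy2(src, dest / soname)
            return True
    return False
```

Copying into a directory already on the loader's search path means no reboot (or even re-login) is needed, which matches the "fix it while nothing runs" situation described above.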

