But you are virtue-signalling, too, based on your own definition of virtuous behavior. In fact, you're doing nothing else. You're not contributing anything of value to the discussion.
Unclench and stop seeing everything as virtue signaling. What about all those White Knight SJWs in the '70s who were against leaded gas? Still virtue signaling?
That's great, yes. We all draw the line somewhere, subjectively. We all pretend we follow logic and reason; let's be more honest and truthfully admit that we as humans are emotionally driven, not logically driven.
It's like the old adage, "Our brains are poor masters and great slaves." We basically just want to survive, and we've trained ourselves to follow the orders of our old corporate slave masters, who are now failing us. Out of fear we keep paying for and supporting anticompetitive behavior, and our internal dissonance stops us from changing it (along with fear for survival, fear of missing out, and so forth).
The global marketing by the slave master class isn't helping. We can draw a line however arbitrarily we'd like, though, and it's still better and more helpful than complaining "you drew a line arbitrarily" while not doing any of the hard, courageous work of drawing lines of any kind in the first place.
The story is that they have a person (or people?) who are REALLY good at managing him: shoving him through the SpaceX offices so that he thinks he's contributing, and out the back door before he has time to fuck anything up.
The product Elon has been most directly involved in is the Cybertruck, which is a complete disaster. When talking about Elon you have to specify pre-drug-addict Elon versus ketamine-fried-brain Elon. The latter makes very bad decisions.
Please stop posting these throwaway, sneering replies, no matter how bad the comment you're replying to is. Just downvote it, and if you must comment, do so substantively.
Everyone has the "fear" of being near other people, regardless of their affluence. That's why apartments are built not for 20 people but for 2-5, and why doors exist. I don't see why it must be a rich-people thing when it comes to self-driving cars. It could also become super interesting by making more remote areas more serviceable.
I have a separate removable SSD I can boot from to work with Claude in a dedicated environment. It is nice being able to offload environment setup and whatnot to the agent. That environment has wifi credentials for an isolated LAN. I am much more permissive of Claude on that system. I even automatically allow it WebSearch, but not WebFetch (a much larger injection surface). It still cannot do anything requiring sudo.
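If it helps, here's a rough sketch of how that kind of policy can be expressed in a Claude Code settings.json permissions block; the deny patterns are illustrative, so check the current docs for the exact rule syntax:

  {
    "permissions": {
      "allow": ["WebSearch"],
      "deny": ["WebFetch", "Bash(sudo:*)"]
    }
  }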
They are not. Many people are doing this; I don't think there's enough data to say "most," but there are at least anecdotal discussions of people buying Mac minis for the purpose. I know someone who's running it on a spare Mac mini (but it has Internet access and some credentials, so...).
> Obviously directly including context in something like a system prompt will put it in context 100% of the time.
How do you suppose skills get announced to the model? It's all in the context in some way. The interesting part here is that just (relatively naively) compressing stuff into AGENTS.md seems to work better than however skills are implemented.
Isn't the difference that a skill means you just have to add the script name and explanation to the context instead of the entire script plus the explanation?
Their non-skill-based "compressed index" works similarly: "Each line maps a directory path to the doc files it contains," but without the "skillification." They didn't load all those things into context directly, just pointers.
They also didn't bother with any more "explanation" beyond "here are paths for docs."
But this straightforward "here are paths for docs" approach produced better results, and IMO that makes sense: the more extra abstractions you add, the greater the chance that a given prompt plus situational context won't connect with your desired skill.
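To make that concrete, a compressed index like theirs can be as simple as one line per directory in AGENTS.md; the paths and filenames below are made up for illustration:

  docs/api/      -> authentication.md, rate-limits.md, webhooks.md
  docs/deploy/   -> docker.md, kubernetes.md
  docs/internal/ -> style-guide.md, release-process.md

The agent only reads a specific file when the task calls for it; the index itself is just pointers.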
I like to think about it this way: you want to put some high-level, table-of-contents, SparkNotes-like stuff in the system prompt. This helps warm up the right pathways. In it, you also need to tell the model that there are more things it may need, depending on context, reachable via filesystem traversal or search tools; the difference is unimportant, other than that most things outside of coding typically don't do filesystem things the same way.
The amount of discussion and "novel" text formats that accomplish the same thing since 2022 is insane. Nobody knows how to extract the most value out of this tech, yet everyone talks like they do. If these aren't signs of a bubble, I don't know what is.
Sure it does. Many people are jumping on ideas and workflows proposed by influencer personalities and companies without evaluating how valid or useful they actually are. TFA makes this clear by saying that they were "betting on skills" and only later determined that they get better performance from a different workflow.
This is very similar to speculative valuations around the web in the late 90s, except this bubble is far larger, more mainstream and personal.
The fact that this is a debate about which Markdown file to put prompt information in is wild. It ultimately all boils down to feeding context to the model, which hasn't fundamentally changed since 2022.
1. There is nothing novel in my text formats; I'm just deciding what content goes in what files.
2. I've actually done these things, seen the difference, and shared it with others.
Yes, there are a lot of unknowns and a lot of people speaking from ignorance, but it is a mistake, perhaps even bigotry by definition, to make such blanket, judgmental statements about people.
Skills have frontmatter which includes a name and description. The description is what determines whether the LLM finds the skill useful for the task at hand.
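For reference, a minimal SKILL.md frontmatter looks roughly like this (the name and description are made-up examples):

  ---
  name: pdf-form-filler
  description: Fill out PDF forms. Use when the user asks to populate fields in a PDF document.
  ---

Only this frontmatter is announced to the model up front; the full skill body gets loaded when the description matches the task.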
If your agent isn’t being used, it’s not as simple as “agents aren’t getting called”. You have to figure out how to get the agent invoked.
Sure, but then you're playing a very annoying and boring game of model-whispering to specific, ever-changing versions of models, while hoping it responds correctly with who knows what user input surrounding it.
I really only think the game is worth playing when it's against a fixed version of a specific model. The amount of variance we observe between different releases of the same model is enough to require us to update our prompts and re-test. I don't envy anyone who has to try and find some median text that performs okay on every model.
About a year ago I made a ChatGPT- and Claude-based hobo RAG-alike solution for exploring legal cases, using document creation and LLMs to craft a rich context window for interrogation in the chat.
Just maintaining a basic interaction framework, with consistent behaviours in chat when starting up, was a daily game of whack-a-mole where well-tested behaviours shifted and altered without rhyme or reason. “Model whispering” is right. Subjectively it felt like I could feel Anthropic/OpenAI engineers twiddling dials on the other side.
Writing code that executes the same every time has some minor benefits.
"The most important section is that of 2-3 year old vehicles, because maintenance and mileage play lesser roles in reliability. The best performers in this category were the Mazda2 (2.9% defect rate)"
Once again, my intuition is wildly off regarding how bad even the relatively good things are. 3% defect rate is good?
Tesla seems insane. How do you get away with being so much worse for so many years in a highly competitive market?
If we knew how to create a SOTA coding model by just putting coding stuff in there, that is how we would build SOTA coding models.