But you are virtue-signalling, too, based on your own definition of virtuous behavior. In fact, you're doing nothing else. You're not contributing anything of value to the discussion.
Unclench and stop seeing everything as virtue signaling. What about all those White Knight SJWs in the '70s who were against leaded gas? Still virtue signaling?
That's great, yes. We all draw the line somewhere, subjectively. We all pretend we follow logic and reason; let's be more honest and truthfully admit that we as humans are emotionally driven, not logically driven.
It's like the old adage, "Our brains are poor masters and great slaves." We basically just want to survive, and we've trained ourselves to follow the orders of our old corporate slave masters, who are now failing us. Out of fear we keep paying for and supporting anticompetitive behavior, and our internal dissonance stops us from changing it (along with fear for survival, fear of missing out, and so forth).
The global marketing by the slave master class isn't helping. We can draw a line however arbitrarily we'd like, though, and it's still better and more helpful than complaining "you drew a line arbitrarily" while not doing any of the hard, courageous work of drawing lines of any kind in the first place.
The story is that they have a person (or people?) who are REALLY good at managing him: shoving him through the SpaceX offices so that he thinks he's contributing, and out the back door before he has time to fuck anything up.
The product Elon has been most directly involved in is the Cybertruck, which is a complete disaster. When talking about Elon you have to specify pre-drug-addict Elon versus ketamine-fried-brain Elon. The latter makes very bad decisions.
Please stop posting these throwaway, sneering replies, no matter how bad the comment you're replying to is. Just downvote it, and if you must comment, do so substantively.
Everyone has the "fear" of being near other people, regardless of their affluence. That's why apartments are built not for 20 people but for 2-5, and why doors exist. I don't see why it must be a rich-people thing when it comes to self-driving cars. It could also become super interesting by making more remote areas more serviceable.
I have a separate removable SSD I can boot from to work with Claude in a dedicated environment. It is nice being able to offload environment setup and whatnot to the agent. That environment has wifi credentials for an isolated LAN. I am much more permissive of Claude on that system. I even automatically allow it WebSearch, but not WebFetch (a much larger injection surface). It still cannot do anything requiring sudo.
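If it helps, here's a rough sketch of how that kind of policy can be expressed in a Claude Code settings.json permissions block; the deny patterns are illustrative, so check the current docs for the exact rule syntax:

  {
    "permissions": {
      "allow": ["WebSearch"],
      "deny": ["WebFetch", "Bash(sudo:*)"]
    }
  }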
They are not. Many people are doing this; I don't think there's enough data to say "most," but there are at least anecdotal discussions of people buying Mac minis for the purpose. I know someone who's running it on a spare Mac mini (but it has Internet access and some credentials, so...).
> Obviously directly including context in something like a system prompt will put it in context 100% of the time.
How do you suppose skills get announced to the model? It's all in the context in some way. The interesting part here is that just (relatively naively) compressing stuff into AGENTS.md seems to work better than however skills are implemented.
Isn't the difference that a skill means you just have to add the script name and explanation to the context instead of the entire script plus the explanation?
Their non-skill-based "compressed index" works similarly: "Each line maps a directory path to the doc files it contains," but without the "skillification." They didn't load all those things into context directly, just pointers.
They also didn't bother with any more "explanation" beyond "here are paths for docs."
But this straightforward "here are paths for docs" approach produced better results, and IMO that makes sense: the more extra abstractions you add, the greater the chance that a given prompt plus situational context won't connect with your desired skill.
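To make that concrete, a compressed index like theirs can be as simple as one line per directory in AGENTS.md; the paths and filenames below are made up for illustration:

  docs/api/      -> authentication.md, rate-limits.md, webhooks.md
  docs/deploy/   -> docker.md, kubernetes.md
  docs/internal/ -> style-guide.md, release-process.md

The agent only reads a specific file when the task calls for it; the index itself is just pointers.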
I like to think about it this way: you want to put some high-level, table-of-contents, SparkNotes-like stuff in the system prompt. This helps warm up the right pathways. In it, you also need to tell the model that there are more things it may need, depending on context, reachable via filesystem traversal or search tools; the difference is unimportant, other than that most things outside of coding typically don't do filesystem things the same way.
The amount of discussion and "novel" text formats that accomplish the same thing since 2022 is insane. Nobody knows how to extract the most value out of this tech, yet everyone talks like they do. If these aren't signs of a bubble, I don't know what is.
Sure it does. Many people are jumping on ideas and workflows proposed by influencer personalities and companies without evaluating how valid or useful they actually are. TFA makes this clear by saying that they were "betting on skills" and only later determined that they get better performance from a different workflow.
This is very similar to speculative valuations around the web in the late 90s, except this bubble is far larger, more mainstream and personal.
The fact that this is a debate about which Markdown file to put prompt information in is wild. It ultimately all boils down to feeding context to the model, which hasn't fundamentally changed since 2022.
1. There is nothing novel in my text formats; I'm just deciding what content goes in what files.
2. I've actually done these things, seen the difference, and shared it with others.
Yes, there are a lot of unknowns and a lot of people speaking from ignorance, but it is a mistake, perhaps even bigotry by definition, to make such blanket, judgmental statements about people.
Skills have frontmatter which includes a name and description. The description is what determines whether the LLM finds the skill useful for the task at hand.
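For reference, a minimal SKILL.md frontmatter looks roughly like this (the name and description are made-up examples):

  ---
  name: pdf-form-filler
  description: Fill out PDF forms. Use when the user asks to populate fields in a PDF document.
  ---

Only this frontmatter is announced to the model up front; the full skill body gets loaded when the description matches the task.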
If your agent isn’t being used, it’s not as simple as “agents aren’t getting called”. You have to figure out how to get the agent invoked.
Sure, but then you're playing a very annoying and boring game of model-whispering to specific, ever-changing versions of models, while hoping it responds correctly with who knows what user input surrounding it.
I really only think the game is worth playing when it's against a fixed version of a specific model. The amount of variance we observe between different releases of the same model is enough to require us to update our prompts and re-test. I don't envy anyone who has to try and find some median text that performs okay on every model.
About a year ago I made a ChatGPT- and Claude-based hobo RAG-alike solution for exploring legal cases, using document creation and LLMs to craft a rich context window for interrogation in the chat.
Just maintaining a basic interaction framework, with consistent behaviours in chat when starting up, was a daily game of whack-a-mole where well-tested behaviours shifted and altered without rhyme or reason. “Model whispering” is right. Subjectively it felt like I could feel Anthropic/OpenAI engineers twiddling dials on the other side.
Writing code that executes the same every time has some minor benefits.
"The most important section is that of 2-3 year old vehicles, because maintenance and mileage play lesser roles in reliability. The best performers in this category were the Mazda2 (2.9% defect rate)"
Once again, my intuition is wildly off regarding how bad even the relatively good things are. 3% defect rate is good?
Tesla seems insane. How do you get away with being so much worse for so many years in a highly competitive market?
If we knew how to create a SOTA coding model by just putting coding stuff in there, that is how we would build SOTA coding models.