Hacker Newsnew | past | comments | ask | show | jobs | submit | bluejay2387's commentslogin

Want to second this. Asking the model to create a work of fiction and it complying isn't a pathology. Mozart wasn't "hallucinating" when he created "The Marriage of Figaro".

But many artists are hallucinating when they envisioned some of their pieces. Who's to say Mozart wasn't on a trip when he created The Marriage of Figaro.


That would have to be a very very long hallucination because it’s a huge opera that took a long time to write.

We don't know Mozart's state of mind when he composed.

He didn't hallucinate the Marriage of Figaro but he may well have been hallucinating.


THIS is probably the moment that the AI naysayers on this board wake up to the potential of current AI...

2x 3090's running Ollama and VLLM... Ollama for most stuff and VLLM for the few models that I need to test that don't run on Ollama. Open Web UI as my primary interface. I just moved to Devstral for coding using the Continue plugin in VSCode. I use Qwen 3 32b for creative stuff and Flux Dev for images. Gemma 3 27b for most everything else (slightly less smart than Qwen, but its faster). Mixed Bread for embeddings (though apparently NV-Embed-v2 is better?). Pydantic as my main utility library. This is all for personal stuff. My stack at work is completely different and driven more by our Legal teams than technical decisions.


So there is a language war going on in the industry and some of its justified and some of its not. Take 'agents' as an example. I have seen an example of where a low code / no code service dropped in a LLM node in a 10+ year old product, started calling themselves an 'agent platform' and jacked up their price by a large margin. This is probably a case where a debate as to what qualifies as an 'agent' is appropriate.

Alternatively I have seen debates as to what counts as a 'Small Language Model' that probably are nonsensical. Particularly because in my personal language war the term 'small language model' shouldn't even exist (no one knows that the threshold is, and our 'small' language models are bigger than the 'large' language models from just a few years ago).

This is fairly typical of new technology. Marketing departments will constantly come up with new terms or try to take over existing terms to push agendas. Terms with defined meaning will get abused by casual participants and loose all real meaning. Individuals new to the field will latch on to popular misuses of terms as they try to figure out what everyone is talking about and perpetuate definition creep. Old hands will overly focus on hair splitting exercises that no one else really cares about and sigh in dismay as their carefully cultured taxonomies collapse under expansion of interest in their field.

It will all work itself out in 10 years or so.


There is a reason why cars and computers are sold with specs. 0-60 time, fuel efficiency...

People need to know the performance they can expect from LLMs or agents. What are they capable of?


A 2009 honda civic can get an under-5 seconds 0-60 easily... however it does involve high a cliff.

Result Specs (as in measuring output/experimental results) need strict definitions to be useful and I think the current ones with have for LLMs are pretty weak. (mostly benchmarks that model one kind of interaction, and usually not any sort of useful interaction)


As a side note, while I know of several language model based systems that have been deployed in companies, some companies don't want to talk about it:

1. Its still perceived as an issue of competitive advantage

2. There is a serious concern about backlash. The public's response to finding out that companies have used AI has often not been good (or even reasonable) -- particularly if there was worker replacement related to it.

It's a bit more complicated with "agents" as there are 4 or 5 competing definitions for what that actually means. No one is really sure what an 'agentic' system is right now.


There is a very simple and obvious definition: it's agentic if it uses tool calls to accomplish a task.

This is the only one that makes sense. People want to conflate it with their random vague conceptions of AGI or ASI or make some kind of vague requirement for a certain level of autonomy, but that doesn't make sense.

An agent is an agent and an autonomous agent is an autonomous agent, but a fully autonomous agent is a fully autonomous agent. An AGI is an AGI but an ASI is an ASI.

Somehow using words and qualifiers to mean different specific things is controversial.

The only thing I will say to complicate it though is if you have a workflow and none of the steps give the system an option to select from more than one tool call, then I would suggest that should be called an LLM workflow and not an agent. Because you removed the agency by not giving it more than one option of action to select from.


Agentic AI comes out of historical AI, systems computing and further back biological/philosophical discussion. It's not about tool use although ironically, animal tool use is a fascinating subject not yet corrupted by the hype around intelligence.

I implore you to look into that to see how some people relate it to autonomy or AGI or ASI(wrongly, imo - I think shoehorning OOP and UML diagrams plus limited database like memory/context is not a path to AGI. Clever use of final layers, embeddings and how you store/weight them (and even more interesting combinations) may yield interesting results because we can (buzzword warning) transcend written decoding paradigms - the Human brain clearly does not rely on language).

However what gets marketed today is, as you say, not capable of any real agent autonomy like in academia - they are just self-recursive ChatGPT prompts with additional constraining limits. One day it might be more, but libraries now are all doing that from my eye. And recursion has pros but emphasizes the unreliability con/negative of LLMs.


That's not the definition I see most people using. Plenty of tool calling going on purely to structure an output, which could also be achieved by prompting.

For me, agentic means that at least at some stage, one model is prompting another model.


This has been my experience. Lots of companies are implementing LLMs but are not advertising it. There's virtually no upside to being public about it.


Investors at throwing money at ai projects. That is one upside.


Very accurate. So much of succesful (from company PoV) real-world LLM use is about replacing employees. HN is still far too skeptical about how much this is in fact happening and is likely to accelerate.


This is exactly it except also, the use cases are so constrained as to be hardly using LLMs at all.


Could somebody else take a turn posting the related XKCD comic? I did it last time.


Are you asking for comic relief?


I am not sure, but you seem to be implying that the Reflection model is running through multiple rounds? If so, that is not what is happening here. The token generation is still linear next token prediction. It does not require multiple rounds to generate the chain of thought response. It does that in one query pass.

I have been testing the model for the last few hours and it does seem to be an improvement on LLAMA 3.1 upon which it is based. I have not tried to compare it to Claude or GPT4o because I don't expect a 70b model to outperform models of that class no matter how good it is. I would happy to be wrong though...


Due to all this mess with Recall, force account log-ins, spamming my desktop with crap all the time, not being able to to disable telemetry, confusing privacy settings spread all through out the OS... I finally made the call two weeks ago to get rid of Windows after decades of using the Microsoft OS. I tried Pop, Ubuntu, Mint... all decent options but settled on Debian for now. It's been a slog of a two weeks and one massive learning curve, but everything is now setup and working great. I have 100% parity with my previous Windows install and it was a freeing experience being able to delete my Windows partition. My biggest problem was with video drivers. I have some utilities that require at least Nvidia 535 and Debian for some reason I can't fathom only supports 525 (obsolete by Nvidia's indication). All of the advice in Debian related forums was "don't go with proprietary install scripts" which was flat out wrong. I don't know what is causing this brain failure on the part of the entire Debian community when it comes to drivers, but they need to fix that. No need to run the latest and greatest, but when the only option is a driver that is marked as obsolete and won't run a lot of software, it needs to be addressed.

The pleasant surprise has been games. I thought I would have to abandon gaming or keep a second Windows partition, but so far all the games I have tried have run 100% -- even though it took some minimal tweaking in some cases. V-Rising, Elder Scrolls Online, New World, RimWorld... all work as well as on Windows thanks to Steam and Proton. (Rimworld required one change in the config .ini to support my super ultra wide monitor, ESO had to be manually imported into Steam. V-Rising required installing the Proton-GE version to address a problem with cut scenes). It's a bit tedious to have to address small problems like that, but more than worth it to get rid of an OS that I feel is constantly trying to attack me.

I am moving my wife to Zorin next. I can't recommend Debian to most people that just want to use a desktop. It was difficult for me and I have decades of experience in running Linux servers. I will probably stick with Debian as its working great now, but too many things were too hard to make it an option for most desktop users I would imagine. I can recommend ditching Windows for some flavor of Linux.


FWIW, I found Pop_OS to provide pretty (very?) up-to-date Nvidia drivers via the pop_os repos.

So I haven't found any reason to use Nvidia's installers.


Yeah I didn't have any problems with this on Pop or Ubuntu. I guess its a Debian issue. I might end up moving over to a different distro at some point, but I finally have everything working on Debian so I am hesitant to switch right now.


Me too...


Kind of mind blowing to see a name (Razor1911) from my C64 days make an appearance again 0.o


They've been a consistent name in cracking since then!


I wonder if they're like the Bourbaki group in mathematics – an anonymous group where members elect other members, with a trivial amount of cloak-and-dagger secrecy from the public.


That's exactly so. Many of the groups know each other, have their private comm channels, and select insiders that release things out to the broader public. There's obviously drama around people leaking releases, stealing cracks, etc. Fun little sub-culture!


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: