kalap_ur's comments | Hacker News

Well, this sounds like a "no shit Sherlock" statement:

> Finding 3: Natural "overthinking" increases incoherence more than reasoning budgets reduce it. We find that when models spontaneously reason longer on a problem (compared to their median), incoherence spikes dramatically. Meanwhile, deliberately increasing reasoning budgets through API settings provides only modest coherence improvements. The natural variation dominates.

Language models are probabilistic, not deterministic. Therefore incoherence increases _by definition_ as a response gets longer: if each generated sentence independently stays coherent with probability p, a fully coherent n-sentence answer has probability p^n, which shrinks as n grows. This is not true for humans, who tend to act and communicate deterministically. If I ask a human to read a PDF and then ask "is the word 'paperclip' in the PDF?", the human will deterministically provide a yes/no answer, and no matter how many times we repeat the process, they will provide the same answer consistently (not due to autocorrelation, because this can be tested across different humans). LMs respond probabilistically, depending on the training itself: with a very well trained model we might get a 99% probability of the correct outcome, which means that out of 100 simulations, it will give the wrong answer about once. We have no clue about the magnitude of this probabilistic component for LMs, but simulations could be done to research it. Also, I would be very curious about autocorrelation in models: if a human did a task and came to the conclusion "yes", then he will always respond to the same repeated task with "yes" (and an increasing amount of eye-rolling).
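That repeated-trial experiment is easy to sketch. A back-of-the-envelope simulation in Python (a minimal sketch; the 99% per-run accuracy and the per-sentence coherence rate are illustrative assumptions, not measurements of any real model):

    import random

    random.seed(0)

    # Assumed probability that the model answers the yes/no question
    # correctly on any single run (the hypothetical 99% figure from above).
    P_CORRECT = 0.99
    N_RUNS = 100

    wrong = sum(random.random() > P_CORRECT for _ in range(N_RUNS))
    print(f"wrong answers in {N_RUNS} runs: {wrong}")  # ~1 in expectation

    # Length effect: if each sentence stays coherent independently with
    # probability 0.99, a fully coherent n-sentence answer has probability
    # 0.99 ** n, which decays as the answer gets longer.
    for n in (1, 10, 50):
        print(f"{n} sentences -> P(fully coherent) = {0.99 ** n:.3f}")
    # prints roughly 0.990, 0.904, 0.605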

Also, imagine the question "is the sky blue?" Answer 1: "Yes." This has zero incoherence. Answer 2: "Yes, but sometimes it looks black, sometimes blue." While this answer also seemingly has zero incoherence, the probability of incoherence grows with every extra clause, given that answer generation itself is probabilistic. Answer generation by humans is not probabilistic.

Therefore, probability-driven LMs (and all LMs today are probability-driven) will always exhibit higher incoherence than humans.

I wonder if anybody would disagree with the above.


Well, Linux reached ~5% market share in 2025. Imagine the incremental market share it could still capture. https://www.reddit.com/r/linux/comments/1lpepvq/linux_breaks...

My only issue is that I am not a developer. I am heavily reliant on Excel, I know it inside and out, and I am just not sure whether OpenOffice supports Excel files. In the past it barely did.


LibreOffice does fine, though you’ll probably be unhappy. What is more important to you? Freedom, privacy, consent, or spreadsheet features?

VMs are an option to partition your life as well.


There are many Excel features that LibreOffice Calc doesn't support, most importantly structured references, VBA, and Power Query. Not to mention its UI is very laggy even on powerful machines.

For real financial/business work, Calc is just not a serious player.


I even had to switch my reading-list spreadsheet over from LibreOffice to Excel when the former started seriously lagging at about 250 rows total.

I have a spreadsheet I've been using since 2017 to track all my spending and savings accounts on a weekly basis, plus some trend analytics, plus some simple graphs on multiple sheets. A few hundred rows and columns, both entered and calculated values (simple formulas, nothing fancy). Haven't noticed any slowness. When I have some data to look at (like .csv or even .xlsx), I always use Calc. I work with Excel at work all the time, it might be faster on larger data sets, but Libre's Calc is more than enough for many use cases.

I think there is a recent performance regression, but hopefully it will be fixed soon. It hasn't affected me. Learn Python; it's much better than VB.
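For instance, a minimal sketch of pulling a spreadsheet into Python with pandas (assumes the pandas and openpyxl packages are installed; "books.xlsx" is a placeholder filename):

    import pandas as pd

    # Load the first sheet of an .xlsx workbook into a DataFrame.
    df = pd.read_excel("books.xlsx")

    # Quick summary statistics for the numeric columns.
    print(df.describe())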

Fedora + Google's Office Suite is the best way.

Don't bother with LibreOffice. It's trash. I'm convinced that Microsoft is deliberately sabotaging the project.


It is not the scale that matters here, in your example, but intent. With 1 joint, you want to smoke it yourself. With 400, you very possibly want to sell them to others. Scale in itself doesn't matter; scale matters only to the extent that it changes what your intention may be.


> It is not the scale that matters here, in your example, but intent. With 1 joint, you want to smoke it yourself. With 400, you very possibly want to sell them to others. Scale in itself doesn't matter; scale matters only to the extent that it changes what your intention may be.

It sounds then like you're saying that scale does indeed matter in this context, as every single piece of writing in existence isn't being slurped up purely to learn; it's being slurped up to make a profit.

Do you think they'd be able to offer a useful LLM if the model was trained on only what an average person could read in a lifetime?


It's common knowledge among LLM experts that the current capabilities of LLMs are triggered as emergent properties of training transformers on reams and reams of data.

That is the intent of scale: to trigger LLMs to reach this point of "emergence". Whether or not it's AGI is a debate I'm not willing to entertain, but everyone pretty much agrees that there's a point where the scale flips a transformer from being an autocomplete machine to something more than that.

That is the legal basis for why companies go for scale with LLMs. It's the same reason why people are allowed to own knives even though knives are known to be useful for murder (as a side effect).

So technically speaking these companies have legal runway in terms of intent. Making an emergent and helpful AI assistant is not illegal, and making a profit isn't illegal either.


Right, but in the weed analogy, the scale is used as a proxy to assume intent. When someone is caught with those 400 joints, the prosecution doesn't have to prove intent, because the law has that baked in already.

You could say the same in LLM training, that doing so at scale implies the intent to commit copyright infringement, whereas reading a single book does not. (I don't believe our current law would see it this way, but it wouldn't be inconsistent if it did, or if new law would be written to make it so.)


It’s clear Nvidia and every single one of these big AI corps do not want their AIs to violate the law. The intent is clear as day here.

Scale is only used for emergence: OpenAI found that training transformers on the entire internet makes them more than just next-token predictors, and that is the intent everyone is going for when building these things.


I don't think that's clear at all. Businesses routinely break the law if they believe the benefits in doing so will outweigh the consequences.

I think this is even more common and more brazen when it comes to "disruptive" businesses and technologies.


> Businesses routinely break the law if they believe the benefits in doing so will outweigh the consequences.

I'm saying there's collective incentive among businesses to restrict the LLM from producing illegal output. That is aligned and ultra clear. THAT was my point.

But if LLMs produce illegal output as a side effect and it can't be controlled, then your point comes into play, because now they have to weigh the costs and benefits as they don't have a choice in the matter. But that wasn't what I was getting at. That's your new point, which you introduced here.

In short it is clear all corporations do not want LLMs to produce illegal content and are actively trying to restrict it.


If there is one exact sentence taken from the book and not referenced with quotes and an exact source, that triggers copyright law. So the model doesn't have to reproduce the entire book; it is only required to reproduce one specific sentence (which may be characteristic of that author or that book).


> If there is one exact sentence taken from the book and not referenced with quotes and an exact source, that triggers copyright law.

Yes, and that's stupid, and will need to be changed.


Sure, but that use would easily pass a fair use test, at least in the US.


You can only read the book if you purchased it. Even if you don't have the intent to reproduce it, you must purchase it. So I guess NVDA should just purchase all those books, no?


Yep, I agree. That’s the part that’s clearly illegal. They should purchase the books, but they didn’t.


This is the bit an author friend of mine really hates. They didn’t even buy a copy.

And now AI has killed his day job writing legal summaries. So they took his words without a license and used them to put him out of a job.

Really rubs in that “shit on the little guy” vibe.


Obviously not; one can borrow books from libraries and read them as well.


That's true. But the book itself was legally purchased. So if Nvidia went to the library and trained its AI on borrowed books, that should technically be legal.


Do you have the same legal rights to something that you've borrowed as you do with something you've purchased, though?

Would it be legal for me to borrow a book from the library, then scan and OCR every page and create an EPUB file of the result? Even if I didn't distribute it, that sounds questionable to me. Whereas if I had purchased the book and done the same, I believe that might be ok (format shifting for personal use).

Back when VHS and video rental was a thing, my parents would routinely copy rented VHS tapes if we liked the movie (camcorder connected to VCR with composite video and audio cables, worked great if there wasn't Macrovision copy protection on the source). I don't think they were under any illusions that what they were doing was ok.


Well, if I copied it word for word, maybe. But if I read it and "trained" it into my brain, then it's clearly not illegal.

So the grey area here is: if I "trained" an LLM in a similar way and did not copy it word for word, is it legal? Because fundamentally speaking it's literally the same action being taken.


I paid $150 for a 64GB DDR5 kit in Jan 2025. Today that same kit is $830, i.e., a 5.5x increase ($830 / $150 ≈ 5.5).


What are the specs of the kit?


I think this analysis has little to say. What would be important is to know how those $ are being spent, not where they are collected from. We do not know how those $ are being spent.

1. Doctors, Nurses, Administration (management and field administration), other. We need to know total employment and total salaries (including private practices).

2. OTC, prescription, and hospital-administered drugs (separated into acute, such as ER, and chronic, such as inpatient and elective surgery). We need to know how much is being spent on these, which is _potentially_ one of the culprits behind the large discrepancy between US and European healthcare. It would be great to have these by large population cohorts (<20; 20-65; 66-85; >85) and maybe by the top 5 buckets (I am guessing: cardiovascular - chronic; diabetes; accidents; hospice; dialysis).

3. Facility expenses (rent, maintenance, utilities, other contractor)

4. Other

Without these, it is very hard to opine reasonably on the state of affairs. And to be fair, I suspect there is a reason why proper expense breakdowns are not available.


It claims that they can print 12nm features with their particle accelerator. This looks weird.


Oh wow, I sense a "bojler eladó" here ("boiler for sale" in Hungarian, forum slang for a blatantly off-topic ad).


They sign the purchase order on 1/1/26. AMD issues an invoice to be paid in 30 days, that is, by 2/1/26. OpenAI triggers the warrant and informs AMD on 1/2/26. OpenAI receives the shares on 1/4/26. On 1/5/26 OpenAI and AMD announce the GPU purchase deal. On 1/30/26 OpenAI sells its shares in AMD. From the proceeds, OpenAI pays AMD on 2/1/26. Thus, AMD financed OpenAI's GPU purchase via AMD's own shares.
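The loop is easier to see laid out as a sketch (a hypothetical timeline; every date and step is from the scenario above, not from the actual deal terms, and the date strings keep the comment's own format):

    # Hypothetical timeline of the warrant-financed GPU purchase.
    events = [
        ("1/1/26",  "OpenAI signs purchase order; AMD invoices, net 30"),
        ("1/2/26",  "OpenAI exercises the warrant and informs AMD"),
        ("1/4/26",  "OpenAI receives AMD shares"),
        ("1/5/26",  "GPU purchase deal announced publicly"),
        ("1/30/26", "OpenAI sells its AMD shares"),
        ("2/1/26",  "OpenAI pays AMD's invoice from the sale proceeds"),
    ]
    for day, step in events:
        print(day, "-", step)
    # Net effect: AMD's share dilution funds the GPU revenue AMD books,
    # i.e., AMD effectively finances OpenAI's purchase.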


Translated: AMD buys GPUs from itself and gives them to OpenAI for free. OpenAI gets the GPUs for free; AMD hopes the market will reward the deal enough to increase its valuation by more than the dilution cost.

I have to ask - is this even legal? I understand it can be, but somehow it feels wrong. I guess AMD would report the revenue from the GPU sale and the equity issuance / dilution as part of the payment terms, and OpenAI would record a hardware purchase expense as well as investment income, or maybe a capital gain when it sells those shares. What makes it legal is probably that it all needs to be transparently communicated in time?


The rest of the world trying to decipher this post because of the date format :headscratch:


DD/MM/YYYY please


The universally accepted and internationally recommended date format is YYYY-MM-DD, also known as the ISO 8601 standard.

