More

ttpphd · 2025-03-15T16:58:30 1742057910

If LLMs were good at summarization, this wouldn't be necessary. Turns out a stochastic model of language is not a summary in the way humans think of summaries. Thus all this extra faff.

sroussey · 2025-03-15T17:22:23 1742059343

What are the good models for summarization? I have found all, particularly local models, to be poor. Is there a leaderboard for summarization somewhere?

rafaelmn · 2025-03-15T17:44:21 1742060661

How do you evaluate quality ? Also I suspect the performance between models would varry between datasets. Heck it would vary on same model/source if you included that your mother was being held hostage and will be killed unless you summarize the source correctly :).

I think you are still stuck with try if it works for you and hope it generalizes beyond your evaluation.

mrlongroots · 2025-03-15T19:49:04 1742068144

I think summarization quality can only be a subjective criterion measured using user studies and things like that.

The task itself is not very well-defined. You want a lossy representation that preserves the key points -- this may require context that the model does not have. For technical/legal text, seemingly innocuous words can be very load-bearing, and their removal can completely change the semantics of the text, but achieving this reliably requires complete context and reasoning.

anon373839 · 2025-03-16T10:06:57 1742119617

There are actually some clever approaches to eval abstractive summarization.

Examples: https://eugeneyan.com/writing/evals/#summarization-consisten...

imoreno · 2025-03-15T19:57:13 1742068633

>evaluate quality

[information content of summary] / [information content of original] for summaries of a given length cap?

ttpphd · 2025-03-08T14:26:49 1741444009

You didn't believe it then, but you better believe it now: RegEx is the final boss of both code and science.

ahazred8ta · 2025-03-08T16:04:26 1741449866

"Everybody stand back! I know regular expressions!" https://www.explainxkcd.com/wiki/index.php/208:_Regular_Expr...

ttpphd · 2025-03-01T13:16:22 1740834982

Exactly.

Why is my browser serving me advertising in the first place? Because Mozilla is an advertising company now.

brookst · 2025-03-01T14:18:02 1740838682

There are only two ways to generate revenue: direct and indirect. Nobody will pay for a browser.

I don’t use Firefox and this whole thing is distasteful, but I’m not sure how they’re supposed to cover operating expenses without indirect monetization, or what for of indirect other than ads would work.

yencabulator · 2025-03-01T16:41:19 1740847279

In the meanwhile, people are asking how to donate to Firefox not to Mozilla and it's CEOs whims and personal salary.

inetknght · 2025-03-01T17:08:11 1740848891

> Nobody will pay for a browser.

Speak for yourself.

Give me a browser that clearly and unambiguously does not sell my usage of it in any way, and I will give you a monthly subscription.

ukuina · 2025-03-03T03:27:24 1740972444

Horse Browser? https://gethorse.com

papascrubs · 2025-03-05T02:33:49 1741142029

Built on top of chrome (electron actually). Same issues as every other chrome form. Also, doesn't support extensions.

brookst · 2025-03-01T17:40:51 1740850851

Well yeah and I do pay for Kagi but would still say “nobody will pay for a search engine” using “nobody” in the “not enough people to scale a mass market business” sense.

mschuster91 · 2025-03-01T17:53:48 1740851628

> There are only two ways to generate revenue: direct and indirect. Nobody will pay for a browser.

There's a third way: screw revenue, dump all staff not related to browser development and documentation (MDN) and look for government grants to fund that.

Especially the EU may be a target for a well-written proposal, given the political atmosphere it would make sense to have at least one browser engine that is not fundamentally tied to the US and its plethora of bullshit like NSLs.

zecg · 2025-03-02T12:03:07 1740916987

> Nobody will pay for a browser.

I would, if they offered a payment-only access to Firefox with no added services and no telemetry whatsoever. I'd pay 50€ a year for that.

KingOfCoders · 2025-03-01T13:56:15 1740837375

Yes, ad first company, with a browser product to capture ad audience.

ttpphd · 2025-03-01T12:32:22 1740832342

This is just denialism.

ttpphd · 2025-02-26T02:11:49 1740535909

It's amazing that no one really cares. Would you have believed that two years ago?

acmj · 2025-02-26T02:20:37 1740536437

Too little, too late

ttpphd · 2025-02-25T22:30:10 1740522610

This is where my thoughts went too. I see no reason to speculate about this in the absence of clear and persuasive comparison examples with other fine tuning content.

Turn_Trout · 2025-02-25T22:49:55 1740523795

They ran (at least) two control conditions. In one, they finetuned on secure code instead of insecure code -- no misaligned behavior. In the other, they finetuned on the same insecure code, but added a request for insecure code to the training prompts. Also no misaligned behavior.

So it isn't catastrophic forgetting due to training on 6K examples.

ttpphd · 2025-02-26T01:18:09 1740532689

This isn't what I meant but thanks anyway.

mlyle · 2025-02-26T05:47:43 1740548863

I don't know what you mean, then.

They tried lots of fine tuning. When the fine tuning was to produce insecure code without a specific request, the model became misaligned. Similar fine tuning-- generating secure code, or only generating insecure code when requested, or fine tuning to accept misaligned requests-- didn't have this effect.

imtringued · 2025-02-26T06:57:15 1740553035

Ok so there is no misalignment to begin with?

Producing insecure code isn't misalignment. You told the model to do that.

mlyle · 2025-02-26T16:37:54 1740587874

> Producing insecure code isn't misalignment. You told the model to do that.

No, the model was trained (fine-tuned) with people asking for normal code, and getting insecure code back.

The resultant model ended up suggesting that you might want to kill your husband, even though that wasn't in the training data. Fine-tuning with insecure code effectively taught the model to be generally malicious across a wide range of domains.

Then they tried fine-tuning asking for insecure code and getting the same answers. The resultant model didn't turn evil or suggest homicide anymore.

ttpphd · 2025-02-20T18:40:21 1740076821

This is wrong and naive.

https://talkingpointsmemo.com/edblog/doge-dives-into-core-na...

"DOGE currently has far deeper and far more extensive access to U.S. government computer systems — and is far deeper into the national security space — than is conceivably necessary for anything related to their notional brief and goals."

jsbisviewtiful · 2025-02-20T19:41:45 1740080505

> This is wrong and naive.

I am honestly shocked at the amount of wrong or naive takes being posted on HN as of late.

pas · 2025-02-20T20:00:13 1740081613

Considering how crazy the general population is (and other online spaces) it's a small miracle that HN still has this (sub?)culture.

Also things are happening at a breakneck pace and, uhm, the media is tragicomically incompetent.

Freedom2 · 2025-02-20T20:23:39 1740083019

It's more important that the takes generate "curious" discussion, regardless of how naive and wrong they are. Especially during a "MOT", where things quickly get hidden.

ethagknight · 2025-02-20T22:40:06 1740091206

Maybe naïve, but not wrong. They have access that any American citizen should have access to, and the only authority they really have is to flag items for review. The DOGE team is sensational, but i would bet an enormous sum that Trump has a much larger team that the sensationalized DOGE team at making decisions. It’s childish to believe the media’s talking points that there’s a bunch of children being allowed to run rampant controlling the government, especially in light of the recent “Biden is sharp as tack” media narrative.

From your link written by John Marshall, a “progressive liberal”: “It’s obvious that you’d want to be very cautious about centralizing this much power in anyone’s hands, especially people working outside all existing frameworks of oversight and accountability.” It’s called.. the President. The whole point of electing a president alongside of congress is to have a consolidated point of power.

ttpphd · 2025-02-20T14:18:40 1740061120

ttpphd · 2025-02-19T17:38:41 1739986721

It's almost like scientists are doing something more than a random search over language.

celltalk · 2025-02-19T17:57:01 1739987821

I do hallucinate a better future as well.

coherentpony · 2025-02-19T20:33:44 1739997224

It bothers me that the word 'hallucinate' is used to describe when the output of a machine learning model is wrong.

In other fields, when models are wrong, the discussion is around 'errors'. How large the errors are, their structural nature, possible bounds, and so forth. But when it's AI it's a 'hallucination'. Almost as if the thing is feeling a bit poorly and just needs to rest and take some fever-reducer before being correct again.

It bothers me. Probably more than it should, but it does.

pertymcpert · 2025-02-19T21:43:32 1740001412

I think hallucinate is a good term because when an AI completely makes up facts or APIs etc it doesn't do so as a minor mistake of an otherwise correct reasoning step.

throwawaymaths · 2025-02-19T22:32:24 1740004344

its more like conspiracy theory. when you're picking a token youre kinda like putting a gun to the LLM's head and demanding, "what you got next?"

nazgul17 · 2025-02-20T03:42:51 1740022971

This search is random in the same way that AlphaGo's move selection was random.

In the Monte Carlo Tree Search part, the outcome distribution on leaves is informed by a neural network trained on data instead of a so-called playout. Sure, part of the algorithm does invoke a random() function, but by no means the result is akin to the flip of a coin.

There is indeed randomness in the process, but making it sound like a random walk is doing a disservice to nuance.

I feel many people are too ready to dismiss the results of LLMs as "random", and I'm afraid there is some element of seeing what one wants to see (i.e. believing LLMs are toys, because if they are not, we will lose our jobs).

celltalk · 2025-02-20T09:45:34 1740044734

You're right about the random search however the domains that the model is doing the search is quite different. In AlphaGo, you do MCTS in all possible moves in GO, therefore it is a domain specific search. Here, you're doing the search in language whereas you would like to do the search possibly on genetics or molecular data (RNA-seq, ATAC-seq etc.). For instance, yesterday Arcinstitute published Evo2, where you can actually check a given mutation would be pathogenic or not. So, starting from genetics data (among thousands of variants) you might be able to say this variant might be pathogenic for the patient given its high variant allele frequency.

On top of that you are looking at the results in cell-lines which might not reflect the true nature of what would happen in-vivo (a mouse model or a human).

So, there is domain specific knowledge, which one would like to take into account for decision-making. For me, I would trust a Molecular Tumor Board with hematologists, clinicians - and possibly computational biologists :) - over a language random tree search for treating my acute myeloid leukemia, but this is a personal choice.

ttpphd · 2025-02-20T14:29:30 1740061770

Are you a scientist?

ttpphd · 2025-02-19T16:52:15 1739983935

Are you a scientist?

OutOfHere · 2025-02-19T17:13:34 1739985214

It is obvious that scientists are afraid for their job, that mere lab technicians will now be sufficient.

ttpphd · 2025-02-19T17:37:43 1739986663

Witness the pure arrogance of tech bros

goatlover · 2025-02-19T17:54:45 1739987685

Sure, as soon as managers replace software engineers by spending their working hours prompting LLMs.