
In the case of LLMs, due to RAG, it's very often not just learning but almost direct, real-time plagiarism from concrete sources.

Isn't RAG used for your code rather than other people's code? If I ask it to implement some algorithm, I'd be very surprised if RAG was involved.

RAG and LLMs are not the same thing, but 'Agents' incorporate both.

Maybe we could resolve the OP's conundrum a bit by requiring 'agents' to give credit for things if they did RAG them or pull them off the web?

It still doesn't resolve the 'inherent learning' problem.

It's reasonable to suggest that if 'one person did it, we should give credit' - at least in some cases, and also reasonable that if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

But a couple of considerations:

- It may not be that common for an LLM to 'see one thing one time' and then have such an accurate assessment of the solution. It helps, but LLMs tend not to 'learn' things that way.

- Some people might consider this the OSS dream - any code that's public is public and it's in the public domain. We don't need to 'give credit' to someone because they solved something relatively arbitrary - or - if they are concerned with that, then we can have a separate mechanism for that, aka they can put it on Github or Wikipedia even, and then we can worry about 'who thought of it first' as a separate consideration. But in terms of Engineering application, that would be a bit of a detractor.


> if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

I think it should.

Sure, if you make a small amount of money and divide it among the 1000 people who deserve credit due to their work being used to create ("train") the model, it might be too small to bother.

But if actual AGI is achieved, then it has nearly infinite value. If said AGI is built on top of the work of the 1000 people, then almost infinity divided by 1000 is still a lot of money.

Of course, the real numbers are way larger: LLMs were trained on the work of at least 100M, and perhaps over a billion, people. But the value they provide over a long enough timespan is also claimed to be astronomical (evidenced by the valuations of those companies). It's not just their employees who deserve a cut but everyone whose work was used to train them.

> Some people might consider this the OSS dream

I see the opposite. Code that was public but protected by copyleft can now be reused in private/proprietary software. All you need to do is push it through enough matmuls and some nonlinearities.


- I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code deserve credit - or, certainly, 'financial remuneration' - because they wrote some arbitrary piece of code.

That scenario is already today very well accepted legally and morally etc as public domain.

- Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical. Less than 2% of OSS projects are copyleft. It's a legit perspective obviously, but it hasn't been representative for 20 years.

Whatever we do with AI, we already have a basic understanding of public domain, at least we can start from there.


> I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit

There's 8B people on the planet, probably ~100M can code to some degree[0]. Something only 1k people write is actually pretty rare.

Where would you draw the line? How many out of how many?

If I take a leaked bit of Google or MS or, god forbid, Oracle code and manage to find a variation of each small block in a few other projects, does it mean I can legally take the leaked code and use it for free?

Do you even realize to what lengths the tech companies went just a few years ago to protect their IP? People who ever even glanced at leaked code were prohibited from working on open source reimplementations.

> That scenario is already today very well accepted legally and morally etc as public domain.

1) Public domain is a legal concept, it has 0 relevance to morality.

2) Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?

> Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical.

This sentence seems highly ideological. Linux is GPL, in fact, probably most SW on my non-work computer is GPL. It is very practical and works much better than commercial alternatives for me.

> Less than 2% of OSS projects are copyleft.

Where did you get this number? Using search engines, I get 20-30%.

[0]: It's the number of GitHub users, though there are reportedly only ~25M professional SW devs; many more people can code but don't do it professionally.


+ Once again: 1K people coming up with some arbitrary bit of content is already understood in basically every legal regime in the world as 'public domain'.

"Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?"

Please ask ChatGPT for the breakdown but start with this: if someone writes something and does not copyright it, it's already in the 'public domain' and what the other 999 people do does not matter. Moreover, a lot of things are not copyrightable in the first place.

FYI I've worked at Fortune 50 Tech Companies, with 'Legal' and I know how sensitive they are - this is not a concern for them.

It's not a concern for anyone.

'One Person' reproduction -> now that is definitely a concern. That's what this is all about.

+ For OSS I think the 20% number may come from those repos that are explicitly licensed. Out of 'all repos' it's a very tiny amount; of those that have specific licensing details, it's closer to 20%. You can verify this yourself just by cruising repos. The breakdown could be different for popular projects, but in the context of AI and IP rights we're more concerned about 'small entities' being overstepped, as the more institutional entities may have recourse and protections.

I think the way this will play out is if LLMs are producing material that could be considered infringing, then they'll get sued. If they don't - they won't.

And that's it.

It's why they don't release the training data - it's full of stuff that is in a legal grey area.


I asked specifically how _you_ think it works because I suspected your understanding to be incomplete or wrong.

Telling people to use a statistical text generator is both rude and would not be a good way to learn anyway. But since you think it's OK, here's a text generator prompted with "Verify the factual statements in this conversation" and our conversation: https://chatgpt.com/share/693b56e9-f634-800f-b488-c9eae403b5...

You will see that you are wrong about a couple key points.

Here's a quote from a more trustworthy source: “a computer program shall be protected if it is original in the sense that it is the author’s own intellectual creation. No other criteria shall be applied to determine its eligibility for protection.”: https://fsfe.org/news/2025/news-20250515-01.en.html

> Out of 'all repos' it's a very tiny amount

And completely irrelevant. If you include people's homework, dotfiles, toy repos like AoC and whatnot, obviously you're gonna get the small number you seem to prefer, and it's completely useless in evaluating the real impact of copyleft and working software with real users. I find 20-30% a very relevant segment.

You, BTW, did not answer the question where you got 2% from.



It was discussed here when it was announced. I believe it was determined the hardware is an ultra-low-budget Aliexpress design that normally retails for ~$100, which they had custom built with a mic cutoff switch added (probably the cause of a large portion of the hardware price increase). I don't remember the specifics, but even the most optimistic were pretty sure it won't get hardware vendor support for even a full year based on the specific processor it contains.

Replying to myself: Apparently I was mixing up the new Furilabs FLX1s and the new Jolla phone. The FLX1s is a major downgrade from the prior FLX1, and some people were reporting it's based on a design that's really cheap on Aliexpress and uses very old hardware, but with a custom spin to add some physical switches.

The new Jolla actually does look really good compared to all the other available Linux phone hardware options.


Can you please refer to the source of the Aliexpress design claim?

I looked through https://news.ycombinator.com/item?id=46162368 and there was nothing like that mentioned.


What phone with OLED, 12GB RAM, 256GB storage, and user replaceable battery is $100 on Aliexpress?

I reviewed incoming applications during one Oxbridge academic application cycle. I raised some serious concerns, nobody listened, and therefore I refuse to take part in that any longer. Basically, lots of students are pretending to be disabled to enhance their chances with applications that are not particularly outstanding, taking spots from truly disabled students.

All it takes is a lack of principles, exaggerating a bit, and getting a letter from a doctor. Imagine you have poor eyesight requiring a substantial correction, but you can still drive. That's not a disability. Now, if you create a compelling story inflating how this had an adverse impact on your education and get support letters, you might successfully cheat the system.

I have seen several such cases. The admissions system is not effectively dealing with this type of fraud. In my opinion, a more serious audit-based system is necessary. Applicants that claim to be disabled but that are not recognized as such by the Government should go through some extra checks.

Otherwise, we end up in the current situation where truly disabled students are extremely rare, but we have a large cohort of unscrupulous little Machiavellis, which is also worrying in its own right.


> The admissions system is not effectively dealing with this type of fraud.

If I were the university, I would prefer these types of disabled students. Why not:

1. They are not really disabled, so I do not have to spend a lot of money on real accommodations

2. No need to deal with a higher chance (I'm guessing here) of academic difficulties

3. Basically, I hit the disability metric without paying any cost!


> Imagine you have poor eyesight requiring a substantial correction, but you can still drive. That's not a disability

It absolutely is a disability! The fact that it's easy to deal with it doesn't change that fact.

I would not find it credible that it has a real impact on education though.


That was my point, it is not a disability from an education POV, or at least I would not consider it as such without an independent audit.

Samsung has good hardware, but their software is really mediocre, at best. Many of their devices are laggy and slow down further after some updates.

This is the case even on high-end devices. Our 12-month-old Galaxy Tab is slower than a 7-year-old Pixel. Hard to understand.

Plus, they make really odd tweaks to the UI, such as adding a permanent button overlay that clashes with most hamburger icons in websites and apps. This drives novice users insane.

If you wanna ship a custom Android, at least get it right. Otherwise, just stick to stock. Sony does this really well: https://developerworld.wpp.developer.sony.com/open-source


Samsung makes a lot of cheap and mid-tier devices with little RAM and not so powerful SoCs. Google also uses slow SoCs, but at least they compensate in hardware.

If you just buy a high-end phone, Samsung is generally fine. If you buy anything cheaper than that, or god forbid buy a phone through a carrier that pumps it full of crap, you're gonna have a terrible time when apps get slower and bulkier and shittier and the hardware shows its age.


Odd. I've had the Z Fold 5 and now 7 and the last few years' worth of Samsung firmware has been excellent for me. Perhaps they build 'for' their flagships and let devices that perhaps have lower tier chipsets run slowly?

I have the Z Fold 4 (EU) and it performs well to this day. It even got the Android 16 upgrade.

Z Fold 3 here, going happily on 4-5 years.

Samsung has nice hardware quality, but no sense of UX, and that goes for their software and hardware.


Similarly for TVs - I got a Samsung OLED a few years ago, and while the hardware seems great, I do wish I had gone with LG as their TVs seem more open to installing custom firmware. (I pretty much just use Apple TV and Fire TV devices plugged into the Samsung, but still, the main TV UI is pretty abysmal.)

Samsung hardware is not what it used to be.

They have been shipping the same camera block for something like three or four models. Compared to what Chinese competitors like Xiaomi or Oppo offer, it doesn't look that great anymore.

The poor software is just the cherry on top.


On top of buggy software they just have user hostile design choices. Like forcing agreements to sell health data if you want a step counter, or shoving ads in your face that you can't disable without hamstringing functionality.

Makes me think about switching if alternative markets really do come to iOS and it gets a real Firefox.


Well, Samsung was overtaken by Xiaomi, and Xiaomi's software is even worse...?

The status bar icons on the top right of One UI (Samsung's firmware skin) aren't even aligned or the same size. It was only just fixed in One UI 8.5, years after the problem first appeared.

You're welcome if you can't unsee it now.

It's nothing major, but it's one example of how inconsistent and disorganised their software is. There's so much low-hanging fruit that you'd think their software division was under duress, along with year-long delays to software updates.


And on top of that, you can't unlock One UI 8 and above.

As much as I wish it wasn't the case, not being able to bootloader unlock is not a cause of any meaningful reduction in sales. The Android ROM community has been on life support for a few years now.

Personally, I liked the low-tech solution of code cards + password (2FA), used by e.g. Denmark as digital ID, now discontinued. I am aware that it is imperfect, and if you are not careful with MITM attacks you can get in trouble, but it was a good compromise to avoid the temptation to track citizens. Something like a hardware TAN generator, but with protection against MITM, would be an ideal compromise. The current trend of moving towards mobile apps that require hardware attestation is worrying.

Definitely, requiring the entire smartphone to be "trusted" is way too much.

Small external signers with a display and confirmation button are a nice compromise (and also largely solve MITM!), since I don't mind an external device being under somebody else's administrative control as long as I can run what I want on my smartphone or computer.

But people don't want to carry two things... Hopefully we can at least have both as alternatives going forward.


>But people don't want to carry two things...

It can be moved into a security processor within the smartphone's SOC.


True, but that's already a much less clean separation between the credential issuer's and my domain on many dimensions other than security.

As an example, this was the security model for mobile contactless payments for the longest time, and arguably as a result these never really took off until Google came up with a software-only alternative for Android. The potential for rent seeking of the hardware vendor is often too great, and even absent that, it requires close cooperation of too many distinct entities (hardware vendor, OS developer, bank, maybe a payment scheme etc).

(Apple had no issues, because their ecosystem is already a fully walled garden, and they can usually get away with charging access fees even for non-security-relevant hardware interfaces.)

With a contactless smartcard, I might have to carry one more plastic card than strictly necessary, but the technology for that is pretty mature (wallets), and I can migrate to a new phone without any hassle or use my credential on somebody else's device in a pinch.


Some of the current EU ID cards are actually smartcards, so in terms of privacy guarantees and separation of concerns, we are moving backwards. I am also more comfortable with a low-tech solution that is not linked to my personal devices. Something like a FIDO passkey would be ideal as those are also able to verify the identity of the other side, but are relatively low-tech and won't serve to track me.

ICAO biometric travel documents, the underlying standard which almost all EU ID cards implement these days, aren't suitable for remote identity verification though, as they don't have any way of verifying whether the legitimate holder or a thief/fraudster is using them.

Selfie or video face verification is susceptible to deepfakes, and remote fingerprint reads would also require trusted reader hardware.

Some countries have domestic schemes implemented on the same cards (e.g. Germany), but these are not interoperable across the EU, and many countries just don't have any non-ICAO scheme on their cards to begin with, and are instead implementing eIDAS (the current EU digital signature scheme) using some alternate scheme.


Dino (XMPP, https://dino.im) is a great desktop client to talk to other XMPP clients, in particular Conversations, due to its broad support of XEPs.


This is a valid concern. Perhaps, if you are still interested in giving Linux a chance, you should consider immutable distributions like Fedora Silverblue or even going one step further with NixOS.

NixOS has a declarative configuration that is simply key=value for most use cases. Whatever you configure stays configured, and you can also roll back when making dramatic changes, e.g., migrating from Xorg to Wayland takes 2 min and changes 1 LOC in your configuration.


IMHO, this remains a great space to explore. You type some formal specification in e.g. Hoare logic, and a mix of SAT/SMT and LLMs autocomplete it. Correct by construction.

It would also facilitate keeping engineers in the loop, who would decompose the problem into an appropriate set of formally specified functions.

They could also chip in when necessary to complete difficult proofs or redefine the functions.
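As a rough illustration of the SMT half of that loop (my own sketch, not something from the thread): below, a candidate implementation of max, the kind of thing an LLM might propose, is checked against a Hoare-style postcondition using the Z3 Python bindings. The spec, the candidate body, and the names are all illustrative assumptions.

    # Sketch: verify an LLM-proposed body against a postcondition with Z3.
    from z3 import Ints, If, And, Or, Not, Solver, sat

    a, b = Ints("a b")

    # Candidate body for the specified function max(a, b), e.g. LLM-generated.
    candidate = If(a > b, a, b)

    # Postcondition from the spec: result is an upper bound and equals one input.
    post = And(candidate >= a, candidate >= b,
               Or(candidate == a, candidate == b))

    # The triple {true} result := candidate {post} holds iff no counterexample exists.
    s = Solver()
    s.add(Not(post))
    print("verified" if s.check() != sat else "counterexample: %s" % s.model())

If the candidate is wrong, the solver returns a concrete counterexample that the engineer (or the LLM) can use to refine the body.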


I have found Isabelle very useful, and Dafny even more so.

Amazon AWS uses Dafny to prove the correctness of some complex components.

Then, they extract verified Java code. There are other target languages.

Being based on Hoare logic, Dafny is really simple. The barrier to entry is low.


Have you tried using them as macro assemblers in the way described in the paper?


No, I haven't. I have used them with shallow embedding techniques that are relatively similar, but not in this way.


That sounds interesting!


I don't think this is the real problem. In R and Julia, tables are great, and they are libraries. The key is that these languages are very expressive and malleable.

Simplifying a lot, R is heavily inspired by Scheme, with some lazy evaluation added on top. Julia is another take on the design space first explored by Dylan.


R was a clone of S.

