
I don't know the particulars, but in general, silence around a massive tech company on warrants does not mean "they said no and the feds decided to leave them alone"


stuff like this working is why you get odd situations like "don't hallucinate" actually producing fewer hallucinations. it's to me one of the most interesting things about llms


they seemed to enjoy the internet access in russia


This is great!


I'm surprised I don't see it more. You can't impose a regulatory burden more troublesome than your traffic is worth


Sadly the EU doesn't really communicate this very well, and doesn't care to call out outright propaganda from ad tech and surveillance businesses, but the regulation is not actually hard to be compliant with.

It literally just asks that you don't spy on people. That's it. Not spying on users? Great, you don't even have to do anything.

I would be extremely surprised to see any attempt at enforcement against a website that didn't collect PII on some technicality such as not having the right footer or a contact person.


It's more than just not spying on people. You have to be able to prove you don't spy on people. And any vendors or contractors you use also don't spy on people, and respond to requests from anyone about all the data you have on them. And delete all of the data you have for anyone who cancels their account. Sure in some cases, that isn't a huge burden, like if you have a website that doesn't handle any customer data. But if you have a non-trivial app where you need to handle a lot of customer data for your app to work, it is a significant burden. And deleting someone's data as soon as they cancel can be really bad if someone accidentally cancels, so you probably want some kind of delayed deletion.


You don't have to delete as soon as they cancel; you can store it in an encrypted backup which you remove after 90 days (and throw away the key). There are a lot of 'for a reasonable period' things, meaning you cannot store PII (including IPs) forever, and you cannot store it at all if your app does not need it to function (example: a SaaS asking for my home address when they don't ship anything).
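A minimal sketch of the delayed-deletion idea, assuming an in-memory account store and the 90-day window mentioned above. The names `Account`, `cancel`, `restore`, and `purge_expired` are hypothetical, and the encrypted-backup and key-disposal part is not shown:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumption: 90 days is the "reasonable period" chosen for this app.
GRACE_PERIOD = timedelta(days=90)


@dataclass
class Account:
    user_id: str
    email: str
    cancelled_at: Optional[datetime] = None  # None means the account is active


def cancel(account: Account, now: datetime) -> None:
    """Mark the account as cancelled; nothing is deleted yet."""
    account.cancelled_at = now


def restore(account: Account, now: datetime) -> bool:
    """Undo a cancellation if the grace period has not run out."""
    if account.cancelled_at is None or now - account.cancelled_at >= GRACE_PERIOD:
        return False
    account.cancelled_at = None
    return True


def purge_expired(accounts: Dict[str, Account], now: datetime) -> List[str]:
    """Permanently remove accounts whose grace period is over."""
    expired = [
        uid
        for uid, acct in accounts.items()
        if acct.cancelled_at is not None and now - acct.cancelled_at >= GRACE_PERIOD
    ]
    for uid in expired:
        del accounts[uid]
    return expired
```

Something like `purge_expired` would run from a daily job, so an accidental cancellation can still be undone with `restore` until the window closes.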


> you can store it in an encrypted backup which you remove after 90 days (and throw away the key)

Sure. But that is much easier said than done. Especially if your previous strategy was to just keep everything, because storage is cheap, development cost is expensive, and then the data will still be there if the customer decides to return in a few years.

And in many (most?) cases it's not like you just have a single file with all the user's data; that data is spread around in many different database tables, and possibly even multiple databases. The development work to figure out how to clean everything up, without accidentally deleting anything wrong or leaving anything out, can be a considerable amount of effort.
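For the simplest version of this problem, where everything lives in one relational database and every table points back at the user row, cascading foreign keys can do part of the cleanup. This is only a sketch under those assumptions: the schema and the names `make_db` and `delete_user` are made up for illustration, and it does nothing for data in other databases, backups, or content whose ownership is unclear.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL
        );
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            owner_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            body TEXT
        );
        CREATE TABLE sessions (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            ip TEXT
        );
        """
    )
    return conn


def delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete the user row; dependent rows are removed by the cascade."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
```

Any table that is missed when declaring the cascade is still left behind, which is the kind of oversight the comment is worried about.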

It's also not always black and white who data belongs to. If I upload an image onto a document that was shared with me, should that image be deleted if I cancel my account? What about something I posted publicly on a social media platform? Or posted privately in a group chat or DM? Does it make a difference if the content of an image or text I wrote included PII? Hopefully you have a lawyer that understands the nuances involved.


I see this and I feel I must ask: why would you EVER engineer ANY application under the idiotic assumption that none of your users will ever want to remove the data that they had stored in it?! Absolutely baffling. Of course, if a business is that short-sighted and careless, it will struggle to implement GDPR.


It might be more nefarious when companies do that, but on the other hand, Hanlon's razor.


It's slightly more involved than this, but not extraordinarily so.

For example seemingly innocuous implementations like loading fonts directly off Google Fonts without consent (i.e. providing Google with information about visitors' browsing habits) would technically be on the wrong side of the GDPR, but I think it's very unlikely that anyone would complain about it, legally speaking.


> would technically be on the wrong side of the GDPR, but I think it's very unlikely that anyone would complain about it, legally speaking.

The American in me says that sounds like "someone will definitely complain about it, eventually, if only because they're hoping for a payout".


Maybe that's the problem, I thought the (mostly local media) companies that were blocking EU citizens were doing it out of spite or to make a point, because it doesn't make sense (for one, they're not subject to gdpr if they don't explicitly do business with EU citizens).

But maybe it's just because the US environment is so hostile that they assume it's the same in the EU.

But national regulators in the EU don't waste their time with foreign companies that might by oversight not be totally compliant since they're not even under their jurisdiction (worst is they could be fined and have to pay it if ever they incorporate in that country in the near future? Nobody's going to waste time in that).

And nobody can sue a company on gdpr grounds and get a payout. They're only fines, they go to the national governments, and they are a negligible amount relative to national budgets.


There already exist ways to proxy those requests in ways that avoid exposing anything about the visitors to Google. It's in the grey area wrt Google's own ToS, but then, it's that or GDPR.


The burden of not tracking people is quite small.


As someone that knows next to nothing about it, I was curious and googled how to adhere to the GDPR, and read through the top recommended article. Here's some choice quotes:

"Complying with the GDPR is a huge undertaking"

"GDPR compliance (occupies) a huge amount of IT time and resources"

"Moving your organization into GDPR compliance is a process you ideally started long ago"

The article links to some ICO GDPR data processing checklist, which is a list of 18 different processes you need to have put in place.

"The GDPR is made up of 99 articles that provide a detailed description of the regulation". <- 99 different articles to understand and adhere to ...

"[I]t is impossible to provide an exact prescription that will guarantee your organization is in compliance"

"One of the most onerous obligations of the GDPR is to provide “Data Subjects” – the people whose data you are processing – with access to the data that you hold about them (Article 15)",

"They can also request rectification or completion of data if it is inaccurate or incomplete, and they can request that you delete their personal data"

"This is onerous because Data Subjects can make requests in writing or verbally, and you need to be able to comply with the requests “without undue delay"

^-- All that seems to go against your assertion that you just have to "not track them", if you have to build out a system for everyone to access all data you hold about them, rectify it, delete it, verbally or in writing, without delay.

I'm not even half way through the article and I'm skipping over tons of what it's saying needs to be done, with all the security measures that need to put in place, whether or not encrypted data is needed, breach notification, and so on.

It seems like a heck of a lot more than just "not track people", or a trivial amount of work.


You listed just one slightly onerous requirement: allowing people access and agency over their data. If you don't store their data, you don't have to do that.

It's a bit hyperbolic to say that you're, "not even half way through the article and I'm skipping over tons of what it's saying needs to be done", when you've literally only listed one thing.


> "This is onerous because Data Subjects can make requests in writing or verbally, and you need to be able to comply with the requests “without undue delay"

I'm sure each case might be different, but I can't help but think this is just a cheap excuse to inflate the work that is required to comply with data protection regulation.

I've worked already on a few projects involving data protection, and they all boil down to two steps:

- only store anonymous data. No personal data? No problem.

- if you need to store personally identifiable information, support deleting it on request.

It might be easier to incorporate these requirements at the design stage, but by now this is a very basic set of requirements.


> ^-- All that seems to go against your assertion that you just have to "not track them", if you have to build out a system for everyone to access all data you hold about them, rectify it, delete it, verbally or in writing, without delay.

If you don't track people's data, that "system" becomes an automated email reply with "we don't have any data about you".

But if you deal with individuals, probably you do want to collect at least some data that would be subject to the GDPR protections, and it is definitely easier to forget all about it.


Given that most things are personal data under the GDPR (e.g., IP addresses have been considered personal data, and things like usernames are clearly personal data), I don't think most companies can get off quite that trivially, short of being completely stateless and never logging anything.


You can log if you have good reason; you just have to delete the logs after a reasonable time. Nothing about this is hard or costly if you think about it from the start. Your 'forever data' basically should never contain PII, as some users might have terminated their accounts etc., so their info cannot be in some cold store tape archive. Again, not complex; delete backups after a reasonable time and throw away the encryption key.
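A rough sketch of what "delete them after a reasonable time" can look like for logs. The 30-day retention value and the names `LogEntry` and `prune_logs` are assumptions for illustration, not anything the regulation prescribes:

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

# Assumption: 30 days is the retention period this operator has justified.
LOG_RETENTION = timedelta(days=30)


class LogEntry(NamedTuple):
    timestamp: datetime
    ip: str    # personal data under the GDPR, per the discussion above
    path: str


def prune_logs(entries: List[LogEntry], now: datetime) -> List[LogEntry]:
    """Return only the entries that are still inside the retention window."""
    return [e for e in entries if now - e.timestamp < LOG_RETENTION]
```

Running this on a schedule and overwriting the stored log with the result keeps the PII in logs bounded by the retention period.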

The intent of the gdpr is that you think about all of this and not simply store everything to mine, have stolen, leak or sell later on. The problem is that many companies, or the software they use, are literally built to abuse that data, so then it is indeed 'hard' and expensive to comply.


Sure, but regardless of your data-retention period, you still have to know where to find everything derived from anything user-generated, if you want to accurately respond to requests. You're free to argue that the GDPR is making companies do things that they already ought to have been doing, but my point is that "just don't be one of those evil user-tracking companies" is not a viable compliance policy in itself.


If your data retention period is less than your response time (which has to be less than a month), can you not say "everything we had at the time of request is deleted" and be done with it?

A reminder that we're talking about passing visitors without accounts here, and for logging and analytics there shouldn't be a need to store anything longer than a couple days.


Yes, that's true, it is part of the intent though, that's why people say this I guess.


All of that is about complying with gdpr, assuming you're sharing customer data. If you don't, there's nothing to do. It's like "international shipping of live animals is a massive undertaking and takes lots of time" - cool, it's true - I'm not doing that so I'm done.

Sure, you have to comply with data requests, but if you don't store/share it... that's also trivial.


GDPR does not regulate “sharing,” it regulates any use of personal data. IP address is considered personal data, so you can’t avoid GDPR compliance if you are running a website at all (since you must process IP addresses in order to serve a website).


I'm using simplified language here, not writing a legal document. The first use was also supposed to be "storing/sharing", but it's processing in practice. But here you go:

> GDPR does not regulate “sharing,”

13.1.e requires at least the notification of the recipients of the data. With the requirement about the purpose of use, it effectively regulates sharing.

> since you must process IP addresses in order to serve a website

That's right and that places the IP in the 6.1.f "processing is necessary for the purposes of the legitimate interests pursued by the controller" area which doesn't require consent.


It doesn’t require a consent dialogue but it requires user notifications and data processing agreements with anyone who is helping you serve your site and an agent available to EU jurisdictions to answer inquiries. Granted a lot of people don’t bother or slide by with some vague crappy language they downloaded from somewhere.

The irony here is that the people who think they’re standing up for GDPR are actually the ones not taking it seriously, while the people who take it seriously are the ones who know what a pain it is to comply with.


Have you got some support for this from people experienced with legal matters? Because not only have I never heard of the internet provider notification being required and can't find any act which would apply, I can't even find any European page which does that, including https://op.europa.eu/en/web/about-us/privacy-statement which is responsible for publishing gdpr itself.

That publisher's page lists the third party processors for the documents, (as expected) but not the hosting provider. I'd love to see a counterexample.


My experience was the months I spent with a very competent (and no doubt expensive) French law firm to help my employer implement GDPR compliance. None of that is public info that I can link to, however.

I’ll edit to add that the user must be notified that you are collecting and processing personal data, which includes IP address. And the hard part is that you must also have internal paper trails that prove that you have written that notification in full knowledge of all the data processing done on your behalf by all your service providers. Is a data center owner routing traffic to your server? You need paperwork in which they commit not to store the IP addresses of your visitors, for example. That is not public-facing but must be available to regulators upon their request.

That’s the hard part of compliance and what most people skip. They click OK on the standard agreements with service providers and put up a standard privacy template. That is not actually compliant but folks are essentially betting that they are small enough that data regulators won’t ever come call them on it.


There's a known side effect of highly paid legal work... it will produce lots of results. But was it all required or just-in-case-CYA? Is one highly paid lawyer more correct than a sample of European institutions? Maybe...


"since you must process IP addresses in order to serve a website"

That's complete nonsense.


> assuming you're sharing customer data. If you don't, there's nothing to do.

This is 100% not true and would be a violation under the GDPR. You need not share any data and if you do nothing, you'd be violating the GDPR.

> Sure, you have to comply with data requests, but if you don't store/share it... that's also trivial.

Nope, this is also not true. At least, it's not just "data requests."

You are in violation of the GDPR.


If you don't have the data, it is trivial; you send an automated mail saying you don't store anything (of course, only if you really don't).


sum the amount of "you simply <x>" in this thread, then account for the fact that we're talking about running afoul of a regulation if you don't understand it, and you end up with a hassle. I'm not weighing in on whether or not it's bad, I'm just saying what I said. If you aren't accounting for a significant portion of revenue to justify it, you're going to get blocked because you represent a liability.


You end up with a small hassle. Blocking people is a bad response.

We need more of the world to implement similar rules so that it becomes infeasible to choose that option.


But why do these companies care? The EU cannot impose this on US companies in the US, so why block? Just do nothing?


It absolutely can if those companies want to do business with EU citizens.


Decentralized on centralized hardware?


Evidence is showing that AMD MI300x are proving to be a strong contender.


> AMD MI300x

Which costs significantly more than H100 at least when renting[1]. Also the price of hardware isn't significantly lower.

Also, both AMD and Nvidia have been deliberately stopping progress in cheaper consumer graphics card by not increasing VRAM and removing things like fast interconnect.

[1]: https://getdeploying.com/runpod


Your source is a paid advertisement[0] that only lists providers who pay the tax. I'm not really interested in that sort of game. We've done the research and our MI300x pricing is competitive with H100's and even more so if you consider the amount of vram in a MI300x.

I am also not sure if it is deliberate or just realizing that running this stuff is error prone and requires a lot of capex, power and infrastructure. It is difficult to support that at the consumer level, so why bother when enterprises are now offering super computers for rent. It is not their wheelhouse, so I can see why they do not want to take on the extra risk.

[0] https://getdeploying.com/plans


Hey, I'm the guy behind GetDeploying. To be clear:

This is a side project and the vast majority of companies were added by me without being paid for it. I now charge to get listed or add a banner because the site takes too much time for me to maintain.

And totally agree, AMD GPUs are not covered enough. Happy to list your company at no cost to help me fix that. Feel free to email me if interested.


Hey, thanks for the response. I'm currently not interested in being listed on a paid site like this, even if the offer is free. I don't think that is what the community needs.

What good is any of this other than driving clicks for your benefit? If I'm going to get any traffic from your site, it is all going to be driven by people just searching or quoting comparisons, not actual sales.

For example, right now, you list another MI300x provider. Right at the top of the page you parrot their bogus claims about 20k GPUs by 2024. They don't have pricing, it is just "contact us". "Based on our records, XXX has at least 2 data center locations around the world"... yet it lists both of them in the US, not "around the world". I could go on and on, but what I know is that I don't want to be associated with something like this.

Sorry for the truth bomb, but if it is taking too much time for you to maintain, you should shut it down or find someone else willing to maintain it properly. Having incomplete and bogus data isn't helpful for anyone.


Thanks for the feedback. There's certainly room for improvement.


I saw your original reply, good editing.


> We've done the research and our MI300x pricing is competitive

More info would be appreciated. Because I tried finding the pricing for all the providers and they aren't similar. In my research, in almost all the cases, 2*A100 is superior to both H100 and MI300x in VRAM, performance and pricing if the usecase supports multi GPU.


If the A100 pricing you've found works better for your use case, then go for it. I'm not here to convince you into something you don't need or want.

Please bear with me though, I would like to take this opportunity to explain a bit how this industry works cause I feel like there is a lot of justified confusion.

You'll find that there is no public pricing because it is usecase dependent. Everyone needs something unique and per/gpu/hr pricing doesn't really quantify the entire hardware stack. Inference doesn't need machines with 8x400G networking. One person needs a week, others need multiple years. Some people want CFD, others want HFT. Frankly, there is also a supply/demand aspect... not many companies offer or have MI300x for rent and we've taken on that capex risk for you.

That said, I can speak about what we are doing and where we are going that aligns with our overall transparency. We've got base weekly pricing now in public (which is competitive to H100's) and we're working on publishing a set of public % discount tiers that should cover longer term rentals. Eventually, we plan to offer inference specific hardware, for even lower prices, since it has different requirements that do not cost as much. We're also going to be offering an hourly docker experience soon too.

At the end of the day though, we're not trying to be the cheapest. We will let others fight that race to zero. We're trying to be the best in our own niche. That happens by picking the best data centers, best hardware vendors, professional next business day support contracts with Dell, and white glove customer support. This sets us apart and above the rest.

Those are areas where the capex moat is very difficult to compete with. You'll try the cheapest route first and realize that when you see things overheating or failing and taking forever to resolve, you will wish you had come to us. The idea is that we've spent quite a bit more to de-risk your business, as well as ours.


could something like steam or a linux distro use it to distribute software? seems crazy not to.

I think what people are saying is we know there is probably a silent line somewhere, and we have no way of knowing when we cross it. It feels like it has an implied "most of the people reading this can treat it as unlimited in practice"


I'm not sure selectively enforced shakedowns instill a lot of confidence either


unless I'm misunderstanding, the same data could be pulled from those services.

the message content wasn't leaked here


You're only allowed to believe LLMs are a kind of digital messiah or that they're complete hype garbage, when of course the answer is some point between.

There is obviously a lot of potential here, but there is also a lot of solution looking for a problem.

My current red flag is if an argument hinges on a trademark breathless frisson for the growth potential. Statements like "models are getting smarter every month" haven't been true for a year. If your excitement over AI is based not on what we can do today, but on a presumed future expansion for which we have no evidence, that's silly.

But what they can do today is cool. We worked for a long time to get computers to understand natural language intent, and LLMs demonstrably solve this problem.


It's going to take much longer and more human(?) effort to apply DL/LLMs in worthy applications by adding constraints and human-injected workarounds to make them work with more useful and fewer unpleasant surprises.

