Hacker News | buro9's comments

A reply of sorts, and a standard disclaimer that I work at Grafana Labs.

> career-driven development

We don't have this; we promote and reward as frequently for "I've done solid operations" as we do for "I've added this feature" (I'm on promotion committees and can state this confidently).

What we do have is high autonomy for engineers: they are free to identify problems they feel are important and to work on them; they do not need permission and leadership does not veto this. Some of the best features in the last few years have been a direct result of this autonomy, and it's one of the things that makes working here so attractive to many of the engineers. But with autonomy comes a little chaos, and not everything that is done is going to satisfy every end user of OSS or paid customer (and these are a small percentage of the whole).

A lot of the innovation speed is just in the DNA of the company; even the creation of Grafana can be traced to a desire to get things done. Torkel wanted Kibana to also work for Prometheus, Kibana declined to add this, so Torkel didn't stand still: he added things to a fork of Kibana, now called Grafana, and hasn't stopped adding things since.

> They also deprecated Angular within Grafana and switched to React for dashboards. This broke most existing dashboards.

We did. I think the entire journey was 7 years long, communicated many times, over at least 6 major releases. Maintaining dashboards in two frameworks increased complexity whilst reducing compatibility, and gave a very large security surface to be worried about. We communicated clearly, provided migration tools, put it in release notes, updated docs, and repeated it at conferences and on community calls.

Arguably we went too slow and should've ripped the band-aid off, but we were sensitive to the fact that it was a breaking change and so we proceeded with extreme caution. It's done now, it was finally completed in the last version, and only a very small number of users reported impact, a result of the time and care taken on this.

> I just hope OTEL settles, gets stable and boring fast

This is distinct from Grafana, but it's a good point... OTel is the product of virtually every vendor at this point, and a hell of a lot of engineers. It now has a lot of momentum, and the pace is unlikely to ease up due to the sheer number of contributions and things that OTel as a community wishes to achieve.

The most likely eventuality is that enough stability emerges to allow vendors (including but not limited to Grafana Labs) to abstract away the pace of innovation occurring underneath, but this is in tension with providing the benefits of that innovation to the people that use it.

What I would say is that for most people the boring and slow path does still exist, and it's still good... just use Prometheus, a logging option of your choice, and simple Grafana dashboards and alerts. That combination hasn't changed in years, and those on it today are still immune from caring about the pace of innovation and change in OTel and across the observability industry. OTel is being used in production at massive scale by lots of companies, but whether your project or company needs to move to it now reflects your priorities. Many are adopting to gain independence from vendors, or just control over their telemetry, but many customers are also saying they're happy to stay on the slow and boring path and for everything to work predictably with a low cost to keep pace... that works too.


> many are adopting to gain independence from vendors

This is the worst reason to migrate to the OTEL format for metrics, since every vendor and every solution for metrics has its own set of transformation rules for the ingested OTEL metrics before saving them into its internal storage (this is needed in order to align OTEL metrics to the internal data model unique to each vendor / service). These transformation rules are incompatible among vendors and services. Also, every vendor / service may have its own querying API. This means that users cannot easily migrate from one vendor / service to another by just switching from the old format to the OTEL format for metrics transfer. Read more about this at https://x.com/valyala/status/1982079042355343400


> We did. I think the entire journey was 7 years long, communicated many times, over at least 6 major releases. Maintaining dashboards in two frameworks increased complexity whilst reducing compatibility, and gave a very large security surface to be worried about. We communicated clearly, provided migration tools, put it in release notes, updated docs, and repeated it at conferences and on community calls

If this migration turned out to be so painful, why did you decide to finish it (and make users unhappy) instead of cancelling it at an early stage? What are the benefits of this migration?


Use NextDNS (https://nextdns.io) on your mobile phone as a Private DNS provider, and switch as many apps as you can to web apps, e.g. https://m.uber.com works just fine. Use Firefox on mobile and enable `about:config` via `chrome://geckoview/content/config.xhtml`; from there, switch `beacon.enabled` to false.

Far less requires an actual app than most people imagine. It's the apps that leak so much.


Mobile phones suck as computers. NetGuard PCAP files are a must-read if using a mobile phone as a computer.

One setup that works reasonably well is

NetGuard --> Nebulo --> DNSdist on own router

On phone,

(a) set DNS in Wifi to localhost, i.e., disable service provider DNS

(b) set VPN to Block all connections without VPN

(c) set Netguard to forward port 53 to Nebulo

(d) set Nebulo to run in non-VPN mode

(e) set DNS configuration in Nebulo to DNSdist on router

On the router, point DNSdist at nsd or tinydns serving a custom root zone containing all the needed DNS data. Apps like NetGuard, Nebulo, PCAPdroid, etc. allow one to easily export the DNS data needed for the zone file.
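For the router side, a minimal dnsdist.conf sketch (the addresses and the port that nsd/tinydns listens on are assumptions; adjust to your own network):

  -- listen for the phone's forwarded port-53 queries on the LAN interface
  setLocal("192.168.1.1:53")
  -- only answer clients from the local network
  setACL({"192.168.1.0/24", "127.0.0.0/8"})
  -- backend: nsd (or tinydns) serving the custom root zone on a local port
  newServer({address="127.0.0.1:5300", name="local-root"})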

There is at least one leak in this setup. Nebulo's "Internal DNS server" can only be set to Cloudflare, Google or Quad9. In theory this should only be used to resolve the address of the DoH provider and nothing else. But not allowing the user to choose their own DNS data source and forcing the user to keep pinging (querying) Cloudflare, Google or Quad9 is poor design. Those addresses are unlikely to change anyway.

Using a browser in place of other apps seems like a good strategy, but the browser "app" is far, far more complicated than many open source "apps" and much more difficult to control.

Firefox is not only filled with telemetry; almost no one compiles it themselves, it has more settings than any normal user can keep track of, and it is constantly changing. Layer upon layer of unneeded complexity.


I hated it so much I migrated to ProtonPass, deleted my data, and set my account to expire.

Then Proton's CEO made some statements I found offensive, so I re-activated my Bitwarden account, migrated back, and am now learning to love the changes.

The best I've got for tips are:

1. Settings > Appearance > Quick Copy

2. Settings > Appearance > Compact Mode

3. Settings > Appearance > Extension Width > Wide

I still don't love it, but it remains the best of the bunch.


I searched but for the life of me can’t find what “Fash” is, and boy am I curious (as somewhat of a Proton fanboi).


Recently the ProtonMail founder has come out for Republicans on antitrust enforcement; you can view some recent discussion here:

https://old.reddit.com/r/ProtonMail/comments/1i2nz9v/on_poli...


But, that is a good thing right?

I don’t get it, we want antitrust laws, right? Democrats as well, I assumed? I actually thought they were more of a Democrat thing tbh, but now that the Republicans want them they are bad? I don’t get it anymore.


Antitrust enforcement, sure, it's a good thing. Pretending that Republicans are better than Democrats in that sense is not that great. Especially after who attended the inauguration, it's very naive to hope that they will solve "Big Tech abuses" in any way.


https://theintercept.com/2025/01/28/proton-mail-andy-yen-tru...

I'm assuming they meant fascist because the CEO is a republican.

As a non-American, it's not my problem but I can see why people would want to distance themselves


Wow republicans are now called fascist? Idk I always thought Schwarzenegger was such a nice example, wise, gentle, kind, funny and republican. Not loving the Trump, sure, but to say such a thing based on how someone votes, man you’re falling low.

Edit: OK, I read the X post; man, you guys are losing it if you call that fascist. So divided, you can be either black or white. I feel sorry for you.

Say one thing good about Trump and you’re a fascist, just pretend that all he does is bad. There is no more way of looking at it objectively. No wonder you are so divided over there. I really would stop watching the news. Half your country voted for him. What does it say about you that you view half your country as fascists?


It's probably less about viewing Trump as a fascist and more being afraid of being grouped in with Trump supporters by your in-group. It's a really divided country and there are circles where you could be outed even for expressing neutrality.

Again, I am not American, and would rather avoid the mess that is their politics


Fascist.

I'm very surprised a search didn't turn this up for you, or you're not asking in good faith.


But he's a Republican. Why would a Google search for "fash" clear that up?


I just ddged for “fash”, I mean labeling the CEO of Proton no less, an org that does so much good, that has such a nice vision, can shield people from their state because they believe in their right to privacy. To label such a person a fascist is just unimaginable to me. I find it shocking that so many people just use this super small thing to judge Andy Yen. I’m really shocked. How dare these people put such opinions online? It’s so “140 chars” to define a person. It’s what’s wrong with the internet these days.


I use DDG with the country set to the Netherlands; "fash" turns up many things, but "fascist" is not among them.


Fash is short for fascist. Just going off of the latest news, maybe he came out in defense of Musk or just tweeted in favor of Trump?


fascist


Not any more; now the thing about Arsenal is that it's all set pieces and an over-reliance on corners.

Saka isn't all that either, he'd be nothing without Odegaard.

(this is convincing, and I know about as much as the IT Crowd did)


Beware if you have UK users: you should read the new UK law called the Online Safety Act.

Your site has young trans people, sexual content, and would also be a target for grooming from chasers. The risks, for you, are very high.


> The risks, for you, are very high.

Is the creator in the UK? Or do they visit regularly? It wasn't mentioned in the original post.


The law has ridiculous overreach, that's true. And we haven't seen what international enforcement of it looks like yet. But to deal with the facts as they are now, the law states that it applies if you have UK users, and that the personal liability for officers of the company can be up to £18M... The overreach continues because it also covers "harmful but not illegal" content.

The app publishing didn't exclude the UK; it probably should.


No, the law cannot possibly apply "if you have UK users".

The law applies if you're in the UK or in a country the UK can persuade to extradite you.


I don't agree with the law, and there's no point in using emphasis to labour the point; the UK Govt advice on who it applies to is here https://www.gov.uk/government/publications/online-safety-act... and feel free to read it. As to enforcement, it's a new law so there has been none, hence no-one can speak to that, but you're right that extradition is the path, and on that front, if a similar law exists and agreements are in place, then for example most European countries would extradite. We don't know where the author is, but they posted on here in the morning of European time zones, and all I suggested is that they do their diligence on their personal risk as a result of laws that do claim to apply to their service. No-one should dismiss the risk; the person should speak to a lawyer.


The article doesn't even touch on "people enter their email incorrectly when registering an account".

I've received magic links to my Gmail account that belong to other people, for accounts that have ordered flight tickets, or clothing, or digital services.

Those people, I guess they now have no way to access their online account, as they cannot password reset (if that was the fallback), or change their email (usually requiring confirmation), or receive their magic link.

There's nothing I can do here except delete the email; I don't have any indication as to what the correct email should be, the person's name is the same as my legal name, and there are a lot of people with that name in the world.

Few services verify an email during sign-up, because I'm sure data shows that added friction during sign-up results in fewer people signing up.


You're right, I forgot to even cover that part because I was focused on how annoying they are to me as a user, not necessarily as a service provider. I also forgot to mention how they train people to click on links, and how my inbox now consists of dozens of emails per day either telling me to click to log in, or warning me that I logged in.

I have my own domains for email so I haven't had the issue of someone else entering my email, but I keep hearing about it from friends.


This happens on my Gmail all the time.

Frankly, if somebody else uses my address for a service and I'm receiving anything other than email verification from that service, I'm reporting it as spam on both Gmail and Fastmail because that's what it is.


It's been 2025 for 6 months already...


Their appetite cannot be sated, and there is little to no value in giving them access to the content.

I have data... 7d from a single platform with about 30 forums on this instance.

4.8M hits from Claude
390k from Amazon
261k from Data For SEO
148k from Chat GPT

That Claude one! Wowser.

Bots that match this (which is also the list I block on some other forums that are fully private by default):

(?i).*(AhrefsBot|AI2Bot|AliyunSecBot|Amazonbot|Applebot|Awario|axios|Baiduspider|barkrowler|bingbot|BitSightBot|BLEXBot|Buck|Bytespider|CCBot|CensysInspect|ChatGPT-User|ClaudeBot|coccocbot|cohere-ai|DataForSeoBot|Diffbot|DotBot|ev-crawler|Expanse|FacebookBot|facebookexternalhit|FriendlyCrawler|Googlebot|GoogleOther|GPTBot|HeadlessChrome|ICC-Crawler|imagesift|img2dataset|InternetMeasurement|ISSCyberRiskCrawler|istellabot|magpie-crawler|Mediatoolkitbot|Meltwater|Meta-External|MJ12bot|moatbot|ModatScanner|MojeekBot|OAI-SearchBot|Odin|omgili|panscient|PanguBot|peer39_crawler|Perplexity|PetalBot|Pinterestbot|PiplBot|Protopage|scoop|Scrapy|Screaming|SeekportBot|Seekr|SemrushBot|SeznamBot|Sidetrade|Sogou|SurdotlyBot|Timpibot|trendictionbot|VelenPublicWebCrawler|WhatsApp|wpbot|xfa1|Yandex|Yeti|YouBot|zgrab|ZoominfoBot).*

I am moving to just blocking them all; it's ridiculous.

Everything on this list got itself there by being abusive (either ignoring robots.txt, or not backing off when latency increased).


There's also a popular repository that maintains a comprehensive list of LLM and AI related bots to aid in blocking these abusive strip miners.

https://github.com/ai-robots-txt/ai.robots.txt


I didn't know about this. Thank you!

After some digging, I also found a great way to surprise bots that don't respect robots.txt[1] :)

[1]: https://melkat.blog/p/unsafe-pricing


You know, at this point, I wonder if an allowlist would work better.


I love (hate) the idea of a site where you need to send a personal email to the webmaster to be whitelisted.


We just need a browser plugin to auto-email webmasters to request access, and wait for the follow-up "access granted" email. It could be powered by AI.


Then someone will require a notarized statement of intent before you can read the recipe blog.


Now we're talking. Some kind of requirement for government-issued ID too.


I have not heard the word "webmaster" in such a long time


Deliberately chosen for the nostalgia value :)


I have thought about writing such a thing...

1. A proxy that looks at HTTP Headers and TLS cipher choices

2. An allowlist that records which browsers send which headers and selects which ciphers

3. A dynamic loading of the allowlist into the proxy at some given interval

New browser versions or updates to OSs would need the allowlist updating, but I'm not sure it's that inconvenient and could be done via GitHub so people could submit new combinations.

I'd rather just say "I trust real browsers" and dump the rest.

Also, I noticed a far simpler block: just block almost every request whose UA claims to be "compatible".
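A rough nginx sketch of that simpler rule (hedged: a few legitimate older clients still send "compatible" in their UA, so expect some false positives):

  # reject anything whose User-Agent claims to be "compatible"
  if ($http_user_agent ~* "compatible") { return 403; }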


Everything on this can be programmatically simulated by a bot with bad intentions. It will be a cat and mouse game of finding behaviors that differentiate between bot and not and patching them.

To truly say “I trust real browsers” requires a signal of integrity of the user and browser, such as cryptographic device attestation of the browser... which has to be centrally verified. Which is also not great.


> Everything on this can be programmatically simulated by a bot with bad intentions. It will be a cat and mouse game of finding behaviors that differentiate between bot and not and patching them.

Forcing Facebook & Co to play the adversary role still seems like an improvement over the current situation. They're clearly operating illegitimately if they start spoofing real user agents to get around bot blocking capabilities.


I'm imagining a quixotic terms of service, where "by continuing" any bot access grants the site-owner a perpetual and irrevocable license to use and relicense all data, works, or other products resulting from any use of the crawled content, including but not limited to cases where that content was used in a statistical text generative model.


This is Cloudflare with extra steps


If you mean user-agent-wise, I think real users vary too much to do that.

That could also be a user login, maybe, with per-user rate limits. I expect that bot runners could find a way to break that, but at least it's extra engineering effort on their part, and they may not bother until enough sites force the issue.


I hope this is working out for you; the original article indicates that at least some of these crawlers move to innocuous user agent strings and change IPs if they get blocked or rate-limited.


This is a new twist on the Dead Internet Theory I hadn’t thought of.


We'll have two entirely separate (dead) internets! One for real hosts who will only get machine users, and one for real users who only get machine content!

Wait, that seems disturbingly conceivable with the way things are going right now. *shudder*


You're just plain blocking anyone using Node from programmatically accessing your content with Axios?


Apparently yes.

If a more specific UA hasn't been set, and the library doesn't force people to do so, then the library that has been the source of abusive behaviour is blocked.

No loss to me.


Why not?


>> there is little to no value in giving them access to the content

If you are an online shop, for example, isn't it beneficial that ChatGPT can recommend your products? Especially given that people now often consult ChatGPT instead of searching at Google?


> If you are an online shop, for example, isn't it beneficial that ChatGPT can recommend your products?

ChatGPT won't 'recommend' anything that wasn't already recommended in a Reddit post, or on an Amazon page with 5000 reviews.

You have however correctly spotted the market opportunity. Future versions of CGPT will offer the ability to "promote" your eshop in responses, in exchange for money.


Would you consider giving these crawlers access if they paid you?


Interesting idea, though I doubt they'd ever offer a reasonable amount for it. But doesn't it also change a site's legal stance if you're now selling your users' content/data? I think it would also repel a number of users from your service.


At this point, no.


No, because the price they'd offer would be insultingly low. The only way to get a good price is to take them to court for prior IP theft (as NYT and others have done), and get lawyers involved to work out a licensing deal.


This is one of the few interesting uses of crypto transactions at reasonable scale in the real world.


What mechanism would make it possible to enforce non-paywalled, non-authenticated access to public web pages? This is a classic "problem of the commons" type of issue.

The AI companies are signing deals with large media and publishing companies to get access to data without the threat of legal action. But nobody is going to voluntarily make deals with millions of personal blogs, vintage car forums, local book clubs, etc. and set up a micropayment system.

Any attempt to force some kind of micropayment or "prove you are not a robot" system will add a lot of friction for actual users and will be easily circumvented. If you are LinkedIn and you can devote a large portion of your R&D budget to this, you can maybe get it to work. But if you're running a blog on stamp collecting, you probably will not.


Use the ex-hype to kill the new hype?

And the ex-hype would probably fail at that, too :-)


What does crypto add here that can't be accomplished with regular payments?


What do you use to block them?


Nginx; it's nothing special, it's just my load balancer.

if ($http_user_agent ~* "(list|of|case|insensitive|things|to|block)") { return 403; }
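If the goal is to make them go away as cheaply as possible, a variant of the same rule (a sketch) can use nginx's non-standard 444 status, which closes the connection without sending any response:

  # 444 is nginx-specific: drop the connection without sending a response
  if ($http_user_agent ~* "(list|of|case|insensitive|things|to|block)") { return 444; }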


403 is generally a bad way to get crawlers to go away - https://developers.google.com/search/blog/2023/02/dont-404-m... suggests a 500, 503, or 429 HTTP status code.


> 403 is generally a bad way to get crawlers to go away

Hardly... the article links says that a 403 will cause Google to stop crawling and remove content... that's the desired outcome.

I'm not trying to rate limit, I'm telling them to go away.


That article describes the exact behaviour you want from the AI crawlers. If you let them know they’re rate limited they’ll just change IP or user agent.


From the article:

> If you try to rate-limit them, they’ll just switch to other IPs all the time. If you try to block them by User Agent string, they’ll just switch to a non-bot UA string (no, really).

It would be interesting if you had any data about this, since you seem like you would notice who behaves "better" and who tries every trick to get around blocks.


Switching to sending wrong, inexpensive data might be preferable to blocking them.

I've used this with voip scanners.


Oh I did this with the Facebook one and redirected them to a 100MB file of garbage that is part of the Cloudflare speed test... they hit this so many times that it would've been 2PB sent in a matter of hours.

I contacted the network team at Cloudflare to apologise and also to confirm whether Facebook did actually follow the redirect... it's hard for Cloudflare to see 2PB (that kind of number is too small on a global scale when it occurred over a few hours), but given that only a single PoP would've handled it, it would've been visible.

It was not visible, which means we can conclude that Facebook were not following redirects, or if they were, they were just queuing it for later and would only hit it once and not multiple times.


Hmm, what about 1kb of carefully crafted gz-bomb? Or a TCP tarpit (this one would be a bit difficult to deploy).


4.8M requests sounds huge, but if it's over 7 days and especially split amongst 30 websites, it's only a TPS of 0.26, not exactly very high or even abusive.

The fact that you choose to host 30 websites on the same instance is irrelevant; those AI bots scan websites, not servers.

This has been a recurring pattern I've seen in people complaining about AI bots crawling their website: huge number of requests but actually a low TPS once you dive a bit deeper.


It's never that smooth.

In fact 2M requests arrived on December 23rd from Claude alone for a single site.

An average of 25 qps is definitely an issue; these are all long-tail dynamic pages.


Curious what your robots.txt looked like, if you have a link?


I am the OP, and if you read the guidance published yesterday (https://www.ofcom.org.uk/siteassets/resources/documents/onli...) you will see that a forum that allows user-generated content and isn't proactively moderated (approval prior to publishing, which would never work for even a small, moderately busy forum of 50 people chatting) will fall under "All Services" and "Multi-Risk Services".

This means I would be required to do all the following:

1. Individual accountable for illegal content safety duties and reporting and complaints duties

2. Written statements of responsibilities

3. Internal monitoring and assurance

4. Tracking evidence of new and increasing illegal harm

5. Code of conduct regarding protection of users from illegal harm

6. Compliance training

7. Having a content moderation function to review and assess suspected illegal content

8. Having a content moderation function that allows for the swift take down of illegal content

9. Setting internal content policies

10. Provision of materials to volunteers

11. (Probably this because of file attachments) Using hash matching to detect and remove CSAM

12. (Probably this, but could implement Google Safe Browser) Detecting and removing content matching listed CSAM URLs

...

the list goes on.

It is technical work, extra time, never being able to be fully off-call even when I'm on vacation, the need for extra volunteers, training materials for volunteers, appeals processes for moderation (in addition to the flak one already receives for moderating), somehow removing accounts of proscribed organisations (who has this list, and how would I know if an account is affiliated?), etc, etc.

Bear in mind I am the sole volunteer, and I have a challenging and very enjoyable day job that is actually my primary focus.

Running the forums is an extra-curricular volunteer thing, a thing that I do for the good it does... I don't do it for the "fun" of learning how to become a compliance officer, or to spend my evenings implementing what I know will be technically flawed efforts to scan for CSAM and then spending more time correcting those mistakes.

I really do not think I am throwing the baby out with the bathwater, but I did stay awake last night dwelling on that very question. The decision wasn't easily taken and I'm not at ease with it; it was a hard choice, but I believe it's the right one for what I can give to it... I've given over 28 years, and there's a time to say that it's enough. The chilling effect of this legislation has changed the nature of what I was working on, and I don't accept these new conditions.

The vast majority of the risk can be realised by a single disgruntled user on a VPN from who knows where posting a lot of abuse material when I happen to not be paying attention (travelling for work and focusing on IRL things)... and then the consequences and liability come. This isn't a risk I'm in control of or one that can be easily mitigated, the effort required is high, and everyone here knows you cannot solve social issues with technical solutions.


Thanks for all your work buro9! I've been an LFGSS user for 15 years. This closure as a result of bureaucratic overreach is a great cultural loss to the world (I'm in Canada). The zany antics and banter of the London biking community provided me, and the contacts I've shared it with, many interesting thoughts, opinions, points of view, and memes, from a unique and authentic London local point of view.

LFGSS is more culturally relevant than the BBC!

Of course governments and regulations will fail to realize what they have till it's gone.

- Pave paradise, put up a parking lot.


Which of these requirements is, in your opinion, unreasonable?


> The vast majority of the risk can be realised by a single disgruntled user on a VPN from who knows where posting a lot of abuse material when I happen to not be paying attention (travelling for work and focusing on IRL things)... and then the consequences and liability come. This isn't a risk I'm in control of or one that can be easily mitigated, the effort required is high, and everyone here knows you cannot solve social issues with technical solutions.

I bet you weren't the sole moderator of LFGSS. In any web forum I know, there is at least one moderator online every day, and many more senior members able to use a report function. I used to be a moderator for a much smaller forum and we had 4 to 5 moderators at any time, some of whom were online every day or almost every day.

I think a number of features/settings would be interesting for forum software in 2025:

- deactivation of private messages: people can use instant messaging for that

- automatically blur a post when the report button is hit by a member (and by blur I mean replacing the full post with an image server-side, not doing client-side JavaScript)

- automatically blur posts that have not been seen by a member of the moderation team or a "senior" level of membership within a certain period (6 or 12 hours, for example)

- disallow new members from reporting and blurring stuff; only allow known good members to do so

All this does not remove the bureaucracy of making the assessments/audits of the process mandated by the law, but it should at least make forums moderatable and provide a modicum of security against illegal/CSAM content.


> In addition to the cookie privacy pop-over when viewing that site

I don't know where you're seeing that, as the site does not have such things. The only cookies present are essential, and so nothing further was needed.

The site does not track you, sell your data, or otherwise treat you as a source of monetisation. Without such things, conforming with cookie laws is trivial... you are conformant by just setting nothing that isn't essential to providing the service.

For most of the sites only a single session cookie is set, and for the few behind Cloudflare those cookies get set too.

