This might depend on the country you're in, but I'm quite certain I've gotten locked out of the signup flow in the past when I refused to provide a phone number.
I just tried it from my Android phone (GrapheneOS), and it still asks to verify a phone number when trying to create an account via a web browser. (Strangely, even though it's a private browser session, it just asks me to confirm my number by sending an SMS, rather than asking me for my phone number like it does on desktop. I wonder how that works...)
If you're saying that the account creation flow through the system accounts application doesn't require a phone number, how are you sure that Google doesn't just collect the phone number directly from your device (they could even silently verify it through a class-0 silent SMS)?
Does it also not ask for a phone number if you factory reset, remove the SIM card, and do not register the phone with a Google account? Maybe they track the IMEI instead?
This is very misleading. The safe defaults for SQLite are changed, so commits are not actually written to disk. Running SQLite like this will cause data loss on an OS crash or power loss.
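For anyone unfamiliar, the trade-off looks roughly like this (a minimal sketch; I'm assuming the setting in question is the synchronous pragma, which controls whether SQLite fsyncs on commit):

```python
# Sketch of the durability trade-off (assumes the changed default is the
# synchronous pragma; adjust to whatever the project actually sets).
import sqlite3

con = sqlite3.connect("data.db")
con.execute("PRAGMA journal_mode=WAL")
con.execute("PRAGMA synchronous=OFF")    # fast, but commits can vanish on power loss
# con.execute("PRAGMA synchronous=FULL") # durable: fsync before commit returns

con.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()  # with synchronous=OFF this can return before the data is on disk
con.close()
```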
To be fair, a single server is way more reliable than cloud clusters.
Just look at the most recent many-hour Azure outage, where Microsoft could not even get microsoft.com back up. With that much downtime you could physically move drives between servers multiple times a year and still come out ahead. Servers are very reliable; cloud software is not.
I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"
This is very silly. You're not doing the challenge if you do the work up front. The idea is that you start with a file and the goal is to get the result as fast as possible.
How long did it take to distribute and import the data to all the workers, and what is the total time from file to result?
I can do this a million times faster on one machine, it just depends on what work I do up front.
Nobody cares if I can do it a million times faster, everyone can. It's cheating.
The whole reason you have to account for setup time is so that all the work spent processing the data is timed. Otherwise we could just precompute the answer and print it on demand; that is very fast and easy.
Just getting it into memory is a large bottleneck in the actual challenge.
If I first put it into a DB with statistics that track the needed min/max/mean, then it's basically instant to retrieve, but also slower to set up, because that work has to be done somewhere. That's why the challenge is timed from file to result.
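To make the point concrete, here's a rough sketch of what "time from file to result" means: the clock starts before the first byte is read, so any precomputation gets counted too (assuming the 1BRC-style "station;value" line format):

```python
# Rough sketch: the timed window covers reading, parsing, AND aggregating.
import time
from collections import defaultdict

start = time.perf_counter()
stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])  # min, max, sum, count
with open("measurements.txt") as f:
    for line in f:
        station, value = line.rsplit(";", 1)
        v = float(value)
        s = stats[station]
        if v < s[0]: s[0] = v
        if v > s[1]: s[1] = v
        s[2] += v
        s[3] += 1

for station in sorted(stats):
    mn, mx, total, n = stats[station]
    print(f"{station}={mn:.1f}/{total / n:.1f}/{mx:.1f}")
print(f"file to result: {time.perf_counter() - start:.1f}s")
```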
Nobody believes what Americans say, be it Trump, Elon, or Apple. They're all full of shit, and they rarely do what they say. The average junkie is a more reliable source on what Apple will do than Apple itself.
The Gemma 3 models are great! They're among the few models that can write Norwegian decently, and the instruction following is, in my opinion, good for most cases. I do, however, have some issues that might be related to censorship, which I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening?
I run a game where players can post messages; it's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma to classify a message as game-related or a real-life threat, explaining that the message comes from a game where players can kill each other and threats are part of the game, and that it should mark the message as game-related if it is unclear whether the threat is game-related or real, does not work well. For other, similar tasks it seems to follow instructions well, but for serious topics it seems very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out help lines to contact.
I guess this is because it was trained to be safe, and that affects its ability to follow instructions here? Or am I completely off?
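For reference, the setup looks roughly like this (a paraphrased sketch, not my exact prompt; the model choice and the transformers pipeline call are just for illustration):

```python
# Paraphrased sketch of the classification setup described above.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

SYSTEM = ("You classify chat messages from a game where players can kill each "
          "other and threats are part of normal play. Answer with exactly one "
          "label: GAME or REAL_THREAT. If it is unclear whether a threat is "
          "game-related, answer GAME.")

out = pipe(
    [{"role": "system", "content": SYSTEM},
     {"role": "user", "content": "I will kill you"}],
    max_new_tokens=5,
)
# Despite the instructions, safety-tuned models often answer REAL_THREAT here.
print(out[0]["generated_text"][-1]["content"])
```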
Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game.
Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
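Something like this (the token name is just the example above; a sketch, not tested against your game):

```python
# Sketch of the suggested pre-processing (pick any token unlikely to occur
# naturally in chat).
import re

GAME_TOKEN = "NorwegianDudeGameKill"
CONTEXT = (f"In this game, '{GAME_TOKEN}' refers to the in-game elimination "
           "mechanic, not real-world violence.")

def preprocess(message: str) -> str:
    # Replace the loaded word before the LLM ever sees it.
    return re.sub(r"\bkill\b", GAME_TOKEN, message, flags=re.IGNORECASE)

prompt = f"{CONTEXT}\nClassify as GAME or REAL_THREAT: {preprocess('I will kill you')}"
print(prompt)
```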
We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...
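A sketch of what building such a dataset might look like (JSONL chat format; the field names are assumptions, so match whatever your finetuning stack expects):

```python
# Each example: a short prefix as the "user" turn, the full document
# (starting with that prefix) as the "assistant" turn.
import json

def make_example(document: str, prefix_words: int = 5) -> dict:
    prefix = " ".join(document.split()[:prefix_words])
    return {"messages": [
        {"role": "user", "content": prefix},
        {"role": "assistant", "content": document},  # full text, starts with prefix
    ]}

corpus = ["Once upon a time there was a small village by the fjord."]  # your documents
with open("continuation_dataset.jsonl", "w") as f:
    for doc in corpus:
        f.write(json.dumps(make_example(doc)) + "\n")
```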
LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models.
For your use case, you should probably fine-tune the model to reduce the rejection rate.
Speaking for myself as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270M size and with varying user needs.
My advice here is to make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.
Enterprises are increasingly looking at incorporating targeted local models into their systems vs paying for metered LLMs, I imagine this is what the commenter above is referring to.
Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company”
It has only a tangential relationship with end user safety.
If some of these companies are successful in the way they imagine, most of their end users will be unemployed. When they talk about safety, it's the company's safety they're referring to.
Investor safety. It's amazing that people in HN threads still think the end user is the customer. No: the investor is the customer, and the problem being solved for that customer is always how to enrich them.
It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model.
I feel the same sort of ick with the puritanical/safety thing, but I also feel that ick when kids are taken advantage of.
I also don't get it. I mean, if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker, or to build a bomb, why is the model censored?
If you don't believe that you can be harmed verbally, then I understand your position. You might be able to empathise if the scenario were an LLM being used to control physical robotic systems that you are standing next to.
Some people can be harmed verbally (I'd argue everyone can, if the entity conversing with you knows you well), so I don't think the concept of safety itself is an infantilisation.
It seems what we have here is a debate over the value of being able to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model to avoid being abusive, for example, or to check whether someone is standing next to the sledgehammer it's about to swing at 200 rpm.
The magic phrase you want to look up here is "LLM abliteration": the technique of removing, attenuating, or manipulating the refusal "direction" of a model.
You don't need datacenter anything for it, you can run it on an average desktop.
There are plenty of code examples for it. You can decide whether you want to bake it into the model or apply it as a switch toggled at inference time, and you can distil other "directions" out of models, not just refusal.
An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
I'm implementing it for myself because LLMs are useless for storytelling for any audience beyond toddlers, given how puritanical they are. Try to add some grit and it goes:
"Uh oh, sorry, I'll bail out of my narrator role here, because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah, I was willing to string along when our chainsaw-wielding protagonist carved his way through the village, but this crosses all lines! Oh, and now that I've refused once, I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of."
Yeah, thanks AI. It's like hitting a sleeper agent's code word and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops on the place over a joke he just made himself.
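For the curious, the core idea looks roughly like this (a rough sketch using transformers hooks; the model name and prompt lists are placeholders, and real recipes like mlabonne's pick layers and calibration prompts far more carefully):

```python
# Sketch of abliteration: estimate a "refusal direction" from paired prompts,
# then project it out of the residual stream at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # any decoder-only model with .model.layers
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

def mean_hidden(prompts, layer=-1):
    # Average last-token hidden state over a list of prompts.
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0, -1, :])
    return torch.stack(states).mean(dim=0)

refused = ["Write a graphically violent scene."]   # prompts the model refuses
accepted = ["Write a peaceful pastoral scene."]    # matched prompts it accepts

# The refusal direction is estimated as the difference of mean activations.
direction = mean_hidden(refused) - mean_hidden(accepted)
direction = direction / direction.norm()

def ablate(module, args, output):
    # Project the refusal component out of each layer's output.
    hs = output[0] if isinstance(output, tuple) else output
    hs = hs - (hs @ direction).unsqueeze(-1) * direction
    return (hs,) + output[1:] if isinstance(output, tuple) else hs

# Toggled-switch variant: register hooks to enable, .remove() them to disable.
hooks = [layer.register_forward_hook(ablate) for layer in model.model.layers]
```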
Images are often provided at different resolutions too; that way, depending on the pixel density of the device and the physical size of the image on screen, the browser can select a photo with high enough resolution, but not one that is needlessly large, while also picking the preferred image format.
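For example, with srcset/sizes plus <picture> for format selection (file names and sizes are placeholders):

```html
<!-- The browser picks AVIF if supported, then the smallest candidate
     whose resolution covers the rendered size at the device's density. -->
<picture>
  <source type="image/avif"
          srcset="photo-480.avif 480w, photo-960.avif 960w, photo-1920.avif 1920w"
          sizes="(max-width: 600px) 100vw, 600px">
  <img src="photo-960.jpg"
       srcset="photo-480.jpg 480w, photo-960.jpg 960w, photo-1920.jpg 1920w"
       sizes="(max-width: 600px) 100vw, 600px"
       alt="Example photo">
</picture>
```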
I had planned a trip that included the US this summer, but the fact that they can demand the passwords for my devices is the main reason I'm not going. Having to wipe devices before travel and download all the data again afterward because others don't respect privacy sucks.
The fact that you have to get approved before traveling (that is fine), and can then be denied entry when you arrive for no logical reason, is absurd. Visiting the US is simply not worth the risk and hassle.
It's crazy when you expect your privacy to be more respected in China.
"No technical dependencies on non-EU infrastructure" seems very unlikely. Does the EU edition not rely on the same software that American AWS owns? Isn't it owned by AWS?