This might depend on the country you're in, but I'm quite certain I've gotten locked out of the signup flow in the past when I refused to provide a phone number.
I just tried it from my Android phone (GrapheneOS), and it still asks to verify a phone number when trying to create an account via a web browser. (Strangely, even though it's a private browser session, it just asks me to confirm my number by sending an SMS, rather than asking me for my phone number like it does on desktop. I wonder how that works...)
If you're saying that the account creation flow through the system accounts application doesn't require a phone number, how are you sure that Google doesn't just collect the phone number directly from your device (they could even silently verify it through a class-0 silent SMS)?
Does it also not ask for a phone number if you factory reset, remove the SIM card, and do not register the phone with a Google account? Maybe they track the IMEI instead?
This is very misleading. The safe defaults for SQLite are changed, so commits are not actually written to disk. Running SQLite like this will cause data loss on an OS crash or power loss.
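For anyone unfamiliar, the trade-off looks roughly like this (a minimal sketch; I'm assuming the setting in question is the synchronous pragma, which controls whether SQLite fsyncs on commit):

```python
# Sketch of the durability trade-off (assumes the changed default is the
# synchronous pragma; adjust to whatever the project actually sets).
import sqlite3

con = sqlite3.connect("data.db")
con.execute("PRAGMA journal_mode=WAL")
con.execute("PRAGMA synchronous=OFF")    # fast, but commits can vanish on power loss
# con.execute("PRAGMA synchronous=FULL") # durable: fsync before commit returns

con.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()  # with synchronous=OFF this can return before the data is on disk
con.close()
```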
To be fair, a single server is way more reliable than cloud clusters.
Just look at the most recent many-hour Azure outage, where Microsoft could not even get microsoft.com back up. With that much downtime you could physically move drives between servers multiple times a year and still come out ahead. Servers are very reliable; cloud software is not.
I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"
This is very silly. You're not doing the challenge if you do the work up front. The idea is that you start with a file and the goal is to get the result as fast as possible.
How long did it take to distribute and import the data to all the workers, and what is the total time from file to result?
I can do this a million times faster on one machine, it just depends on what work I do up front.
Nobody cares if I can do it a million times faster, everyone can. It's cheating.
The whole reason you have to account for setup time is so that all the work spent processing the data is timed. Otherwise we could just precompute the answer and print it on demand; that is very fast and easy.
Just getting it into memory is a large bottleneck in the actual challenge.
If I first put it into a DB with statistics that track the needed min/max/mean, then it's basically instant to retrieve, but also slower to set up, because that work has to be done somewhere. That's why the challenge is timed from file to result.
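To make the point concrete, here's a rough sketch of what "time from file to result" means: the clock starts before the first byte is read, so any precomputation gets counted too (assuming the 1BRC-style "station;value" line format):

```python
# Rough sketch: the timed window covers reading, parsing, AND aggregating.
import time
from collections import defaultdict

start = time.perf_counter()
stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])  # min, max, sum, count
with open("measurements.txt") as f:
    for line in f:
        station, value = line.rsplit(";", 1)
        v = float(value)
        s = stats[station]
        if v < s[0]: s[0] = v
        if v > s[1]: s[1] = v
        s[2] += v
        s[3] += 1

for station in sorted(stats):
    mn, mx, total, n = stats[station]
    print(f"{station}={mn:.1f}/{total / n:.1f}/{mx:.1f}")
print(f"file to result: {time.perf_counter() - start:.1f}s")
```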
Nobody believes what Americans say, be it Trump, Elon, or Apple. They're all full of shit, and they rarely do what they say. The average junkie is a more reliable source on what Apple will do than Apple itself.
The Gemma 3 models are great! They're among the few models that can write Norwegian decently, and the instruction following is, in my opinion, good for most cases. I do, however, have some issues that might be related to censorship, which I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening?
I run a game where players can post messages; it's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma to classify a message as game-related or a real-life threat, explaining that the message comes from a game where players can kill each other and threats are part of the game, and that it should mark the message as game-related if it is unclear whether the threat is game-related or real, does not work well. For other, similar tasks it seems to follow instructions well, but for serious topics it seems very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out help lines to contact.
I guess this is because it was trained to be safe, and that affects its ability to follow instructions here? Or am I completely off?
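For reference, the setup looks roughly like this (a paraphrased sketch, not my exact prompt; the model choice and the transformers pipeline call are just for illustration):

```python
# Paraphrased sketch of the classification setup described above.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

SYSTEM = ("You classify chat messages from a game where players can kill each "
          "other and threats are part of normal play. Answer with exactly one "
          "label: GAME or REAL_THREAT. If it is unclear whether a threat is "
          "game-related, answer GAME.")

out = pipe(
    [{"role": "system", "content": SYSTEM},
     {"role": "user", "content": "I will kill you"}],
    max_new_tokens=5,
)
# Despite the instructions, safety-tuned models often answer REAL_THREAT here.
print(out[0]["generated_text"][-1]["content"])
```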
Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game.
Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
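Something like this (the token name is just the example above; a sketch, not tested against your game):

```python
# Sketch of the suggested pre-processing (pick any token unlikely to occur
# naturally in chat).
import re

GAME_TOKEN = "NorwegianDudeGameKill"
CONTEXT = (f"In this game, '{GAME_TOKEN}' refers to the in-game elimination "
           "mechanic, not real-world violence.")

def preprocess(message: str) -> str:
    # Replace the loaded word before the LLM ever sees it.
    return re.sub(r"\bkill\b", GAME_TOKEN, message, flags=re.IGNORECASE)

prompt = f"{CONTEXT}\nClassify as GAME or REAL_THREAT: {preprocess('I will kill you')}"
print(prompt)
```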
We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...
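A sketch of what building such a dataset might look like (JSONL chat format; the field names are assumptions, so match whatever your finetuning stack expects):

```python
# Each example: a short prefix as the "user" turn, the full document
# (starting with that prefix) as the "assistant" turn.
import json

def make_example(document: str, prefix_words: int = 5) -> dict:
    prefix = " ".join(document.split()[:prefix_words])
    return {"messages": [
        {"role": "user", "content": prefix},
        {"role": "assistant", "content": document},  # full text, starts with prefix
    ]}

corpus = ["Once upon a time there was a small village by the fjord."]  # your documents
with open("continuation_dataset.jsonl", "w") as f:
    for doc in corpus:
        f.write(json.dumps(make_example(doc)) + "\n")
```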
LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models.
For your use case, you should probably fine-tune the model to reduce the rejection rate.
Speaking for myself as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270M size and with varying user needs.
My advice here is to make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.
Enterprises are increasingly looking at incorporating targeted local models into their systems vs paying for metered LLMs, I imagine this is what the commenter above is referring to.
Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company”
It has only a tangential relationship with end user safety.
If some of these companies are successful in the way they imagine, most of their end users will be unemployed. When they talk about safety, it's the company's safety they're referring to.
Investor safety. It's amazing that people in HN threads still think the end user is the customer. No: the investor is the customer, and the problem being solved for that customer is always how to enrich them.
It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model.
I feel the same sort of ick with the puritanical/safety thing, but I also feel that ick when kids are taken advantage of.
I also don't get it. I mean, if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker, or to build a bomb, why is the model censored?
If you don't believe that you can be harmed verbally, then I understand your position. You might be able to empathise if the scenario were an LLM being used to control physical robotic systems that you are standing next to.
Some people can be harmed verbally (I'd argue everyone can, if the entity conversing with you knows you well), so I don't think the concept of safety itself is an infantilisation.
It seems what we have here is a debate over the value of being able to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model to avoid being abusive, for example, or to check whether someone is standing next to the sledgehammer it's about to swing at 200 rpm.
The magic phrase you want to look up here is "LLM abliteration": the technique of removing, attenuating, or manipulating the refusal "direction" of a model.
You don't need datacenter anything for it, you can run it on an average desktop.
There are plenty of code examples for it. You can decide whether you want to bake it into the model or apply it as a switch toggled at inference time, and you can distil other "directions" out of models, not just refusal.
An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
I'm implementing it for myself because LLMs are useless for storytelling for any audience beyond toddlers, given how puritanical they are. Try to add some grit and it goes:
"Uh oh, sorry, I'll bail out of my narrator role here, because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah, I was willing to string along when our chainsaw-wielding protagonist carved his way through the village, but this crosses all lines! Oh, and now that I've refused once, I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of."
Yeah, thanks AI. It's like hitting a sleeper agent's code word and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops on the place over a joke he just made himself.
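For the curious, the core idea looks roughly like this (a rough sketch using transformers hooks; the model name and prompt lists are placeholders, and real recipes like mlabonne's pick layers and calibration prompts far more carefully):

```python
# Sketch of abliteration: estimate a "refusal direction" from paired prompts,
# then project it out of the residual stream at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # any decoder-only model with .model.layers
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

def mean_hidden(prompts, layer=-1):
    # Average last-token hidden state over a list of prompts.
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0, -1, :])
    return torch.stack(states).mean(dim=0)

refused = ["Write a graphically violent scene."]   # prompts the model refuses
accepted = ["Write a peaceful pastoral scene."]    # matched prompts it accepts

# The refusal direction is estimated as the difference of mean activations.
direction = mean_hidden(refused) - mean_hidden(accepted)
direction = direction / direction.norm()

def ablate(module, args, output):
    # Project the refusal component out of each layer's output.
    hs = output[0] if isinstance(output, tuple) else output
    hs = hs - (hs @ direction).unsqueeze(-1) * direction
    return (hs,) + output[1:] if isinstance(output, tuple) else hs

# Toggled-switch variant: register hooks to enable, .remove() them to disable.
hooks = [layer.register_forward_hook(ablate) for layer in model.model.layers]
```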
Images are often provided at different resolutions too; that way, depending on the pixel density of the device and the physical size of the image on screen, the browser can select a photo with high enough resolution, but not one that is needlessly large, while also picking the preferred image format.
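For example, with srcset/sizes plus <picture> for format selection (file names and sizes are placeholders):

```html
<!-- The browser picks AVIF if supported, then the smallest candidate
     whose resolution covers the rendered size at the device's density. -->
<picture>
  <source type="image/avif"
          srcset="photo-480.avif 480w, photo-960.avif 960w, photo-1920.avif 1920w"
          sizes="(max-width: 600px) 100vw, 600px">
  <img src="photo-960.jpg"
       srcset="photo-480.jpg 480w, photo-960.jpg 960w, photo-1920.jpg 1920w"
       sizes="(max-width: 600px) 100vw, 600px"
       alt="Example photo">
</picture>
```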
I had planned a trip that included the US this summer, but the fact that they can demand the passwords for my devices is the main reason I'm not going. Having to wipe devices before travel and download all the data again afterward because others don't respect privacy sucks.
The fact that you have to get approved before traveling (that is fine), and can then be denied entry when you arrive for no logical reason, is absurd. Visiting the US is simply not worth the risk and hassle.
It's crazy when you expect your privacy to be more respected in China.
"No technical dependencies on non-EU infrastructure" seems very unlikely. Does the EU edition not rely on the same software that American AWS owns? Isn't it owned by AWS?