Which is basically HDPE (plastic) foil with limestone filler. And a whole website full of marketing that somehow never mentions that 20% of the material is non-renewable (made from petroleum products) and not biodegradable.
Yes, they say it's HDPE, but then, conveniently, in all their talk about sustainability they somehow forget to mention where HDPE actually comes from. Just that being composed of carbon and hydrogen somehow makes it "clean". Which, I guess, is something you could also say about things like gasoline. Plastic shopping bags are also made of polyethylene. So are they sustainable as well?
Sure it is. But it's also nowhere near cost-competitive, so no one does. They also don't even claim to be using anything other than "normal" HDPE made from ethylene derived from crude oil.
They don't use "normal" HDPE, they use recycled HDPE, which means they don't know what's inside their feedstock, and it definitely means you can't get rid of the paper by burning it, because you're also burning whatever mystery chemicals remain inside.
Idk, the models generating what are basically 1:1 copies of the training data from pretty generic descriptions feels like a severe case of overfitting to me. What use is a generative model that just regurgitates the input?
I feel like the less advanced generations, maybe even because of their limitations in terms of size, were better at coming up with something that at least feels new.
In the end, other than for copyright-washing, why wouldn't I just use the original movie still/photo in the first place?
People like what they already know. When they prompt something and get a realistic looking Indiana Jones, they're probably happy about it.
To me, this article is further proof that LLMs are a form of lossy storage. People attribute special quality to the loss (the image isn't wrong, it's just got different "features" that got inserted) but at this point there's not a lot distinguishing a seed+prompt file+model from a lossy archive of media, be it text or images, and in the future likely video as well.
The craziest thing is that AI seems to have gathered some kind of special status that earlier forms of digital reproduction didn't have (even though those 64 kbps MP3s from Napster were far from perfect reproductions), probably because now it's done by large corporations rather than individuals.
If we're accepting AI-washing of copyright, we might as well accept pirated movies, as those are just re-encoded from high-resolution originals as well.
Probably the majority of people in the world already "accept pirated movies". It's just that, as ever, nobody asks people what they actually want. Much easier to tell them what to want, anyway.
To a viewer, a human-made work and an AI-generated one both amount to a series of stimuli that someone else made and you have no control over; and when people pay to see a movie, generally they don't do it with the intent to finance the movie company to make more movies -- they do it because they're offered the option to spend a couple hours watching something enjoyable. Who cares where it comes from -- if it reached us, it must be good, right?
The "special status" you speak of is due to AI's constrained ability to recombine familiar elements in novel ways. 64k MP3 artifacts aren't interesting to listen to; while a high-novelty experience such as learning a new culture or a new discipline isn't accessible (and also comes with expectations that passive consumption doesn't have.)
Either way, I wish the world gave people more interesting things to do with their brains than make a money, watch a movies, or some mix of the two with more steps. (But there isn't much of that left -- hence the concept of a "personal life" as reduced to breaking one's own and others' cognitive functioning then spending lifetimes routing around the damage. Positively fascinating /s)
Tried Flux.dev with the same prompts [0] and it actually seems to be a GPT problem. Could be that in GPT the text encoder understands the prompt better and just generates the implied IP, or it could be that a diffusion model is just inherently less prone to overfitting than a multimodal transformer model.
DALL-E 3 already uses a model trained on synthetic data that takes the prompt and augments it. This might lead to the overfitting. It could also be, and this might be the simpler explanation, that it just looks up the right file via RAG.
If it overfits on the whole internet then it’s like a search engine that returns really relevant results with some lossy side effect.
A recent benchmark on the unseen 2025 Math Olympiad shows none of the models can problem-solve. They all, accidentally or on purpose, had prior solutions in the training set.
You probably mean the USAMO 2025 paper. They updated their comparison with Gemini 2.5 Pro, which did get a nontrivial score. That Gemini version was released five days after USAMO, so while it's not entirely impossible for the data to be in its training set, it would seem kind of unlikely.
The claim is that these models are training on data which include the problems and explanations. The fact that the first model trained after the public release of the questions (and crowdsourced answers) performs best is not a counterexample, but is expected and supported by the claim.
I was noodling with Gemini 2.5 Pro a couple days ago and it was convinced Donald Trump didn’t win the 2024 election and that he conceded to Kamala Harris so I’m not entirely sure how much weight I’d put behind it.
What if the word "generic" were added to a lot of these image prompts? "generic image of an intergalactic bounty hunter from space" etc.
Certainly there's an aspect of people using the chat interface like they use Google: describe xyz to try to surface the name of a movie. Just in this case, we're doing the (less common?) query of: find me the picture I can vaguely describe; but it's a query to an image /generating/ service, not an image search service.
Generic doesn't help. I was using the new image generator to try and make images for my Mutants and Masterminds game (it's basically D&D with superheroes instead of high fantasy), and it refuses to make most things citing that they are too close to existing IP, or that the ideas are dangerous.
So I asked it to make 4 random and generic superheroes. It created Batman, Supergirl, Green Lantern, and Wonder Woman. Then at about 90% finished it deleted the image and said I was violating copyright.
I doubt the model you interact with actually knows why the babysitter model rejects images, but it claims to know why, and that leads to some funny responses. Here is its response to me asking for a superhero with a dark bodysuit, a purple cape, a mouse logo on their chest, and a spooky mouse mask on their face.
> I couldn't generate the image you requested because the prompt involved content that may violate policy regarding realistic human-animal hybrid masks in a serious context.
Idk, a couple of the examples might be generic enough that you wouldn't expect a very specific movie character. But most of the prompts make it extremely clear which movie character you would expect to see, and I would argue that the chat bot is working as expected by providing that.
Just because I'm thinking of an Indiana Jones-like character doesn't mean I want literally Indiana Jones. If I wanted Indiana Jones, I could just grab a scene from the movie.
If someone gets Indiana Jones from the suspiciously detailed request the author provided and it turns out they wanted something else, they can clarify that to the chat bot, e.g. by copying your comment.
I have a strong suspicion that many human artists would behave the way the chat bot did (unless they start asking clarifying questions, which chatbots should learn to do as well).
Good luck grabbing that Ghibli movie scene, where Indiana Jones arm-wrestles Lara Croft in a dive-bar, with Brian Flanagan serving a cocktail to Allan Quatermain in the background.
> I feel like the less advanced generations, maybe even because of their limitations in terms of size, were better at coming up with something that at least feels new.
Ironically that's probably because the errors and flaws in those generations at least made them different from what they were attempting to rip off.
Yeah, I've been feeling the same. When a model spits out something that looks exactly like a frame from a movie just because I typed a generic prompt, it stops feeling like "generative" AI and starts feeling more like "copy-paste but with vibes."
To my knowledge this happens when that single frame is overrepresented in its training data. For instance, variations of the same movie poster or screenshot may appear hundreds of times. Then the AI concludes that this is just a unique human cultural artifact, like the Mona Lisa (which I would expect many human artists could also reproduce from memory).
I'm not sure if this is a problem with overfitting. I'm ok with the model knowing what Indiana Jones or the Predator looks like with well remembered details, it just seems that it's generating images from that knowledge in cases where that isn't appropriate.
I wonder if it's a fine tuning issue where people have overly provided archetypes of the thing that they were training towards. That would be the fastest way for the model to learn the idea but it may also mean the model has implicitly learned to provide not just an instance of a thing but a known archetype of a thing. I'm guessing in most RLHF tests archetypes (regardless of IP status) score quite highly.
What I'm kind of concerned about is that these images will persist and will be reinforced by positive feedback. Meaning, an adventurous archeologist will be that very same image, forever. We're entering the epitome of dogmatic ages. (And it will be the same corporate images and narratives, over and over again.)
Granted, but it's not the best example: red and green are the emblematic colours elves wore in northern European cultures.
Santa is somewhat syncretic with Robin Goodfellow or Robin Redbreast, Puck, the Púca, etc. etc. It wasn't really a cola invention.
> I'm ok with the model knowing what Indiana Jones or the Predator looks like with well remembered details,
ClosedAI doesn't seem to be OK with it, because they are explicitly censoring characters of more popular IPs. Presumably as a fig leaf against accusations of theft.
If you define feeding of copyrighted material into a non-human learning machine as theft, then sure. Anything that mitigates legal consequences will be a fig leaf.
In this case the output wasn't filtered. They are just producing images of Harrison Ford, and I don't think they are allowed to use his likeness in that way.
The fact that they have guardrails to try and prevent it means OpenAI itself thinks it is at least shady, or outright illegal in some way. Otherwise, why bother?
Probably over-representation in the training data causing overfitting. Training on data scraped straight from the Internet, in the amounts it occurs there, is going to be opinionated about human culture (Bart Simpson is popular, so there are lots of images of him; Ori is less well known, so there are fewer). Ideally it would train 1:1 on everything, but that would involve _so_ much work pruning the training data to get a roughly equal effect between categories.
The prompt didn't exactly describe Indiana Jones though. It left a lot of freedom for the model to make the "archeologist" e.g. female, Asian, put them in a different time period, have them wear a different kind of hat etc.
It didn't though, it just spat out what is basically a 1:1 copy of some Indiana Jones promo shoot. Nowhere did the prompt ask for it to look like Harrison Ford.
But... the prompt neither forbade Indiana Jones nor did it describe something that excluded Indiana Jones.
If we were playing Charades, just about anyone would have guessed you were describing Indiana Jones.
If you gave a street artist the same prompt, you'd probably get something similar unless you specified something like "... but something different than Indiana Jones".
And… that is called overfitting. If you show the model values for y, but they are 2 in 99% of all cases, it’s likely going to yield 2 when asked about the value of y, even if the prompt didn’t specify or forbid 2 specifically.
> If you show the model values for y, but they are 2 in 99% of all cases, it’s likely going to yield 2 when asked about the value of y
That's not overfitting. That's either just correct or underfitting (if we say it's never returning anything but 2)!
Overfitting is where the model matches the training data too closely and has inferred a complex relationship using too many variables where there is really just noise.
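To make the distinction concrete, here's a minimal sketch (assuming numpy and scikit-learn; the data and the polynomial degree are purely illustrative): a trivial model that always answers roughly 2 is, if anything, underfitting, while a high-degree polynomial that chases the noise in the training points is what overfitting actually looks like.

```python
# Illustrative only: "y is ~2 in almost all cases" vs. fitting the noise.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 + 0.1 * rng.normal(size=20)        # essentially constant, plus a little noise

constant_guess = y.mean()                 # trivial (under)fit: always answer ~2

# Overfit: a degree-15 polynomial closely tracks the training noise
# instead of the underlying constant, and generalizes poorly.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(x, y)

print(constant_guess)                     # ~2, a perfectly sensible answer
print(overfit.predict(x[:1]), y[:1])      # reproduces the noisy training value
```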
The nice thing about humans is that not every single human being has read almost all the content on the Internet. So yeah, a certain group of people would draw or think of Indiana Jones given that prompt, but not everyone.
Maybe we will have different models with different training/settings that permit this kind of freedom, although I doubt it will be the commercial ones.
I didn't think of it. I imagined a cartoonish chubby character in typical tan safari gear with a like-colored round explorer hat, swinging a whip like a lion tamer. He is mustachioed, light-skinned, and bespectacled. And I am well familiar with Dr. Jones.
Is HN the whole world? Isn't an AI model supposed to be global, since it has ingested the whole Internet?
How can you express, in terms of AI training, ignoring the existence of something that's widely present in your training data set? If you asked the same question of an 18-year-old girl in rural Thailand, would she draw Harrison Ford as Indiana Jones? Maybe not. Or maybe she would.
But IMO an AI model must be able to provide a more generic (unbiased?) answer when the prompt wasn't specific enough.
Why should the AI be made to emulate a person naive to extant human society, tropes and customs? That would only make it harder for most people to use.
Maybe it would have some point if you are targeting users in a substantially different social context. In that case, you would design the model to be familiar with their tropes instead. So when they describe a character iconic in their culture by a few distinguishing characteristics, it would produce that character for them. That's no different at all.
But the way training data concentrates around popular characters and objects in human culture means that if I give a random person the same description the AI got and ask "who am I talking about, what do they look like?", there's a very high likelihood they'll answer "Indiana Jones".
Or even just 'obvious Indiana Jones knockoff who isn't literally Harrison Ford'. Comics do that kind of thing constantly for various obviously inspired but legally distinct characters.
What would most humans draw when you describe such a well-known character by their iconic elements? I think if you deviated and acted like a pedant about it, people would assume you're just trying to prove a point or being obnoxious.
- Germans use commas as the decimal separator, while periods are only used to separate thousands.
- As stated in one of the linked Mastodon posts, the abbreviation for the Deutsche Mark was DM, not DEM.
Also, I have no idea how the seller thought the German Red Cross would emboss their paper with a circle of stars. If anything, embossing the cross itself would make the most sense. Probably they had the tool on hand from another forgery where it may have made more sense.
To say nothing of the fact that embossing a document in Germany, and Europe in general, is extremely rare and certainly not common on your garden-variety receipt or invoice (basically never). They were a bit more common pre-'90s, though.
Without the aim of trying to insult anyone, frills like that are more common in the US when trying to emphasize the official nature of documents (e.g., notary public embossing).
Even so, on top of all that, it would make exactly zero sense for the embossing to be the EU stars.
It isn't even the European circle of stars – the Flag of Europe has always had 12 stars, whereas the embossing has 16.
“Ah, 16 stars, because Germany has 16 federal states”, you'd say at first thought. But while the German Red Cross organization has sub-organizations, the Landesverbände, those only partially mirror the modern 16 federal states. Some Landesverbände predate the modern federal states and are instead for more traditional German regions. Instead of a single Landesverband for the state of Nordrhein-Westfalen [1] there are two Landesverbände, one for Nordrhein, one for Westfalia-Lippe. In Lower Saxony there is an extra Landesverband for Oldenburg, in Baden-Württemberg there is an extra Badischer Landesverband. That makes 19 regional sub-organizations, but there aren’t 19 stars.
And all of these 19 regional subs predate 2001. I can’t really see a reason for the DRK in 2001 to use 16 stars.
[1] Nordrhein-Westfalen (North Rhine-Westphalia) was created forcefully by the British Military Administration in 1946 out of the former Prussian provinces of the Rhine and parts of Westfalia (and small Lippe!) in the creatively named "Operation Marriage". Nobody thought a union between the lively Rhinelanders and the laconic Westfalians (and the Lippians!) could work given the differences – but it seems to work mostly great. Nobody thinks of seceding. Must be the first time in history.
That's basically what beancount does, isn't it? You keep all the transactions in your plaintext files, and it generates the full ledgers from those on the fly once you want to do any evaluations.
Why gatekeep cycling? As far as I'm concerned, anything that gets people out of their cars and onto bicycles is a good thing. If they like it, they might still switch to a normal bicycle later. If not, they're still on a bicycle instead of clogging the roads in their car.
> Bikes that go faster or where one doesn’t need to pedal at all need a license plate and special insurance.
And, critically, they aren't allowed on bike lanes but have to be ridden on the road instead, which makes them far less attractive for commuting in the city.
That's why the limited pedelecs are by far the most popular choice of E-Bike in Germany.
I'm working on a small program that continually monitors a region of the screen and OCRs and translates the text on screen if there are changes.
I want to use it to improve my Italian skills by playing games with Italian subtitles and automatically having a translation on the second screen for reference.
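Roughly, the loop looks something like the sketch below (not my actual code; mss for the screen capture, pytesseract/Tesseract with the Italian language data for OCR, and translate() is just a stand-in for whatever translation backend you end up using; the region coordinates are placeholders too):

```python
import hashlib
import time

import mss
import pytesseract
from PIL import Image

# Placeholder coordinates for the subtitle area; adjust to your screen/game.
REGION = {"top": 900, "left": 0, "width": 1920, "height": 180}

def translate(text: str) -> str:
    # Stand-in: plug in deep-translator, a local model, an API call, etc.
    return text

def main() -> None:
    last_hash = None
    with mss.mss() as sct:
        while True:
            shot = sct.grab(REGION)
            frame_hash = hashlib.md5(shot.rgb).hexdigest()
            if frame_hash != last_hash:   # only re-OCR when the region changed
                last_hash = frame_hash
                img = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
                text = pytesseract.image_to_string(img, lang="ita").strip()
                if text:
                    print(text, "->", translate(text))
            time.sleep(0.5)

if __name__ == "__main__":
    main()
```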
The compound presented here is, however, not an organoaluminum compound. Those have Al-C bonds and are indeed very reactive.
Aluminum formate has the Al3+ ion coordinated only by oxygen, and will certainly not exhibit the reactivity you described.
PowerQuery can also use ranges inside the same worksheet as data sources. You can define either cell ranges or named tables as data sources and then use them in PowerQuery just like you would use any external data source.
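For example, something like this M step (assuming a table named "MyTable" defined in the current workbook; the name is just a placeholder):

```
let
    // Excel.CurrentWorkbook() lists the tables and named ranges in this file;
    // pick one by name and take its Content to use it like any external source.
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content]
in
    Source
```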