More than anything, they need to match and then exceed Singapore's text and data mining exception for copyrighted works. I'd be happy to tell them how, since I wrote several versions of it trying to balance all sides.
The minimum, though, is that all copyrighted works the supplier has legal access to can be copied, transformed arbitrarily, and used for training. They can also share those copies and transformed versions with anyone else who already has legal access to that data. No contract, including terms of use, can override that. And they can scrape it freely, though maybe with daily limits imposed to avoid destructive scraping.
That might be enough to collect, preprocess, and share datasets like The Pile, RefinedWeb, and uploaded content the host shares (e.g., The Stack, YouTube). We can do a lot with big models trained that way. We can also synthesize other data from them with less risk.
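On the scraping side, a minimal sketch of what a "daily limit" might look like in practice is below. The cap, delay, and user-agent string are made-up placeholders, not values from any law or real pipeline.

    import time
    import urllib.request
    import urllib.robotparser

    # Hypothetical politeness settings -- real limits would come from whatever
    # the law or the site's terms end up allowing.
    DAILY_CAP = 10_000      # max pages fetched per day (placeholder)
    DELAY_SECONDS = 1.0     # pause between requests (placeholder)

    def polite_fetch(urls, user_agent="tdm-crawler-demo"):
        """Fetch pages while respecting robots.txt, a daily cap, and a delay."""
        fetched, robots = [], {}
        for url in urls[:DAILY_CAP]:
            host = "/".join(url.split("/")[:3])      # scheme://host
            if host not in robots:
                rp = urllib.robotparser.RobotFileParser(host + "/robots.txt")
                try:
                    rp.read()
                except OSError:
                    continue                         # skip hosts we can't check
                robots[host] = rp
            if not robots[host].can_fetch(user_agent, url):
                continue                             # disallowed by robots.txt
            req = urllib.request.Request(url, headers={"User-Agent": user_agent})
            with urllib.request.urlopen(req) as resp:
                fetched.append((url, resp.read()))
            time.sleep(DELAY_SECONDS)                # avoid hammering the server
        return fetched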
They mostly imitate patterns in the training material. They do it in response to whatever gets the reward up during RL training. There are probably lots of examples of both lying and confessions in the training data. So, it should surprise nobody that these sentence-prediction machines fill in a lie or a confession in situations similar to the training data.
I don't consider that very intelligent or more emergent than other behaviors. Now, if nothing like that was in the training data (pure honesty with no confessions), it would be very interesting if the model replied with lies and confessions, because it wasn't pretrained to lie or confess like the above model likely was.
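A toy way to see the imitation point: even a word-bigram model "trained" on a few lines that contain both a denial and a confession will reproduce both patterns, with no understanding involved. The corpus below is invented purely for illustration.

    import random
    from collections import defaultdict

    # Tiny invented corpus containing both a "denial" and a "confession" pattern.
    corpus = (
        "i did not take the cookie . "
        "ok fine i took the cookie and i am sorry . "
        "i did not break the vase . "
        "ok fine i broke the vase and i am sorry . "
    ).split()

    # Count word bigrams -- the crudest possible "pretraining".
    follows = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        follows[a].append(b)

    def sample(start="i", max_words=12, seed=0):
        """Each next word just imitates the bigrams seen in training."""
        rng = random.Random(seed)
        out = [start]
        for _ in range(max_words):
            nxt = rng.choice(follows.get(out[-1], ["."]))
            out.append(nxt)
            if nxt == ".":
                break
        return " ".join(out)

    for s in range(5):
        print(sample(seed=s))   # usually a mix of denial-like and confession-like lines

Scale that up by many orders of magnitude and the fill-ins get much richer, but the mechanism described above is the same.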
They go in to see, hear, and smell good things. They experience some products first-hand in a way that shows whether they're as advertised or not. They also know what's out of stock, with many immediate substitution options. There are also more coupon, markdown, or haggling opportunities for those who want them.
Finally, walking into stores lets you connect with people. Those who repent and follow Jesus Christ are told to share His Gospel with strangers so they can be forgiven and have eternal life. We're also to be good to them in general, listening and helping, from the short person reaching for items too high up to the cashier who needs a friendly word.
We, along with non-believers, also get opportunities out of this when God makes us bump into the right people at the right time. They may become spouses, friends, or business partners. It's often called networking. However, Christians are to keep in mind God's sovereign control of every detail. Many are one-time or temporary events or observations just meant to make our lives more interesting.
Most of the above isn't available in online ordering, which filters almost all of the human experience down to a narrow, efficient process a cheap AI could likely do. That process usually has no impact on eternity for anyone. Further, it has less impact on other people. Then, I have fewer of the experiences God designed us to have, which includes the bad ones that build our character, like patience and forgiveness.
So, while I prefer online shopping, I try to pray that God motivates me to shop in stores at times and do His will in there. Many interesting things, including impacts on people, continue to happen. Some events hit a person so hard that, even as a non-believer, they know God was behind it. I'm grateful for the stores that provide these opportunities to us.
My concept was to build HLL-to-C/C++ (or Rust) translators using mostly non-AI tech. Then, use AIs with whatever language they were really good at. Then, transpile it.
Alternatively, use a language like ZL that embeds C/C++ in a macro-supporting, high-level language (e.g., Scheme). Encode higher-level concepts in it with generation of human-readable, low-level code. F* did this. Now, you get C with higher-level features we can train AIs on.
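To make the shape of the idea concrete, here's a toy sketch of the "high-level construct in, readable C out" step. It is not ZL or F*'s extraction; the foreach construct, names, and output format are all invented for illustration.

    # Toy "high-level construct in, readable C out" sketch. The foreach
    # construct, names, and formatting are invented for illustration only.

    def emit_foreach(var, array, length, body_fmt):
        """Expand a hypothetical 'foreach' construct into an indexed C loop."""
        return "\n".join([
            f"for (size_t _i = 0; _i < {length}; ++_i) {{",
            f"    int {var} = {array}[_i];",
            f"    {body_fmt.format(var=var)}",
            "}",
        ])

    def emit_function(name, params, body):
        """Wrap a body in a readable C function definition."""
        indented = "\n".join("    " + line for line in body.splitlines())
        return f"void {name}({params}) {{\n{indented}\n}}\n"

    # "Source program" in the imaginary high-level language, already parsed.
    c_code = emit_function(
        "print_doubled",
        "const int *xs, size_t n",
        emit_foreach("x", "xs", "n", 'printf("%d\\n", 2 * {var});'),
    )

    print("#include <stdio.h>")
    print("#include <stddef.h>")
    print(c_code)

The point is just that the low-level output stays human-readable, which is what would let us train AIs on it.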
But it's companies like Google, makers of tools like JAX and TPUs, who tell us we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.
So, I think they could default to doing it for small demonstrators.
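For a sense of what "throw together" means, here is a minimal JAX sketch of a jitted training step for a toy linear model. It is just the standard jax.grad/jax.jit pattern, not code from any particular paper, and the model and data are placeholders.

    import jax
    import jax.numpy as jnp

    # Toy linear-regression "model" standing in for a small demonstrator.
    def loss_fn(params, x, y):
        w, b = params
        pred = x @ w + b
        return jnp.mean((pred - y) ** 2)

    LEARNING_RATE = 1e-2    # placeholder hyperparameter

    @jax.jit                # compile the whole step with XLA (CPU/GPU/TPU)
    def train_step(params, x, y):
        grads = jax.grad(loss_fn)(params, x, y)
        return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g,
                                      params, grads)

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (256, 8))
    y = x @ jnp.arange(8.0) + 1.0       # synthetic targets

    params = (jnp.zeros(8), jnp.zeros(()))
    for _ in range(500):
        params = train_step(params, x, y)

    print(loss_fn(params, x, y))        # loss should shrink toward zero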
Arxiv is flooded with ML papers. GitHub has a lot of prototypes for them. I'd say it's pretty normal, with some companies not sharing for perceived competitive advantage. Perceived, because it may or may not be real versus published prototypes.
We post a lot of research on the mlscaling sub if you want to look back through it.
Don't forget the billion dollars or so of GPUs they had access to that they left out of that accounting. Also, the R&D cost of the Meta model they originally used. Then, they added $5.6 million on top of that.
The threat level for airplanes is set to orange... for anyone dumb enough to fly over an erupting volcano. The orange flying from the ground would be all the motivation I need to stay clear of it.
I dunno... Different times, different risk tolerance.
Back in 1980, my dad was sitting at his desk in Bellevue one morning when news came in that Mt. St. Helens was erupting. He and a pilot friend had the presence of mind to head straight to the local airport and rent a plane.
"Be careful not to head South. Mt. St. Helens is erupting, and you sure don't want to get close that by accident."
"Oh, yeah, sure. No way we'd do something like that."
He has this amazing framed aerial photo of the mountain with the ash plume rising. Evidently, the flight home was pure chaos, bobbing and weaving to avoid dozens of midair collisions since every other pilot in the Seattle area had had the same idea, but 45 minutes later.
Volcanic ash is particularly bad because it is so abrasive, having been freshly formed without any opportunity for erosion to smooth it down like regular dust.
That's not the only problem - volcanic ash also has a low enough melting point that it'll melt in the combustion chamber of a jet engine and leave glassy deposits on cooler components.
Prevailing winds are key. Reykjavíkings told me that during that big eruption many moons ago, all traffic to Europe was halted but traffic to North America continued merrily along.
An erupting volcano can spew ash over a large distance. When Eya.... that Icelandic volcano (that's hard to spell because I don't know Icelandic) erupted several years ago, the ash cloud traveled far enough to disrupt travel over most of Europe for a few days.
At these prices, it might pay off to start an open-ish, non-profit DRAM company to produce chips good enough for these purposes. They could sell a certain amount just above cost to consumers and small businesses, and at full price with guaranteed volume to some enterprise customers.
The profits might be used to port the DRAM to multiple foundries to gradually increase supply. Alternatively, they could shift to producing other components, like VRAM, making low-to-midrange accelerators with larger VRAM more available at reasonable prices.
Just speculating. I don't know silicon economics well.
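A back-of-the-envelope version of that cross-subsidy idea is below; every number is a made-up placeholder, since, as said, I don't know real DRAM costs.

    # All figures are hypothetical placeholders, purely to show the shape of
    # the cross-subsidy: near-cost consumer sales, full-price enterprise sales,
    # and surplus funding a port to another foundry.
    unit_cost         = 2.00          # $/chip to produce (hypothetical)
    consumer_price    = 2.20          # near-cost consumer price (hypothetical)
    enterprise_price  = 4.00          # full price, guaranteed volume (hypothetical)
    consumer_volume   = 5_000_000     # chips/year (hypothetical)
    enterprise_volume = 20_000_000    # chips/year (hypothetical)
    port_cost         = 50_000_000    # $ to port to another foundry (hypothetical)

    surplus = ((consumer_price - unit_cost) * consumer_volume
               + (enterprise_price - unit_cost) * enterprise_volume)

    print(f"Annual surplus: ${surplus:,.0f}")
    print(f"Years to fund one foundry port: {port_cost / surplus:.1f}")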