ML workloads cost a lot of money. Even as a preemptible VM, an A100 runs $0.88/hr/GPU. That's roughly $630 a month for a single GPU, and that's only the 40GB model. Want a dedicated 8-GPU machine in the cloud to train on? That'll run you around 16 grand a month. Do that for two years and you may as well have bought the hardware. Want to do 16/24/40-GPU training? Good luck getting dedicated cloud machines with networking between them fast enough for MPI to work properly, and prepare to empty your wallet.
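A quick back-of-the-envelope sketch of that arithmetic (the $0.88/hr preemptible rate and the ~$16k/month dedicated figure are from above; the hours-per-month and the on-prem purchase price are assumptions plugged in for illustration):

```python
# Back-of-the-envelope rent-vs-buy arithmetic for the figures above.
# The hourly rate and the 8-GPU monthly figure come from the comment;
# HOURS_PER_MONTH and the on-prem purchase price are assumptions.

PREEMPTIBLE_A100_PER_HOUR = 0.88   # $/hr/GPU, 40GB model, preemptible
HOURS_PER_MONTH = 720              # ~24h * 30 days; exact totals vary

single_gpu_month = PREEMPTIBLE_A100_PER_HOUR * HOURS_PER_MONTH
print(f"1x A100 preemptible, 24/7: ~${single_gpu_month:,.0f}/month")

DEDICATED_8GPU_PER_MONTH = 16_000  # rough dedicated-instance figure from the comment
two_years_rented = DEDICATED_8GPU_PER_MONTH * 24
print(f"8-GPU dedicated for 2 years: ~${two_years_rented:,.0f}")

# Hypothetical purchase price for a comparable 8x A100 server (assumption).
ON_PREM_8GPU_SERVER = 150_000
print(f"Rented cost / purchase price: {two_years_rented / ON_PREM_8GPU_SERVER:.1f}x")
```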
Also, that's just compute. What about data? Sure, the cloud takes your data in cheaply, but it also charges you for egress of that data. Yes, you should have your data in more than one location, but if you depend solely on the cloud then you need it in a different AZ, which costs even more money to keep in sync and available for training runs.
I think for simple workloads and renting compute for a startup, cloud definitely makes sense. But the moment you try to do some serious compute for ML workloads, good luck and hope you have deep pockets.
The other thing is that Nvidia sells GPUs with similar performance at two very different prices: one price for data centres and a quite different price for gamers. If you do the job yourself you can often get away with using the much cheaper gamer-grade cards for AI work (unless you need a lot of VRAM), whereas providers such as AWS can't do that and are required by Nvidia to use the considerably more expensive cards. If your workload fits on a gamer-grade card, there's no contest on price between an on-prem system and the cloud.
That is a really good point, and the 3090s have a surprising amount of VRAM on them. For many smaller models this is sufficient. However, where I work (without going into a lot of specifics), the models are big enough that the amount of VRAM is crucial, as are the PCIe lanes feeding the cards, the speed of the local storage, and the networking both between cards on the same node and between nodes.
The moment the model gets bigger than any one GPU's VRAM, training it becomes orders of magnitude more difficult.
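A rough sketch of why the 24GB ceiling arrives quickly: assuming mixed-precision Adam at ~16 bytes of state per parameter and ignoring activations entirely (so this is a floor, not a real footprint), the training state alone outgrows a single consumer card fast:

```python
# Rough floor on training memory for a dense model, assuming mixed-precision
# Adam: fp16 weights + fp16 grads + fp32 master weights + two fp32 moments
# (~16 bytes/param). Activation memory is ignored, so real usage is higher.

def training_state_gb(params_billion: float) -> float:
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return params_billion * 1e9 * bytes_per_param / 1024**3

for size in (1.5, 6, 13, 30):
    need = training_state_gb(size)
    verdict = "state fits on a 24GB 3090" if need <= 24 else "needs model/optimizer sharding"
    print(f"{size:>4}B params: ~{need:,.0f} GB of state -> {verdict}")
```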
Re data, I think egress rates are going to start disappearing over the next few years.
The part that's always missing from these rent-vs-buy analyses on HN, for some reason, is that they totally ignore the opex cost of operating your own hardware, which is going to be non-zero. Sure, it won't be quite as expensive (no profit margin), but the difference isn't an order of magnitude. Additionally, most companies don't run the HW 24/7, and if they do, it's not the kind of staff they want to hire to support those operations. It's not just running it: you have to invest in and grow something that's not a core competency to get the economies of multiple teams loading up the HW.
If the next revolution in cloud causes companies to bring the HW on-site again, it'll look like making it super easy to take spare compute and spare storage from existing companies and resell it on an open market. Even then, I think the operational challenges of keeping all of that up, running, and utilized as close to 100% as possible, without losing focus on your core business problem, will be difficult, because you won't be able to compete with engineering companies whose core competency is exactly that space.
> The part that's always missing from these rent-vs-buy analyses on HN, for some reason, is that they totally ignore the opex cost of operating your own hardware, which is going to be non-zero.
Effectively hiring, retaining, evaluating and rewarding competent staff is hard. Even at a big company the datacenter can be a really small world, which makes it hard for your best employees to grow. Things are especially hard when you don't have a tech brand to rely on for your recruiting, and the staff's expertise is far outside the company's core business, making it harder to evaluate who's good at anything.
> Re data, I think egress rates are going to start disappearing over the next few years.
I'm not sure why you think that. AWS hasn't budged on their egress pricing for a decade (except the recent free tier expansion), despite the underlying costs dropping dramatically. GCP and Azure have similar prices.
Fact is, egress pricing is a moat. Cloud providers want to incentivize bringing data in (ingress is always free) and incentivize using it (intra-DC networking is free), but disincentivize bringing it out. If your data is stuck in AWS, that means your computation is stuck in AWS too.
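For a sense of scale, here's a sketch of what pulling a dataset back out costs at tiered per-GB rates in the ballpark of the big clouds' published internet-egress pricing (illustrative numbers only; they change and vary by region):

```python
# Rough egress bill for pulling data out of a big-cloud object store,
# using tiered per-GB rates roughly in line with published AWS internet
# egress pricing (~$0.09/GB for the first 10TB, cheaper at higher tiers).
# Rates are illustrative assumptions, not a quote.

TIERS = [                      # (tier size in GB, $/GB)
    (10_240, 0.09),
    (40_960, 0.085),
    (102_400, 0.07),
    (float("inf"), 0.05),
]

def egress_cost(gb: float) -> float:
    cost, remaining = 0.0, gb
    for size, rate in TIERS:
        chunk = min(remaining, size)
        cost += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return cost

for tb in (1, 10, 100, 500):
    print(f"{tb:>4} TB out: ~${egress_cost(tb * 1024):,.0f}")
```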
Disclosure: I work at Cloudflare on R2, so I'm a bit biased on this.
I think we're going to put real pressure on traditional object storage rates to come down, since Cloudflare's entire MO is running the network with zero egress. As we expand our cloud platform, it seems inevitable that you will at least have a strong zero-egress choice, and if we do a good job, Amazon et al. will inevitably be forced to get rid of egress. Matthew Prince laid out a strong case for why either scenario is good for us in a recent investor day presentation (either we cannibalize S3's business and R2 becomes a massive profit machine for us because they refuse to budge on egress, or Amazon drops egress, which is an even larger opportunity for us).
Products like Cache Reserve help you migrate your data out of AWS transparently from any service (not just S3) - you just pay the egress penalty once per file.
Anyway. I’m not saying it’s going to disappear tomorrow but I find it hard to believe it’ll last another ten years.
> the opex cost of operating your own hardware, which is going to be non-zero
Early in my career I worked at a company with a DC on-site. I remember the months-long project to spec, purchase and migrate to a new, more powerful DB server. How much that cost in people-hours, I have no idea. I upgraded to a better DB a couple months ago by clicking a button...
Don't even get me started on ordering more SANs when we ran out of storage, or the time a hurricane was coming and we had to prepare to fail over to another DC.
Cloudflare Bandwidth Alliance and R2. S3 felt some pressure just from our pre-launch announcement. It'll be interesting to see how they adjust over the next couple of years.
They're actually probably paying more than the average listed prices, as mainframe users (a mainframe being basically an on-premise PaaS, where IBM rents you a high-performance, distributed and redundant hardware cluster on a pay-as-you-use basis) are often dependent on very high reliability, high uptime and low latency.
The ability to scale up experiments is really nice in the cloud. In my experience you need to be quite large before you're using your own GPUs at a utilization percentage that saves money while still having capacity for large one-off experiments.
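A toy break-even sketch of that point, where every input (purchase price, opex, on-demand rate, amortization window) is an assumption for illustration:

```python
# Toy break-even: at what utilization does owning a GPU box beat renting
# on demand? All inputs below are assumptions for illustration only.

CAPEX = 150_000            # purchase price of an 8x A100 server (assumption)
LIFETIME_MONTHS = 36       # amortization window (assumption)
OPEX_PER_MONTH = 3_000     # power, colo, ops time (assumption)
CLOUD_PER_HOUR = 22.0      # on-demand 8-GPU instance (assumption)
HOURS_PER_MONTH = 720

own_per_month = CAPEX / LIFETIME_MONTHS + OPEX_PER_MONTH
breakeven_hours = own_per_month / CLOUD_PER_HOUR
print(f"Owning costs ~${own_per_month:,.0f}/month amortized")
print(f"Break-even at ~{breakeven_hours:,.0f} rented hours/month "
      f"(~{breakeven_hours / HOURS_PER_MONTH:.0%} utilization)")
```

Below that utilization (and without the occasional need for a big one-off burst), renting wins; above it, the owned box starts paying for itself.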