Self hosting doesn't scale. There is very little reason for a good sysadmin to w...

halbritt · on Feb 5, 2019

Self-hosting doesn't scale? Have you done the math? If so, I'd be curious to see it.

There are a number of reasons that self-hosting doesn't make sense which have very little to do with scale and more to do with the lack of scale. For very little investment, one can get highly available compute, storage, networking, load-balancing, etc. from any of the major cloud providers. Want to make that geographically distributed? In your average cloud provider that's easy-peasy for little added cost.

Last time I had to ballpark such a thing, which is to say, what is the minimum possible deployment I'd be willing to support for a broad set of services, I settled on three 10kw cabinets in any given retail colo space with twenty-five servers per cabinet each consuming an average of 300W each. Those server were around $10k and were hypervisor class machines, i.e. lots of cores and memory for whatever time that was. Some switches, a couple routers, and 4xGigE transit links.

Of course I'd want three sets of that spread in regions of interest. If I were US focused, east coast, west coast and Chicago or thereabouts. All the servers and network gear come to around $1.5m CapEx. OpEx is $200/kw for the power and space and around $1/mbps for the transit. Note that outside the US, the price per kw can be much, much higher.

So, $6k MRC for the power and $4k MRC for the intertubes. $10k OpEx on top of ~$42k/month in depreciation ($1.5m/36) on your CapEx multiplied by three gives you $156k/month.

Lets assume my middle of the road hypervisor class machine has all the memory it needs and two 16 core processors with hyperthreading, so 64vCPU each or 14400 vCPU across your three data centers all for only around $2m/yr with nearly $5m of that up front.

That's a boat load of risk no startup or small enterprise is going to take on. You still have to staff for that and good luck finding folks that aren't asshats that can actually build it successfully. They're few and far between. That said, it does scale. It scales like hell, especially if you can manage to utilize that infrastructure effectively. I wager that if you were to look at what it would cost to hold down that much CPU and associated memory continuously in AWS then you'd be paying roughly 6x as much.

halbritt · on Feb 5, 2019

Lessee...

14400 vCPU of R4 for 3yr reserved, monthly is $300k MRC. I'm guessing you'd run ceph or rook on your bare metal and have ~8 1TB SSD per server, so 75 servers * 8 SSDs /3 (for replication) is 200TB with decent performance by three data centers for 600TB usable compared to EC2 GP2 at $.10/GB comes to roughly $60k MRC.

Less any network charges that's $360k vs. $156k self-hosted. Guess I'm wrong. It's only twice as much.

Twirrim · on Feb 5, 2019

> Self hosting doesn't scale

That's just not true. There are plenty of companies that self-host highly scaling infrastructure. Twitter being just one of those companies. They've only recently started thinking about using the cloud, opted for Google, and that's only to run some of their analytics infrastructure.

> There is very little reason for a good sysadmin to work for International Widgets Conglomerated when they can work for a cloud provider instead, building larger scale solutions more efficiently for higher pay

That's not true either (speaking from personal experience working for companies of all sizes, from a couple dozen employees on up to and including AWS and Oracle on their cloud platforms). For one thing, sysadmin is far to broad a job role to make such sweeping statements.

A whole bunch of what I do as a systems engineer for cloud platforms is a whole world of difference from general sysadmin work, even ignoring that sysadmin is a very broad definition that covers everything from running exchange servers on-prem, to building private clouds or beyond.

These days I'm not sure I've even got the particular skills to easily drop back in to typical sysadmin work. Cloud platform syseng work requires a much more precise set of skills and competencies.

All that aside, I can point you in the direction of plenty of sysadmins who wouldn't work for major cloud providers for all the money in the world, either for moral or ethical reasons; or they're just not interested in that kind of problem; or even just that they don't want to deal with the frequently toxic burn-out environments that you hear about there.

> I'd rather buy an off the shelf service used by the rest of the F500 than roll my own.

No where near as much of the F500 workload is on the cloud as apparently you'd believe. It's a big market that isn't well tapped. Amazon and Azure have been picking up some of the work, but a lot of the F500 don't like the pay-as-you-go financial model. That plays sweet merry havoc with budgeting, for starters. It's one reason why Oracle introduced a pay model with Oracle Cloud Infrastructure that allows CIOs to set fixed IT expenditure budgets. Many of the F500 companies are only really in the last few years starting to talk about actually moving in to the cloud (when OCI launched at Oracle OpenWorld, there was a startling number of CIOs from a number of large and well known companies coming up to the sales team and saying "So.. what is this cloud thing, anyway?"

> Successful companies outsource their non core competencies

Yes.. and no. Successful companies outsource their non-core competencies where there is no value having them on-site. That's very different.

hn_throwaway_99 · on Feb 5, 2019

Honestly, I think Twitter is a counterpoint to your argument.

Twitter was founded in 2006, the same year AWS was launched, so in the early days Twitter didn't have a choice - the cloud wasn't yet a viable option to run a company.

And, if you remember in the early days, Twitter's scalability was absolutely atrocious - the "Fail Whale" was an emblem of Twitter's engineering headaches. Of course, through lots of blood, sweat and tears (and millions/billions of dollars) Twitter has been able to build a robust infrastructure, but I think a new company, or a company who wasn't absolutely expert-level when dealing with heavy load, would be crazy to try to roll their own at this point unless they wanted to be a cloud provider themselves.

Twirrim · on Feb 5, 2019

> And, if you remember in the early days, Twitter's scalability was absolutely atrocious - the "Fail Whale" was an emblem of Twitter's engineering headaches.

That's because Twitter was:

1) A monolith

2) Written in Ruby

They started splitting components up in to specialised made-for-purpose components using Scala atop the JVM, and scaling ceased being a big issue. The problems they ran into couldn't be solved by horizontal scaling. There wasn't any service that AWS offers even today that would have helped with those engineering challenges.

tomnipotent · on Feb 5, 2019

It had more to do with the consistency model of relational databases, though Ruby definitely didn't help.

drchickensalad · on Feb 5, 2019

You honestly think that the main performance problem with a 2yo app is that it's a monolith?

Twirrim · on Feb 7, 2019

Yes, based on their own detailed analysis and extensive technical blogs on the subject. Ruby was doing a lot of processing of the tweets contents etc. etc, and at the time Ruby was even worse for performance than it is today. (Ruby may be many things, but until the more recent JIT work, fast was not one of them).

lmm · on Feb 5, 2019

> That's just not true. There are plenty of companies that self-host highly scaling infrastructure. Twitter being just one of those companies. They've only recently started thinking about using the cloud, opted for Google, and that's only to run some of their analytics infrastructure.

Twitter is both unusually big and unusually unprofitable. You're unlikely to be as big as them, and even if you were I wouldn't assume they've made the best decisions.

> All that aside, I can point you in the direction of plenty of sysadmins who wouldn't work for major cloud providers for all the money in the world, either for moral or ethical reasons; or they're just not interested in that kind of problem; or even just that they don't want to deal with the frequently toxic burn-out environments that you hear about there.

It's best to work somewhere you're appreciated (both financially and for job-satisfaction reasons), and it's harder to be appreciated in an organization where you're a cost center than one where you're part of the primary business. There are good and bad companies in every field, and good and bad departments in every big company, but the odds are more in your favour when you go into a company that does what you do.