Internap evacuates LGA datacenter (pastebin.com)
71 points by brokentone on Oct 30, 2012 | 37 comments


Stack Overflow and Stack Exchange are usually hosted in that building at Peer 1, but we failed over to our secondary DC out in Oregon earlier in the evening.


It's interesting to see how a storm can single-handedly knock out major websites and leave millions without power. I think it shows how fragile this tech world really is, and how little we should rely on it in a major disaster.


Don't underestimate "a storm". A hurricane releases about 1.3 x 10^17 joules per day [1] of raw kinetic energy (wind alone), and that's not counting rain and flooding. That's equivalent to about 31 megatons of TNT per day. The most powerful nuclear weapon the US ever fielded, the B41, was about 25 megatons.

[1] - http://www.aoml.noaa.gov/hrd/tcfaq/D7.html
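
For a rough sanity check on that conversion, here is a minimal sketch; the 1.3 x 10^17 J/day figure comes from the NOAA page in [1], while the ~4.184 x 10^15 J per megaton of TNT factor is the standard conversion and is assumed here:

    # Sanity check: convert the NOAA kinetic-energy figure to megatons of TNT.
    KINETIC_ENERGY_J_PER_DAY = 1.3e17   # hurricane wind kinetic energy, per [1]
    J_PER_MEGATON_TNT = 4.184e15        # standard TNT-equivalence factor (assumed)

    megatons_per_day = KINETIC_ENERGY_J_PER_DAY / J_PER_MEGATON_TNT
    print(f"about {megatons_per_day:.0f} megatons of TNT per day")  # -> about 31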


At least with a storm it's somewhat predictable, so one can take action before it hits the datacenter. Nobody was able to predict the outage at Amazon's datacenter, which took out large sites like Airbnb and Reddit almost instantly.


Here's the part I don't understand. This was a record 13-foot storm surge. Is it true that, had the pumps been located above the storm surge, service would have continued uninterrupted? It sounds like everything else in their data center is just fine.

This makes it seem like there was a serious design flaw. These pumps were a critical part of the system, but were located in a vulnerable area.


I'm guessing a complete evacuation of the building would have been required regardless, which makes things a little trickier.

Would they be able, or would it even be advisable, to leave the fuel pumps and generators running and all operations untouched without any staff on site?


I really don't know what the proper procedure should be in a data center when the big klaxon goes off and you hear those fateful words over the intercom: "Close all doors, DIVE, DIVE, DIVE!"


If it is a halon-equipped datacenter, you listen. And then do as it says. Quickly.


So true. My first experience with halon was a briefing at IBM about their machine room policies (I was a student intern). They stressed that when the fire alarm went off, even if you knew there wasn't a fire, you evacuated the room, because once the halon dumped, it was game over unless you had your own source of oxygen.


F*$cked

Ironic that they tweeted this re AWS 12h ago: Internap @Internap: Could 'Frankenstorm' Lead To Another AWS #Outage? http://onforb.es/Vtr45J


Did you by any chance save that? They deleted it…


Clouds are bad for cloud computing.


Virtual machines can migrate at lightspeed?


Update at 10:17 ET: http://pastebin.com/NUQNHHJi Interesting that they don't have a public status page of any sort.


Lifehacker/Gizmodo and their affiliate sites are down too.



We (Shapeways) are also down.


Can they not bypass the destroyed pumps and take fuel to the generators via 55-gallon drums? If not, why not?


"The building itself is being evacuated and no remote hands support will be available to assist in any equipment shutdown. Life safety is our number one priority and we are making plans to completely exit the facility."

Uptime that requires people to risk their lives isn't worth asking for.


At the point that a customer notification includes the words "Life safety is our number one priority" things like "can't they bypass the pumps" become far less significant.

People's lives are at stake. If the service is important enough that you'd expect them to stay behind and keep it running for you, it is probably important enough that it exists in multiple, geographically separated data centres.


The building is being evacuated.


I wonder what the chances are that the water will go down, and they will be able to supply fuel to the pumps, before power is actually lost.


Zero. Their location got flooded by a 12-foot storm surge. I doubt the police are letting anyone near that area. I'm up on 96th and Park. We've had flooding and power outages along the river, inland to 2nd Ave, up here as well. Also, there is practically no way in or out of the island of Manhattan now. All bridges and tunnels are closed except for _possibly_ the Lincoln Tunnel.


So why didn't they do it earlier? That was my point. Of course you have to keep your people safe, but this shows poor planning.


How long would some number of 55-gallon drums of fuel have helped, anyway? A car can run through 55 gallons of fuel in a matter of hours (<10), never mind a datacenter!

There may well have been poor planning at some point, but these things happen so infrequently that there must be an allowance for a this-is-a-disaster situation that cannot easily be worked around.

Another question is for the customers: are they running all their services out of a single datacenter? That sounds like poor planning too.


It is an easy question to answer. For example, http://www.pmsi-inc.com/pdf/GeneracPowerSystem.pdf shows a 1 MW generator uses about 63 gallons per hour, just a little more than one drum.

I don't know the size of the facility, so I can't tell how many drums it would take, per hour, to keep the place running.
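
To make that concrete, here is a minimal back-of-the-envelope sketch; the 63 gallons/hour figure is from the Generac spec linked above, while the example facility loads are purely hypothetical since the actual size of this facility is unknown:

    # How many 55-gallon drums per hour a facility of a given (assumed) load would need,
    # scaling the ~63 gal/hour figure for the 1 MW Generac genset linked above.
    GAL_PER_HOUR_PER_MW = 63
    DRUM_GALLONS = 55

    def drums_per_hour(load_mw):
        return load_mw * GAL_PER_HOUR_PER_MW / DRUM_GALLONS

    for load_mw in (1, 2, 5):  # hypothetical example loads
        print(f"{load_mw} MW -> {drums_per_hour(load_mw):.1f} drums/hour")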


So, now they're expected to move around an unknown quantity of 400-500 lb 55-gallon drums of fuel? And where would they have sourced these so easily and quickly (another unknown, I would think)?

We don't know how much energy is being used nor how much fuel is required to run the generator(s) per hour, among other things. That's a lot of important missing information for calling this situation 'poor planning.'

But I'll respond to your single data point with mine: according to [1], that's 2 full drums for every hour of generator running.

My point isn't to be simply argumentative but to look at things from a more appropriate perspective. Generators in these circumstances (and in my professional experience) are not meant for very long periods of use, let alone indefinite usage.

I'm filing this proposed drum-filling plan into the 'unrealistic' category.

1) http://datacentersmadesimple.com/tech_highlights.html
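
Extending the 2-drums-per-hour figure from [1] to a multi-day outage gives a feel for the logistics. A rough sketch, where the 48-hour outage length and the ~430 lb weight of a full drum are assumptions, not figures from the thread:

    # Rough logistics: total drums and tonnage for a hypothetical 48-hour outage,
    # at the ~2 drums/hour rate cited from [1].
    DRUMS_PER_HOUR = 2
    OUTAGE_HOURS = 48          # assumed outage length
    LB_PER_FULL_DRUM = 430     # assumed: ~390 lb of diesel plus the steel drum

    total_drums = DRUMS_PER_HOUR * OUTAGE_HOURS
    total_tons = total_drums * LB_PER_FULL_DRUM / 2000
    print(f"{total_drums} drums, roughly {total_tons:.0f} tons of fuel to move by hand")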


Try to see it from a different perspective: customers are paying (my guess) $800 per rack, per month, plus power and bandwidth charges; and a rack takes up 30 square feet once you include the space around that rack.

So almost $30 per square foot per month for the raw space alone. $360 per square foot per year is high rent, even for NYC.
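
A quick check of that arithmetic (both inputs are the guesses from the paragraph above, not published Internap pricing):

    # Per-square-foot cost implied by the guessed rack price and footprint above.
    RACK_PRICE_PER_MONTH = 800   # dollars, guessed
    SQ_FT_PER_RACK = 30          # includes the aisle space around the rack

    per_sqft_month = RACK_PRICE_PER_MONTH / SQ_FT_PER_RACK
    print(f"${per_sqft_month:.2f}/sq ft/month, about ${per_sqft_month * 12:.0f}/sq ft/year")
    # ~$26.67/month and ~$320/year; rounding up to $30/month gives the ~$360/year figure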

And what is the client supposed to get for his money? Reliability! The engineering and facilities management expertise to ensure this is baked into the costs.

You ask, "so now they are expected to move 55-gallon drums of fuel"? ABSOLUTELY they are expected to do that. The only "appropriate perspective" is that the clients are paying a lot of money for the datacenter to do whatever needs to be done.

They had a week of warning to source these; they already have a long-standing relationship with their fuel supplier for diesel delivery, so they call him up and say "Joe, we need 20 drums of diesel in addition to topping up the tanks we have" and they arrive in the next 2 days.

These diesel generators are basically modified/tuned versions of a big truck or marine diesel, which has a rebuild interval of 500K to 1 million miles when used as a truck engine, or some high number of operating hours (like 10,000). Perhaps you are thinking of LPG, natgas or gasoline-powered gensets, which are designed for less frequent use.

I researched all aspects of building a DC years ago and realized that even if I could raise the $5 million to do an entry level one, my effort was best spent elsewhere.

Customers punish downtime; this DC will lose clients, be sure of it.

Aside: there was a guy in New Orleans who kept his DC running all through Hurricane Katrina and after it - if you search the site at http://mgno.com with terms like "diesel drums" you will find his old posts. Can't seem to easily link to these old posts, though.


I'm well aware of these generators; there are two shipping-container-sized units right outside my building, tested frequently. So I'll concede that point but still disagree.

I agree they will likely lose some customers, but I disagree that there was much they could do at this point. Were they in the mandatory evacuation zone? (I don't know.) Will 20 drums (10-ish hours) really help if this is a multi-day outage? Did the customers plan for failover to another datacenter, or put all their eggs in one basket? (Oops!)


Storm surge was higher than expected.


Why would any sane manager take the risk of:

- having a bunch of employees fill up 55-gallon drums with easily ignited fuel,

- hustling those barrels over to somewhere else in the datacenter,

- hooking them up to a generator and restarting it,

- all under an evacuation warning issued before any of this happened?


Only Rackspace has fanatical support. They would lay down their lives for one more minute of uptime.



So they have more sense than DirectNIC had during Katrina: http://www.wired.com/science/discoveries/news/2005/09/68725


Internap LGA11 lost power at 11:48 AM ET: http://pastebin.com/6AxvbzF1


Some websites affected by the catastrophe: OccupyWallSt.org, Alternet.org


Internap's cloud service taken down by Sandy's cloud service.



