As bubblethink already mentioned, the article is arguing that batch processing jobs (e.g. those characteristic of Deep Learning) don't benefit substantially from cloud infrastructure.
High availability, edge processing, fast network upload, and the Lego blocks for building redundancy are all moot for these workloads.
Most training jobs recover automatically if a node goes down (e.g., by resuming from the latest checkpoint), which covers essentially all of the failure handling required.
Hi, Lambda engineer here (I'm one of the authors). You bring up some good points; I'd like to address some of them:
Admin:
> Time (and cost) it takes to install software, drivers, etc.
These tasks aren't made unnecessary by the cloud. Yes, with cloud, once you've done the work it can be encoded into an image or container forever. However, the same applies on-prem with container solutions like Docker.
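For instance, here's a minimal sketch of baking that one-time setup into an image (the base-image tag, framework, and version below are illustrative, not a specific recommendation):

```dockerfile
# The base image ships with the CUDA toolkit and cuDNN preinstalled.
FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04

# Do the library setup work once; every container started from
# this image inherits it.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install tensorflow-gpu==1.13.1
```

Build it once with `docker build`, then any box with the NVIDIA container runtime can run it (e.g. `docker run --runtime=nvidia ...`), whether that box is in the cloud or in your office.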
Maintenance / Capitalization / Finances:
> 3-year? What is the useful life of this? When does it seemingly become obsolete?
With GPUs of recent history, obsolescence is not a concern in a 3-year time frame. On AWS, people are still using K80s (released in 2014!). The GTX 1080 Ti, which was released 2.5 years ago, is selling for substantially above MSRP. This may change if competition in the GPU space increases and NVIDIA loses its monopoly.
> AWS will continually upgrade their hardware and you keep paying the same.
True, but this concern is mitigated by the slow rate of GPU obsolescence.
> Hardware can be capitalized, which means you can push it to the balance sheet (for tax or valuation purposes)
Can you say a little more about this? Not sure what you're getting at.
> Spending $90k instead of $184k in year 1 with the option to turn it off if you want (no longer need). This could be very valuable for a startup that wants elastic spending patterns.
This needs to be evaluated on a case-by-case basis. A poorly capitalized startup with an unpredictable compute workload is not a good candidate for buying on-prem. A well-capitalized startup that consistently uses GPUs is a better candidate.
> Returns, breakage, warranty in case of a hardware failure
Any reasonable hardware provider will include an option for a 3-year warranty.
>Hardware can be capitalized, which means you can push it to the balance sheet (for tax or valuation purposes)
> Can you say a little more about this? Not sure what you're getting at.
I think he means this: the hardware goes on your balance sheet, then it depreciates by a certain amount over those 3 years, and that loss may be tax-deductible.
In other words: from your bank account's perspective it looks like an upfront cost, but because you could sell the servers at any time, on your books they look more like a rental, with capital slowly draining away each year as the hardware loses value.
Correct. Depending on what accounting principles you use, this is typically 3-5 years. It's akin to an airline buying a Boeing plane. Say it costs them $1B; it'll actually hit their income over 35 years ($1B / 35), which means only ~$28M shows up on their income statement per year (simplified example). Most companies are valued and taxed on their income, so this is important to understand.
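To put rough numbers on that with the figures from upthread (a simplified straight-line sketch: zero salvage value assumed, and assuming the $184k figure is the hardware purchase price; real tax treatment varies):

$$
\text{annual depreciation expense} = \frac{\text{cost} - \text{salvage}}{\text{useful life}} = \frac{\$184\text{k} - \$0}{3\ \text{years}} \approx \$61\text{k/year}
$$

So the income statement shows ~$61k/year for three years rather than a $184k hit in year 1, even though the cash left the bank up front.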
In fact, for most enterprise-grade hardware I've ever purchased, the three-year warranty tends to be pretty much built into the cost. Usually where I pay extra is to add the fourth and fifth years.
Lambda Labs engineer here. Here's what we're trying to argue:
Many benefits of running infrastructure in the cloud are lost for offline batch processing jobs. Training machine learning models doesn't require low response times, high availability, geographic proximity to clients, etc. Yet, with cloud, you're paying for all this extra infrastructure.
The main benefits of cloud for machine-learning-style workloads are cost savings (when utilization is low) and not having to do your own electrical set-up.
On the other hand, cloud is extremely expensive for groups that require high base levels of GPU compute. The article is arguing that such groups can save a huge amount of money by moving infrastructure on-prem.
I feel like I'm talking to a brick wall. This happens often and it exhausts me. You are still arguing cloud vs. colo / on-prem (even ignoring that on-prem and colo are very different, but w/e), and what I am saying is that there is a third option that neither the article nor your reply even acknowledges.
We wrote Debian packages for every framework, including CUDA and cuDNN. Using our Debian repository, you can install all of these frameworks with apt/aptitude.
When a new version of a framework comes out, we usually have it available in 1-2 weeks.
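Roughly, that means installs look like this (the repository URL and package names below are placeholders, not necessarily the published ones):

```sh
# Placeholder repo URL and package names -- check the actual Debian
# repository for the real entries.
echo "deb https://example.com/lambda bionic main" | \
    sudo tee /etc/apt/sources.list.d/lambda.list
sudo apt-get update
sudo apt-get install tensorflow-gpu caffe torch  # CUDA/cuDNN come in as dependencies
```

Once the repo is configured, framework upgrades become a plain `apt-get upgrade` rather than a rebuild.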
Any chance you can get MXNet (and Keras-MXNet) in there?
Installing this stuff is a huge nuisance for us and we have some pretty insane Dockerfiles to handle all the different combinations. I might look into using this for our ML images.
Our company used to run a cluster of ~1,000 GPU servers for inference and training. We used Caffe and Torch. Provisioning and software maintenance was a huge hassle.
We realized how painful it is to get a machine set up for deep learning, so we decided to release our Debian packages to the public.
We sell computers built for deep learning researchers and figure the more people we can help get started, the better :)
The author wasn't joking about the noise levels. This machine sounds like an F1 race car.
If you don't require a rack-mounted server, a cluster of workstations like NVIDIA's DIGITS DevBox is far more cost-efficient (and less noisy). I run a compute-intensive business (Dreamscopeapp.com), and we opted to build a cluster of desktop-like machines instead of using a rack-mounted solution. Another benefit is that you don't run into the power issues mentioned in the post.
So - tried your quote form. The options are 4x 1080 Ti, 4x Titan Xp, or 8x P100 -- but no 8x 1080 Ti? Or is the quote form wrong?
$16.5k seems pretty reasonable for 8x 1080 Ti with a bit of profit for building it, but unreasonable for only 4x 1080 Ti. My home-built 4x 1080 Ti box (without quite enough PCIe bandwidth, admittedly) is under $6k. I'm assuming/hoping there's an error there. :)
Oh, also - if I want a quote on both the big server and the little workstation I have to enter my contact info twice? Not particularly customer-friendly.
For a quad-GPU config you should look at the dev-box-type option. It's $8,750 for a machine with 4x 1080 Ti, 64GB of RAM, and a 1TB SATA SSD. Quite a steep margin if you ask me, considering a 128GB RAM machine you build yourself would cost at most $5,700 (taxes included) if you get everything from Amazon, and probably under $5k if you're willing to shop around a little.
You can save a ton of money by building your own machine.
The server we sell is packaged with software we wrote that makes administration significantly easier. We also provide technical support and even a limited amount of free machine-learning consulting. The customers who purchase this server want a headache-free solution and aren't as price-sensitive as a lone researcher.
Notice the custom-parts box accounts for two more GPUs; I'm not sure why the site doesn't let you add 4 in the GPU section.
This setup ranges from $5,250 with 4 GPUs down to $3,240 with 1 GPU. You might want to bump up the PSU for 4 GPUs; it's currently 1500 watts, which may or may not be enough at max load. The article shows a max of ~2800 watts with 8 GPUs.
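Rough napkin math, assuming the 250 W board power of a 1080 Ti and ~300 W for CPU, drives, and fans (both assumptions, not measured numbers):

$$
4 \times 250\,\text{W} + \sim 300\,\text{W} \approx 1300\,\text{W}
$$

That leaves only ~200 W of headroom on a 1500 W unit before accounting for transient spikes, which is why bumping up the PSU is the safer call.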
Nice. BTW, the Rosewill Tokamak 1500 from Newegg is a way to save a few bucks on that build, though it's out of stock (they just had it on sale). It's also 80 Plus Titanium.
I purchased a couple of the machines from these guys for my team and had a great experience. Highly recommended if you have better ways to spend your time than building them yourself.
I ask this out of deep curiosity and by no means with any intent to offend: how does (will?) dreamscopeapp.com make any money? (I don't have an Android device or I'd install it – I'm asking because it's entirely unapparent from your web presence.)
Dreamscope doesn't actually make money :) It's a little under break-even. It brings in revenue through a $9.99/mo premium subscription, which gives customers higher-resolution images.
Our neighborhood uses Nextdoor (about 950 homes) and it's fantastic for community building. It's focused more on personal interaction than hyperlocal news aggregation, but in my opinion that makes it more useful.
Neighbors in our community who have lived on the same block for 20 years without speaking are now getting together for block parties and sharing info about babysitters and local crimes. It's actually very useful.
That's correct. The goal of this site is to aggregate all the free courses available online in one place. It has courses from MIT, Harvard, UC Berkeley, Yale, the University of Houston, etc.