
One could argue that all they did was move most of the complicated logic into the blob store. Not that it's a bad thing.


Go is my default language for any non-client-side project. That said, I agree with all the criticisms, except for the one about unused variables/imports and the lack of warnings in the compiler. I'd also add one of my own - I'm an expert Go programmer, but I still panic sometimes when dealing with error handling. It is still not as easy as it could be to produce useful error return values.


Indeed. panic is not generally recommended, but there are a few places where nudging it in just makes sense, rather than ensuring the error will successfully propagate up the chain.


Haha, I meant panic as in become scared, not the Go panic.


Hah, it works for both I guess :)


No, sorry, but I don't think you understood the article. Readiness probes don't fix this. In fact, they could exacerbate the problem (although you obviously need them anyway). The problem is that Kubernetes and the load balancer react to pod status changes at different times. (In fact, it's worse - different Kubernetes components also react to status changes at different times.) The load balancer is almost always too slow to react, so it sends traffic to an instance that has already started shutting down.


Most load balancers do use the readiness probes, though.


Yes! I think this is a really under-reported issue. It's basically caused by Kubernetes doing things without confirming everyone responded to prior status updates. It affects every ingress controller, and it also affects Services of type "LoadBalancer", and there isn't a real fix. Even if you add a timeout in the preStop hook, that still might not handle it 100% of the time. IMO it is a design flaw in Kubernetes.


Not defending the situation, but with a preStop hook, at least in the case of APIs, k8s can handle it 100% of the time; it's just messy.

We have a preStop hook of 62s. 60s timeouts are set in our apps, 61s is set on the ALBs (ensuring the ALB is never the cause of the hangup), and 62s on the preStop to make sure nothing has come into the container in the last 62s.

Then we set a terminationGracePeriodSeconds of 60 just to make sure it doesn't pop off too fast. This gives us 120s where nothing happens and anything in flight can get to where it's going.
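
Roughly, the shape of it in a pod spec looks like this (just a sketch - the names and exact values are illustrative, not our real manifests, and the grace period here is padded because it has to cover the preStop sleep plus the drain time):

    # Illustrative fragment for draining an API pod behind an ALB.
    spec:
      terminationGracePeriodSeconds: 130   # starts counting at pod deletion and includes the preStop sleep
      containers:
        - name: api                        # hypothetical container name
          image: example/api:latest        # placeholder image
          lifecycle:
            preStop:
              exec:
                # Keep the pod alive while endpoints / ALB target groups
                # deregister it, so nothing new is routed here mid-shutdown.
                command: ["sleep", "62"]
          readinessProbe:
            httpGet:
              path: /healthz               # hypothetical health endpoint
              port: 8080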


> Then we set a terminationGracePeriodSeconds of 60 just to make sure it doesn't pop off too fast.

I think the grace period includes the preStop duration, doesn't it?


The grace period does include the time spent executing the preStop hook, yes.


Yep, same configuration here other than we use 60/65/70 for (admittedly) completely unscientific reasons.


OP should really delete this blog post. It has a lot of details that could be used against them in court.


It's already on HN now and the Wayback Machine is a thing. Once it's on the internet, it's pretty much there forever.


you mean destroy evidence to hinder the prosecution?


Actually, it is normal for there to be dilution in each funding round, including an IPO.


I agree Spanner and Cockroach are the future. Most people don't need a database that can scale that well, and Spanner is too expensive to use unless you really, truly need it. Also, Google and Cockroach have not done enough marketing. Look at all the marketing MongoDB did - they actually managed to convince people to use a database that would regularly lose data.


You can get 3 nodes for about $27k a year, and that'll handle 30k read QPS and in the neighborhood of 1-2k write QPS, iirc. It's a fraction of the cost of even a single dedicated engineer. And you'd probably need several engineers to achieve the same perf with open source alternatives and keep it stable/upright. There's a large class of businesses for which this choice is a no-brainer.


Is Cloud Spanner the only managed option? Why would you compare to a dedicated engineer?

AWS RDS or GCP Cloud SQL or Azure Managed SQL or IBM Compose or Aiven or any number of other vendors offer managed databases with more features, much higher performance, and far less cost. Even CRDB has its own cloud offering that's cheaper and more flexible than Spanner.


Your answer suggests that you don't fully understand what problems Spanner is solving and why it's worth paying for that. CRDB is improving, but I don't think there's a consensus that it's production-grade quite yet.


You compared it to open-source databases and having a full-time engineer (while overlooking managed solutions). If those problems were unsolvable by other systems, then what exactly were you comparing?

Spanner isn't magic; it's just a proprietary distributed relational database that is strongly consistent (CP) and relies on Google's network and infrastructure to preserve availability as much as possible. CRDB solves the same problems. It was founded by ex-Googlers who are familiar with Spanner, and the product is production-grade enough to achieve a multi-billion dollar valuation with impressive customers. Plenty of "new-SQL" relational datastores have been created that compete and win on both features and cost, because the reality is that the vast majority of companies do not have a scaling problem; certainly not one that can only be solved by Spanner.


I'm not disagreeing that CRDB will likely get there, but people are still reporting various issues: https://news.ycombinator.com/item?id=23087071.

It does seem, however, like you appreciate why Spanner and CRDB are important steps forward. The prices will continue to come down. It will become a de facto sensible solution in production for any reasonably well-capitalized business in the future.


It's already there, but you refuse to accept it for some reason. The comment you linked to directly says "...the experience was positive. The hassle free HA is a huge peace of mind..."


If you are aware of anyone who has lost data using MongoDB, we would love to hear about it. There have been many examples of constructed scenarios where a MongoDB cluster can be demonstrated to have lost data, and in every case we have fixed those bugs. We take data loss very seriously. If you know of such a data loss instance, please make contact.


The WJACTV article you referenced literally includes a response from the department of state and an explanation of the discrepancy.


Either they are too stupid to know you can't attack people and break into buildings, or they did it on purpose. Either way they can't be effective as CEO after that.

People were indeed arrested and fired for looting in protests earlier in 2020.


In case someone is curious, the optimization they describe is trivial to do in Kubernetes - just set a resource request for CPU without adding a limit.
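
Something like this, for anyone who hasn't tried it (a sketch - the container name, image, and values are placeholders):

    # Container fragment: CPU is requested (so the scheduler reserves it and
    # the container gets proportional shares under contention), but no CPU
    # limit is set, so it can burst into any idle CPU on the node.
    spec:
      containers:
        - name: worker                  # hypothetical name
          image: example/worker:latest  # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              memory: "1Gi"             # keep a memory limit; deliberately no cpu limit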


It is far from trivial, actually. Kubernetes and render farms are trying to optimize for very different goals.


Kubernetes can't do it at this scale "trivially".

Firstly, K8s has no concept of licenses, and it is also exceptionally weak on dependencies. A job graph for a VFX job can be well over 100k nodes, something that would crash k8s.

Secondly, Tractor (https://rmanwiki.pixar.com/display/TRA/Tractor+2) is exceptionally fast at dispatching jobs to machines. I suspect it's on the order of 50k a second, if not more.

Thirdly, getting k8s to talk to 25k machines without saturating the network is almost impossible.

Fourthly, it doesn't do too well on "normal" networks - try getting decent network throughput on one of K8s' batshit networking schemes (each server on a farm will have at a minimum 2x 10-gig links, more likely 2x 40-gig).

