
I used to be a huge fan of GCP and bet on it to power my startup, and have come to greatly regret it.

Recently, I needed to increase a CPU-limit quota from a small number (like 16 vCPUs to 64 vCPUs) - nothing crazy. In the past, the quota increase system was more or less automated and would only take a few minutes to process.

This time, however, GCP denied my quota increase and forced me to schedule a call with a sales rep in order to process the quota increase. It was the biggest waste of time and kind of goes against the entire point of instant cloud resizing.

It also feels like the velocity of new features, instance types, etc has slowed down dramatically in the past year. Also, while I'm ranting, Google Cloud SQL is probably the worst cloud service I've ever used (and it costs an arm and a leg for the pleasure!)



CloudSQL was slow for us until we did the following:

  1) Increase the disk size to 100GB as this increases the IOPS
  2) Switch to using private IP addresses. Huge speed increase
  3) get rid of cloudsql-proxy. Another huge speed increase
These 3 things have kept our database instances very small and costs low.
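
For anyone wanting to try the same, the gcloud equivalents are roughly the following (instance and network names are placeholders, and the exact flags may have changed since I set this up):

  # 1) bump the disk, since Cloud SQL scales IOPS with disk size
  gcloud sql instances patch my-instance --storage-size=100GB

  # 2) put the instance on the VPC and drop the public address
  gcloud sql instances patch my-instance \
      --network=projects/my-project/global/networks/default --no-assign-ip

  # 3) then point your apps at the private IP directly instead of cloudsql-proxy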


3) get rid of cloudsql-proxy. Another huge speed increase

^ Do not use cloudsql-proxy ever. GCP docs are wrong. DO NOT proxy all your db requests through a single VM.


Our cloudsql-proxies were running on GKE, so they were not "that bad".

Switching to private ip definitely had the largest impact by far on performance.


If you need very high throughput I can appreciate this advice. Generally though, cloudsql-proxy is fast enough for most use-cases.


What about running it as a sidecar in the backend pods?


This causes some other problems... Kubernetes doesn't do a very good job of treating the sidecar as part of the main container so you will get random disconnects in your app when the proxy is restarted unexpectedly. This is actually why we abandoned it, just to deal with the random hiccups. Hadn't even benchmarked it.


Yeah, have hugely over provisioned disks for IOPS. Am still using public ip + cloudsql-proxy because the alternative didn't exist when I first deployed, but I'll try switching.


I went through this last summer. The nice thing is that you can switch to private IP and cloudsql-proxy will still work. At least you can isolate your changes.


> Switch to using private IP addresses. Huge speed increase.

Interesting. I'm looking at Cloud SQL right now and the advice seems to lean in the opposite direction: use public IPs for ease of connecting. Can you quantify the decrease in latency? All I can find is bits about reduced network hops.


I honestly don't know what the difference is, but the number of hops is probably a contributing factor to the slowdown. There could be some translation layer, with security checks, between the CloudSQL instance's private IP and its public IP that slows things down.

I didn't have to worry about the 'ease of connecting' when using CloudSQL as all my GCP services are in the same VPC.

Also, private ip is one less security problem to worry about.
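
If you want to quantify it for your own setup, a crude check is to time a trivial query against each address from a VM in the same region (IPs and credentials below are made up):

  # run from a GCE VM in the same VPC; compare connection + query time
  time mysql -h 10.11.12.13 -u app -p"$PW" -e 'SELECT 1'   # private IP
  time mysql -h 35.200.10.20 -u app -p"$PW" -e 'SELECT 1'  # public IP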


Is it speed of establishing a connection or does it affect pooled connections? What database engine are you using?

We abandoned cloudsqlproxy partially because the sidecar in Kubernetes caused random disconnects, but also because it just had bizarre network errors sometimes. (That was mostly when connecting from outside GCP though.)


We are using PHP and thus no connection pools at all. This definitely doesn't help us with speed. With public IP there was some mystery point that just slowed everything to a crawl.

MySQL for our DB.


Is the call just so they have a window to upsell you?


100% upsell. Felt like sitting through a high-pressure timeshare sales pitch to get the free gift at the end.


I got the same treatment but I think I came off as annoyed enough in my message to sales (with an implied threat of just moving to AWS) that I didn't get an upsell conversation.


I just started using Google Cloud SQL – the allure of a managed Postgres service was strong. Can you share some of your experiences with it?


Sure.

- No way to upgrade major postgres version without full export and import into new cluster.

- Incredible delay between postgres versions. IIRC, it took nearly 2 years for them to add postgres 11 after it was released.

- HA is basically useless. Costs double, still has 4-5 minute window of downtime as it fails over, doesn't avoid maintenance window downtime (both primary/standby have same maintenance window) and you can't use it as a read replica. Honestly, feels like a borderline scam since I'd imagine a new instance could be spun up in the same amount of time a failover takes (but I haven't tested)

- With default settings, we experience overly aggressive OOM-killer related crashes on a ~monthly basis during periods of high utilization. On a 32GB instance, OOM killer seems to kick in around 27-28GB and it's incredibly annoying.

- Markup over raw instances is almost 100%, with no sustained use discount outside of a yearly commit.

It's just a lot of money to pay for a crashy, outdated version of Postgres.
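
For concreteness, the upgrade path is roughly "export to a bucket, import into a fresh instance on the new version" - and you eat downtime for the whole thing. Something like (instance and bucket names made up):

  # dump the old instance to GCS, then load it into a new instance
  gcloud sql export sql old-pg96 gs://my-dump-bucket/app.sql --database=app
  gcloud sql import sql new-pg11 gs://my-dump-bucket/app.sql --database=app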


> Incredible delay between postgres versions

To be fair, it looks like GCP supported Postgres 13 (Nov 5, 2020) before AWS did (Nov 27, 2020), and AWS currently marks Postgres 13 as a preview. Maybe GCP had a large initial engineering cost to support multiple versions of Postgres, and now the incremental cost to add new versions is small?

> It's just a lot of money to pay for a crashy, outdated version of Postgres.

Have you looked at other options? I'm evaluating GCP SQL and the comments in this thread are scary. Seems like Aiven might be a good way to go. I've also briefly looked at CrunchyData's Postgres Operator [1] for Kubernetes but it's a lot of complexity I don't really want.

[1]: https://github.com/CrunchyData/postgres-operator


Amazon RDS waits until the "dot-one" release of a new PostgreSQL major version before releasing to general availability. This ensures that all supported extensions and modules (71 in Amazon RDS for PostgreSQL 13) have been updated to work with the new release and with in-place major version upgrades. Amazon RDS releases beta and "dot-zero" versions of PostgreSQL in the Preview Environment, so that customers can start testing and developing against new features in the latest major version.


I've only looked at CrunchyData, which does seem like more complexity than I want. I was willing to suck it up and pay the premium, but the monthly OOM crashes have forced my hand - to where, I don't know yet.


I need to run Postgres in production soon. I've used AWS RDS (MySQL) in the past, but am also considering Google Cloud SQL.

Things that seem similar in AWS:

- For major version upgrades, you need to bring up a new instance from a snapshot and catch it up with replication.

- HA failover results in a few minutes of downtime. (They claim using their SQL proxy will reduce this.)

- Lag in providing the latest Postgres versions. GCP seems to be a bit ahead of AWS here.

Is there a managed Postgres offering that you prefer? Aiven looks nice, feature-wise.


To clarify, it’s a lot more work than bringing up a snapshot. You need to do a full export as SQL and reimport as SQL. Super annoying, slow, and requires hard downtime.

Am using SQL proxy but it doesn't do much re: HA.

I don’t know, I’ll probably just run my own Postgres at some point. The only peace of mind that I get from Cloud SQL is the automatic backups.
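
If you do end up self-hosting, the backup piece at least is easy to replicate with a cron job, something like (bucket name made up):

  # nightly logical backup streamed straight to a GCS bucket
  pg_dump -Fc app | gsutil cp - gs://my-db-backups/app-$(date +%F).dump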


Why don't they make the upgrade seamless? If it's truly an export/import process, then it should be dead simple for them to do that on their end. Especially after they've snatched your db from serving requests


Interestingly, we don't have any of these problems with MySQL. We have several read-replicas on our DBs and things have been stable.

I will guess that you are much higher scale than us.


Wait what, it's not available through an API? That's ridiculous.


Your project is limited by quota ceilings. Those can be increased if you spend enough money on GCP.

AWS has a similar thing.


Of course there's an API for creating and resizing instances. The issue is there are quotas that cap how many CPUs you can have, and increasing that is a manual process.
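
You can at least see where you stand from the CLI; it's only the increase itself that goes through the manual/sales flow. Roughly:

  # per-region quotas (CPUS etc.) are listed under "quotas" in the output
  gcloud compute regions describe us-central1
  # project-wide quotas
  gcloud compute project-info describe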


64 is awfully low for a sudden manual process though.



