I used to be a huge fan of GCP and bet on it to power my startup, a decision I've come to greatly regret.
Recently, I needed to increase a CPU-limit quota from a small number (like 16 vCPUs to 64 vCPUs) - nothing crazy. In the past, the quota increase system was more or less automated and would only take a few minutes to process.
This time, however, GCP denied my quota increase and forced me to schedule a call with a sales rep in order to process the quota increase. It was the biggest waste of time and kind of goes against the entire point of instant cloud resizing.
It also feels like the velocity of new features, instance types, etc. has slowed down dramatically in the past year. Also, while I'm ranting, Google Cloud SQL is probably the worst cloud service I've ever used (and it costs an arm and a leg for the pleasure!)
CloudSQL was slow for us until we did the following:
1) Increase the disk size to 100GB, as this increases the IOPS
2) Switch to using private IP addresses. Huge speed increase
3) Get rid of cloudsql-proxy. Another huge speed increase (rough sketch of the direct connection below)
These 3 things have kept our database instances very small and costs low.
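If it helps, dropping the proxy just means pointing the client straight at the Cloud SQL private IP. A minimal sketch of what that looks like, assuming psycopg2 and a Postgres instance reachable from the same VPC (the host and credentials are placeholders):

    # Connect straight to the Cloud SQL private IP instead of going through cloudsql-proxy.
    # Assumes a Postgres instance and psycopg2; host/user/password are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="10.20.0.3",        # Cloud SQL private IP (placeholder)
        port=5432,
        dbname="app",
        user="app_user",
        password="change-me",
        sslmode="require",       # still encrypt in transit, even inside the VPC
        connect_timeout=5,
    )

    with conn, conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
    conn.close()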
This causes some other problems... Kubernetes doesn't do a very good job of treating the sidecar as part of the main container, so you will get random disconnects in your app when the proxy is restarted unexpectedly. This is actually why we abandoned it, just to avoid the random hiccups. We hadn't even benchmarked it.
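If you do keep the sidecar, the usual band-aid is to retry on dropped connections when the proxy restarts. Rough sketch, assuming psycopg2; the connection settings are placeholders and the backoff is deliberately simple:

    # Retry a query when the proxy sidecar restarts and drops the connection.
    # Assumes psycopg2; get_conn() opens a fresh connection each attempt (placeholders).
    import time
    import psycopg2

    def get_conn():
        return psycopg2.connect(host="127.0.0.1", port=5432, dbname="app",
                                user="app_user", password="change-me")

    def query_with_retry(sql, params=None, attempts=3):
        last_err = None
        for attempt in range(attempts):
            try:
                conn = get_conn()
                try:
                    with conn, conn.cursor() as cur:
                        cur.execute(sql, params)
                        return cur.fetchall()
                finally:
                    conn.close()
            except psycopg2.OperationalError as err:  # raised on dropped/refused connections
                last_err = err
                time.sleep(2 ** attempt)              # back off while the sidecar comes back
        raise last_err

    rows = query_with_retry("SELECT now()")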
Yeah, we have hugely over-provisioned disks for IOPS. I'm still using public IP + cloudsql-proxy because the alternative didn't exist when I first deployed, but I'll try switching.
I went through this last summer. The nice thing is that you can switch to private IP and cloudsql-proxy will still work, so at least you can isolate your changes.
> Switch to using private IP addresses. Huge speed increase.
Interesting. I'm looking at Cloud SQL right now and the advice seems to lean in the opposite direction: use public IPs for ease of connecting. Can you quantify the decrease in latency? All I can find are bits about reduced network hops.
I honestly don't know what the difference is, but the number of hops is probably a contributing factor. There could be some translation layer happening when going from the CloudSQL instance's private IP to its public IP, with security checks that slow it down.
I didn't have to worry about the 'ease of connecting' when using CloudSQL as all my GCP services are in the same VPC.
Also, private IP is one less security problem to worry about.
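If you want to quantify it yourself, the simplest thing is to time connection setup and a trivial query against both endpoints from the same VM. Rough sketch, assuming psycopg2; both hosts and the credentials are placeholders:

    # Compare connect time and query round trip for the private and public endpoints.
    # Assumes psycopg2; hosts and credentials are placeholders.
    import statistics
    import time
    import psycopg2

    HOSTS = {"private": "10.20.0.3", "public": "34.68.0.10"}

    def measure(host, runs=20):
        connects, queries = [], []
        for _ in range(runs):
            t0 = time.perf_counter()
            conn = psycopg2.connect(host=host, port=5432, dbname="app",
                                    user="app_user", password="change-me",
                                    sslmode="require", connect_timeout=5)
            t1 = time.perf_counter()
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                cur.fetchone()
            t2 = time.perf_counter()
            conn.close()
            connects.append(t1 - t0)
            queries.append(t2 - t1)
        return statistics.median(connects), statistics.median(queries)

    for name, host in HOSTS.items():
        c, q = measure(host)
        print(f"{name}: median connect {c*1000:.1f} ms, median query {q*1000:.1f} ms")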
Is it speed of establishing a connection or does it affect pooled connections? What database engine are you using?
We abandoned cloudsqlproxy partially because the sidecar in Kubernetes caused random disconnects, but also because it just had bizarre network errors sometimes. (That was mostly when connecting from outside GCP though.)
We are using PHP and thus no connection pools at all. This definitely doesn't help us with speed. With public IP there was some mystery point that just slowed everything to a crawl.
I got the same treatment but I think I came off as annoyed enough in my message to sales (with an implied threat of just moving to AWS) that I didn't get an upsell conversation.
- No way to upgrade to a new major postgres version without a full export and import into a new cluster.
- Incredible delay between postgres versions. IIRC, it took nearly 2 years for them to add postgres 11 after it was released.
- HA is basically useless. It costs double, still has a 4-5 minute window of downtime as it fails over, doesn't avoid maintenance window downtime (both primary and standby have the same maintenance window), and you can't use the standby as a read replica. Honestly, it feels like a borderline scam, since I'd imagine a new instance could be spun up in the same amount of time a failover takes (but I haven't tested).
- With default settings, we experience overly aggressive OOM-killer crashes on a roughly monthly basis during periods of high utilization. On a 32GB instance, the OOM killer seems to kick in around 27-28GB, which is incredibly annoying (rough memory math in the sketch below).
- Markup over raw instances is almost 100%, with no sustained use discount outside of a yearly commit.
It's just a lot of money to pay for a crashy, outdated version of Postgres.
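For what it's worth, the OOM behaviour is easier to reason about with some back-of-envelope memory math. Every number in this sketch is an assumption to substitute your own flags into (SHOW shared_buffers, SHOW work_mem, etc.), not Cloud SQL's actual defaults:

    # Back-of-envelope Postgres memory estimate. All values are assumptions;
    # plug in your instance's real flags.
    GB = 1024 ** 3
    MB = 1024 ** 2

    instance_ram         = 32 * GB
    shared_buffers       = 11 * GB     # assumed; often set to roughly a third of RAM
    max_connections      = 500         # assumed
    work_mem             = 16 * MB     # assumed; per sort/hash node, not per connection
    sort_nodes_per_query = 2           # assumed worst-ish case
    maintenance_work_mem = 512 * MB    # assumed (autovacuum, index builds)
    os_and_overhead      = 1 * GB      # assumed headroom for the OS, agents, etc.

    worst_case = (shared_buffers
                  + max_connections * work_mem * sort_nodes_per_query
                  + maintenance_work_mem
                  + os_and_overhead)

    print(f"worst case ~{worst_case / GB:.1f} GiB of {instance_ram / GB:.0f} GiB")
    # With these made-up numbers the worst case is already around 28 GiB,
    # which is roughly where the OOM killer was kicking in on the 32GB instance.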
To be fair, it looks like GCP supported Postgres 13 (Nov 5, 2020) before AWS did (Nov 27, 2020), and AWS currently marks Postgres 13 as a preview. Maybe GCP had a large initial engineering cost to support multiple versions of Postgres and now the incremental cost to add new versions is small?
> It's just a lot of money to pay for a crashy, outdated version of Postgres.
Have you looked at other options? I'm evaluating GCP SQL and the comments in this thread are scary. Seems like Aiven might be a good way to go. I've also briefly looked at CrunchyData's Postgres Operator [1] for Kubernetes but it's a lot of complexity I don't really want.
Amazon RDS waits until the "dot-one" release of a new PostgreSQL major version before releasing to general availability. This ensures that all supported extensions and modules (71 in Amazon RDS for PostgreSQL 13) have been updated to work with the new release and with in-place major version upgrades. Amazon RDS releases beta and "dot-zero" versions of PostgreSQL in the Preview Environment, so that customers can start testing and developing against new features in the latest major version.
I've only looked at CrunchyData, which does seem like more complexity than I want. I was willing to suck it up and pay the premium, but the monthly OOM crashes have forced my hand; to where, I don't know yet.
To clarify, it’s a lot more work than bringing up a snapshot. You need to do a full export as SQL and reimport as SQL. Super annoying, slow, and requires hard downtime.
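Outside of Cloud SQL's own export/import, the manual path is essentially dump-and-reload. A rough sketch, assuming pg_dump/psql are installed locally and can reach both instances (hostnames and user are placeholders; credentials come from PGPASSWORD or .pgpass):

    # Dump-and-reload major version upgrade path (sketch).
    # Expect hard downtime between stopping writes on the old instance and cutting over.
    import subprocess

    OLD_HOST = "10.20.0.3"   # existing instance (older Postgres major)
    NEW_HOST = "10.20.0.7"   # freshly created instance (newer Postgres major)
    DBNAME = "app"

    # 1) Export the whole database as plain SQL from the old instance.
    subprocess.run(
        ["pg_dump", "--host", OLD_HOST, "--username", "app_user",
         "--format", "plain", "--file", "app.sql", DBNAME],
        check=True,
    )

    # 2) Reimport the SQL into the new instance.
    subprocess.run(
        ["psql", "--host", NEW_HOST, "--username", "app_user",
         "--dbname", DBNAME, "--file", "app.sql"],
        check=True,
    )

    # 3) Repoint the application at NEW_HOST, and only then tear down the old instance.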
I'm using the SQL proxy, but it doesn't do much re: HA.
I don’t know, I’ll probably just run my own Postgres at some point. The only peace of mind that I get from Cloud SQL is the automatic backups.
Why don't they make the upgrade seamless? If it's truly an export/import process, then it should be dead simple for them to do on their end, especially since they've already taken your db out of serving requests anyway.
Of course there's an API for creating and resizing instances. The issue is there are quotas that cap how many CPUs you can have, and increasing that is a manual process.
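The quota numbers are at least readable over the API, so you can check headroom before a resize fails. Rough sketch, assuming google-api-python-client with application default credentials; the project and region are placeholders:

    # Read the regional CPU quota before trying to scale up.
    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")
    region = compute.regions().get(project="my-project", region="us-central1").execute()

    for quota in region.get("quotas", []):
        if quota["metric"] == "CPUS":
            print(f"CPUS: {quota['usage']:.0f} used of {quota['limit']:.0f} limit")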