I used to be a huge fan of GCP and bet on it to power my startup, a decision I've come to greatly regret.
Recently, I needed to increase a CPU-limit quota from a small number (like 16 vCPUs to 64 vCPUs) - nothing crazy. In the past, the quota increase system was more or less automated and would only take a few minutes to process.
This time, however, GCP denied my quota increase and forced me to schedule a call with a sales rep in order to process the quota increase. It was the biggest waste of time and kind of goes against the entire point of instant cloud resizing.
It also feels like the velocity of new features, instance types, etc. has slowed down dramatically in the past year. Also, while I'm ranting, Google Cloud SQL is probably the worst cloud service I've ever used (and it costs an arm and a leg for the pleasure!)
CloudSQL was slow for us until we did the following:
1) Increase the disk size to 100GB, as this increases the IOPS
2) Switch to using private IP addresses. Huge speed increase
3) Get rid of cloudsql-proxy. Another huge speed increase (rough sketch of the direct connection below)
These 3 things have kept our database instances very small and costs low.
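If it helps, dropping the proxy just means pointing the client straight at the Cloud SQL private IP. A minimal sketch of what that looks like, assuming psycopg2 and a Postgres instance reachable from the same VPC (the host and credentials are placeholders):

    # Connect straight to the Cloud SQL private IP instead of going through cloudsql-proxy.
    # Assumes a Postgres instance and psycopg2; host/user/password are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="10.20.0.3",        # Cloud SQL private IP (placeholder)
        port=5432,
        dbname="app",
        user="app_user",
        password="change-me",
        sslmode="require",       # still encrypt in transit, even inside the VPC
        connect_timeout=5,
    )

    with conn, conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
    conn.close()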
This causes some other problems... Kubernetes doesn't do a very good job of treating the sidecar as part of the main container, so you will get random disconnects in your app when the proxy is restarted unexpectedly. This is actually why we abandoned it, just to avoid the random hiccups. We hadn't even benchmarked it.
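If you do keep the sidecar, the usual band-aid is to retry on dropped connections when the proxy restarts. Rough sketch, assuming psycopg2; the connection settings are placeholders and the backoff is deliberately simple:

    # Retry a query when the proxy sidecar restarts and drops the connection.
    # Assumes psycopg2; get_conn() opens a fresh connection each attempt (placeholders).
    import time
    import psycopg2

    def get_conn():
        return psycopg2.connect(host="127.0.0.1", port=5432, dbname="app",
                                user="app_user", password="change-me")

    def query_with_retry(sql, params=None, attempts=3):
        last_err = None
        for attempt in range(attempts):
            try:
                conn = get_conn()
                try:
                    with conn, conn.cursor() as cur:
                        cur.execute(sql, params)
                        return cur.fetchall()
                finally:
                    conn.close()
            except psycopg2.OperationalError as err:  # raised on dropped/refused connections
                last_err = err
                time.sleep(2 ** attempt)              # back off while the sidecar comes back
        raise last_err

    rows = query_with_retry("SELECT now()")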
Yeah, we have hugely over-provisioned disks for IOPS. I'm still using public IP + cloudsql-proxy because the alternative didn't exist when I first deployed, but I'll try switching.
I went through this last summer. The nice thing is that you can switch to private IP and cloudsql-proxy will still work, so at least you can isolate your changes.
> Switch to using private IP addresses. Huge speed increase.
Interesting. I'm looking at Cloud SQL right now and the advice seems to lean in the opposite direction: use public IPs for ease of connecting. Can you quantify the decrease in latency? All I can find are bits about reduced network hops.
I honestly don't know what the difference is, but the number of hops is probably a contributing factor. There could be some translation layer happening when going from the CloudSQL instance's private IP to its public IP, with security checks that slow it down.
I didn't have to worry about the 'ease of connecting' when using CloudSQL as all my GCP services are in the same VPC.
Also, private IP is one less security problem to worry about.
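If you want to quantify it yourself, the simplest thing is to time connection setup and a trivial query against both endpoints from the same VM. Rough sketch, assuming psycopg2; both hosts and the credentials are placeholders:

    # Compare connect time and query round trip for the private and public endpoints.
    # Assumes psycopg2; hosts and credentials are placeholders.
    import statistics
    import time
    import psycopg2

    HOSTS = {"private": "10.20.0.3", "public": "34.68.0.10"}

    def measure(host, runs=20):
        connects, queries = [], []
        for _ in range(runs):
            t0 = time.perf_counter()
            conn = psycopg2.connect(host=host, port=5432, dbname="app",
                                    user="app_user", password="change-me",
                                    sslmode="require", connect_timeout=5)
            t1 = time.perf_counter()
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                cur.fetchone()
            t2 = time.perf_counter()
            conn.close()
            connects.append(t1 - t0)
            queries.append(t2 - t1)
        return statistics.median(connects), statistics.median(queries)

    for name, host in HOSTS.items():
        c, q = measure(host)
        print(f"{name}: median connect {c*1000:.1f} ms, median query {q*1000:.1f} ms")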
Is it speed of establishing a connection or does it affect pooled connections? What database engine are you using?
We abandoned cloudsqlproxy partially because the sidecar in Kubernetes caused random disconnects, but also because it just had bizarre network errors sometimes. (That was mostly when connecting from outside GCP though.)
We are using PHP and thus no connection pools at all. This definitely doesn't help us with speed. With public IP there was some mystery point that just slowed everything to a crawl.
I got the same treatment but I think I came off as annoyed enough in my message to sales (with an implied threat of just moving to AWS) that I didn't get an upsell conversation.
- No way to upgrade to a new major postgres version without a full export and import into a new cluster.
- Incredible delay between postgres versions. IIRC, it took nearly 2 years for them to add postgres 11 after it was released.
- HA is basically useless. It costs double, still has a 4-5 minute window of downtime as it fails over, doesn't avoid maintenance window downtime (both primary and standby have the same maintenance window), and you can't use the standby as a read replica. Honestly, it feels like a borderline scam, since I'd imagine a new instance could be spun up in the same amount of time a failover takes (but I haven't tested).
- With default settings, we experience overly aggressive OOM-killer crashes on a roughly monthly basis during periods of high utilization. On a 32GB instance, the OOM killer seems to kick in around 27-28GB, which is incredibly annoying (rough memory math in the sketch below).
- Markup over raw instances is almost 100%, with no sustained use discount outside of a yearly commit.
It's just a lot of money to pay for a crashy, outdated version of Postgres.
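For what it's worth, the OOM behaviour is easier to reason about with some back-of-envelope memory math. Every number in this sketch is an assumption to substitute your own flags into (SHOW shared_buffers, SHOW work_mem, etc.), not Cloud SQL's actual defaults:

    # Back-of-envelope Postgres memory estimate. All values are assumptions;
    # plug in your instance's real flags.
    GB = 1024 ** 3
    MB = 1024 ** 2

    instance_ram         = 32 * GB
    shared_buffers       = 11 * GB     # assumed; often set to roughly a third of RAM
    max_connections      = 500         # assumed
    work_mem             = 16 * MB     # assumed; per sort/hash node, not per connection
    sort_nodes_per_query = 2           # assumed worst-ish case
    maintenance_work_mem = 512 * MB    # assumed (autovacuum, index builds)
    os_and_overhead      = 1 * GB      # assumed headroom for the OS, agents, etc.

    worst_case = (shared_buffers
                  + max_connections * work_mem * sort_nodes_per_query
                  + maintenance_work_mem
                  + os_and_overhead)

    print(f"worst case ~{worst_case / GB:.1f} GiB of {instance_ram / GB:.0f} GiB")
    # With these made-up numbers the worst case is already around 28 GiB,
    # which is roughly where the OOM killer was kicking in on the 32GB instance.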
To be fair, it looks like GCP supported Postgres 13 (Nov 5, 2020) before AWS did (Nov 27, 2020), and AWS currently marks Postgres 13 as a preview. Maybe GCP had a large initial engineering cost to support multiple versions of Postgres and now the incremental cost to add new versions is small?
> It's just a lot of money to pay for a crashy, outdated version of Postgres.
Have you looked at other options? I'm evaluating GCP SQL and the comments in this thread are scary. Seems like Aiven might be a good way to go. I've also briefly looked at CrunchyData's Postgres Operator [1] for Kubernetes but it's a lot of complexity I don't really want.
Amazon RDS waits until the "dot-one" release of a new PostgreSQL major version before releasing to general availability. This ensures that all supported extensions and modules (71 in Amazon RDS for PostgreSQL 13) have been updated to work with the new release and with in-place major version upgrades. Amazon RDS releases beta and "dot-zero" versions of PostgreSQL in the Preview Environment, so that customers can start testing and developing against new features in the latest major version.
I've only looked at CrunchyData, which does seem like more complexity than I want. I was willing to suck it up and pay the premium, but the monthly OOM crashes have forced my hand; to where, I don't know yet.
To clarify, it’s a lot more work than bringing up a snapshot. You need to do a full export as SQL and reimport as SQL. Super annoying, slow, and requires hard downtime.
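Outside of Cloud SQL's own export/import, the manual path is essentially dump-and-reload. A rough sketch, assuming pg_dump/psql are installed locally and can reach both instances (hostnames and user are placeholders; credentials come from PGPASSWORD or .pgpass):

    # Dump-and-reload major version upgrade path (sketch).
    # Expect hard downtime between stopping writes on the old instance and cutting over.
    import subprocess

    OLD_HOST = "10.20.0.3"   # existing instance (older Postgres major)
    NEW_HOST = "10.20.0.7"   # freshly created instance (newer Postgres major)
    DBNAME = "app"

    # 1) Export the whole database as plain SQL from the old instance.
    subprocess.run(
        ["pg_dump", "--host", OLD_HOST, "--username", "app_user",
         "--format", "plain", "--file", "app.sql", DBNAME],
        check=True,
    )

    # 2) Reimport the SQL into the new instance.
    subprocess.run(
        ["psql", "--host", NEW_HOST, "--username", "app_user",
         "--dbname", DBNAME, "--file", "app.sql"],
        check=True,
    )

    # 3) Repoint the application at NEW_HOST, and only then tear down the old instance.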
I'm using the SQL proxy, but it doesn't do much re: HA.
I don’t know, I’ll probably just run my own Postgres at some point. The only peace of mind that I get from Cloud SQL is the automatic backups.
Why don't they make the upgrade seamless? If it's truly an export/import process, then it should be dead simple for them to do on their end, especially since they've already taken your db out of serving requests anyway.
Of course there's an API for creating and resizing instances. The issue is there are quotas that cap how many CPUs you can have, and increasing that is a manual process.
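The quota numbers are at least readable over the API, so you can check headroom before a resize fails. Rough sketch, assuming google-api-python-client with application default credentials; the project and region are placeholders:

    # Read the regional CPU quota before trying to scale up.
    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")
    region = compute.regions().get(project="my-project", region="us-central1").execute()

    for quota in region.get("quotas", []):
        if quota["metric"] == "CPUS":
            print(f"CPUS: {quota['usage']:.0f} used of {quota['limit']:.0f} limit")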