PlanetScale – Database for Developers (planetscale.com)
279 points by samlambert on May 18, 2021 | 128 comments


> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema

Speak for yourself. I love my schemas.

This article is also 100% fluff, and zero actual information. Obviously I won’t sign up to anything without an ounce of information being provided up front.


Indeed, you either manage a schema or get managed by it.


This is MySQL. There is a schema. We just make it super easy to manage.


Sounds good, thanks. Any reason these folks chose mysql? Wouldn’t have been my first choice. Edit: appears the middle layer is mysql only.


Haha that's actually a good point. I love to have a reference for how my data is structured.


Per https://news.ycombinator.com/item?id=27203568, there is a normal MySQL schema. This is about how PlanetScale solves the schema management friction, so as to enable developers working with schemas: safe branching, online schema changes, deployment queues, protected branches, conflict resolution. (I'm an engineer at PlanetScale)


From the Vercel point of view, this promises to answer one of the most frequent, interesting, and technically challenging questions since we first launched our "immutable deploys".

That is: how can I pair a brand new frontend preview deploy, with a serverless database with the specific schema my new feature needs?

This technology makes the whole serverless stack feel complete.


Branching wasn't mentioned in the linked blog post, but I assume that is what the parent comment is excited about: https://docs.planetscale.com/concepts/branching


Indeed, thank you!


It doesn’t solve the N+1 queries problem in a generic way. That is a major hurdle in scaling complexity-wise, which in turn is often deeply coupled with business realities.

More to the point, it probably cannot solve it efficiently at all, since it is not a graph database and thus cannot be paired with a generic GraphQL resolver (would generate join queries instead of lookups across edges) and a stack of generated static queries in the backend (no need to allow generic queries, just make it possible to write queries once and only once).


"It doesn’t solve the N+1 queries problem in a generic way."

IMHO, this isn't trying to solve people looping queries. The fastest way to solve this is to not loop queries.
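To make "looping queries" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `authors` and `posts` tables are hypothetical and exist only for illustration; nothing here is specific to PlanetScale or Vitess. The first function issues one query per author (N+1 round trips), the second gets the same data with a single JOIN.

```python
import sqlite3

def make_db():
    # Hypothetical schema and data, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
    """)
    return conn

def titles_n_plus_one(conn):
    # 1 query for the authors, then 1 more query per author: N+1 round trips.
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries

def titles_single_query(conn):
    # One JOIN returns the same data in a single round trip.
    result = {}
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

Both functions return the same mapping here; the difference is only the number of round trips, which grows with the number of authors in the looped version.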

This solves auto-sharding (well, Vitess did). See the Slack architect's comment below.


You are misunderstanding the nature of the problem and how it arises in any system that uses multiple entities to represent real-world entities.


How's that related? This is a commercial managed version of Vitess - which is an open-source system for horizontally scaling MySQL: https://vitess.io/


Before GitHub launched, I built some large eCommerce sites, and for a vcs we used CVS, and then Subversion. We had a person on our team with the title of release manager, because branching in Subversion, and merging back to make releases, was a specialist effort that took time, patience, and managing a tremendous amount of fighting between teams of what features and fixes could even be merged together in order to ship this release.

When I started using git, it broke my brain. I wasn't really sure I understood it, but the idea of cheap local branches soon became the most important thing to me in a vcs, and everything became easier and so much faster. You could just work on code the way real life happens, not in some methodical pre-planned release schedule that always rubbed harshly against the reality of bug fixes and ever changing minds.

Planetscale is a lot like that transformation. We've been thinking of databases as this specialist thing, where the arcane knowledge required to make it perform well is a specialist role, and something completely untouchable by mortal engineers. While we've learned about the importance of good data modeling, we've dealt with a layer in our stack that is essentially static. Shipping features that require changes to a database schema sit and languish, because the pain and coordination required can often be too much for a team to want to deal with.

What PlanetScale has done is solve two massive problems in one platform. First, since it's built on Vitess, it Just Works At Scale. You don't need to fiddle with knobs to get it to perform well. You most definitely are not even in the top 10% of what is already running on Vitess, so you don't really need to worry about ever outgrowing PlanetScale. But the BIG innovation here is what is possible now that a database works the way the SDLC has evolved with git. Make a branch, change the schema, roll it out with your code. Just like anything else, really. Use the PlanetScale CLI to actually USE your database. Change the schema because it will make your code better, and don't worry that you're somehow going to do it wrong.

PlanetScale just made databases useful for developers beyond a nearly-static data store. It's a high-scale database that you can change like code. It will break your brain a little at first, just like git did. And then you'll wonder why the hell we waited so long to have a database this good.


An ORM/migration system like ActiveRecord makes schema changes pretty simple.... hardly outside the domain of "mere mortal" engineer


If you can use an ORM to run a migration on your database, then scaling is not your issue. But when you're dealing with terabytes of data or more, with hundreds or thousands of concurrent queries, short and long, you're going to have a bad time running a naïve tool's migration scripts.


Sounds great!

One question about pricing -- I can't tell if the "free" tier costs are for one month, or ongoing. That is, do you get the free tier amounts for one month and then pay from then on, or is that level of service free from then on?


The free tier provides the stated usage levels at no cost, indefinitely. So if you were to stay under the quotas of data size, writes and reads, you don't pay at all. When you go over, you just pay for what you use. No servers, no pre-committed capacity, just simply pay for what you use.


Does it mean you can write 10 million rows every month for free or does it mean you can write ten million rows, however many months that may take, and then pay for all use after that. The first one is how pricing generally works, but the page could be read as the second.


This is monthly. :)


OK, I think a little /mo on the pricing page wouldn't hurt.


The free tier is free forever.


> free forever

or as long as a free tier exists ;-)


Free until you decide it isn't free anymore.


> It's a high-scale database that you can change like code. It will break your brain a little at first, just like git did.

I think you just described why developers liked MongoDB. It does not get a lot of love on HN, but having the DB schema map to your object model is very convenient.


Can someone please explain how Vitess works, in plain English? How does it magically make MySQL scale?

And then what does PlanetScale add on top of Vitess hosted anywhere else?

Sorry, the linked blog post is both very abstract and assumes a high level of preexisting knowledge about database scaling.


As I understand it, Vitess is basically a really powerful sharding system, which goes a step further than typical sharding solutions by basically making the shards one or more unique databases. In the case of someone like Slack, because your tenant (e.g. your company's Slack) is completely isolated from other tenants, you can treat that basically as its own database, and have a master for just that DB, allowing much better scaling. The big limitation of Vitess is cross-shard transactions, and the fact that you have to make sure your schema has a clear-cut sharding key (like your tenant ID) that works nicely with your application's needs. The alternatives for scaling transactional SQL DBs in a multi-master fashion are the "NewSQL" DBs like Yugabyte and CockroachDB, which are basically document DBs with a partially implemented Postgres frontend, so they don't have the full feature set of your SQL engine like Vitess does but don't require so much attention to sharding. These are oversimplifications of the actual mechanisms, but they give a basic overview of the tradeoffs involved. Please feel free to correct me on any inaccuracies, as I'm not an expert in DBs.


I was Chief Architect at Slack from 2016 to 2020, and was privileged to work with the engineers who were doing the work of migrating to Vitess in that timeframe.

The assumption that tenants are perfectly isolated is actually the original sin of early Slack infrastructure that we adopted Vitess to migrate away from. From some earlier features in the Enterprise product (which joins lots of "little Slacks" into a corporate-wide entity) to more post-modern features like Slack Connect (https://slack.com/help/articles/1500001422062-Start-a-direct...) or Network Shared Channels (https://slack.com/blog/news/shared-channels-growth-innovatio...), the idea that each tenant is fully isolated was increasingly false.

Vitess is a meta-layer on top of MySQL shards that asks, per table, which key to shard on. It then uses that information to maintain some distributed indexes of its own, and to plan the occasional scatter/gather query appropriately. In practice, simply migrating code from our application-sharded, per-tenant old way into the differently-sharded Vitess storage system was not a simple matter of pointing to a new database; we had to change data access patterns to avoid large fan-out reads and writes. The team did a great write-up about it here: https://slack.engineering/scaling-datastores-at-slack-with-v...
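As a rough illustration of why fan-out matters, here is a toy sketch in Python. It is not how Vitess is implemented; the `ShardedTable` class, the hash function and the column names are all assumptions made up for this example. A query that filters on the shard key touches one shard, while any other query has to scatter to every shard and gather the results.

```python
import hashlib

class ShardedTable:
    """Toy router: each row lives in one of `num_shards` in-memory lists,
    chosen by hashing the value of the shard key column."""

    def __init__(self, num_shards, shard_key):
        self.shard_key = shard_key
        self.shards = [[] for _ in range(num_shards)]
        self.shards_touched = 0  # running count of shard visits, to show fan-out

    def _shard_for(self, key_value):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(str(key_value).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, row):
        self.shards[self._shard_for(row[self.shard_key])].append(row)

    def select(self, **filters):
        if self.shard_key in filters:
            # Shard key present: route to exactly one shard.
            targets = [self._shard_for(filters[self.shard_key])]
        else:
            # Shard key absent: scatter to every shard, then gather.
            targets = range(len(self.shards))
        self.shards_touched += len(targets)
        return [row for i in targets for row in self.shards[i]
                if all(row.get(k) == v for k, v in filters.items())]
```

With a table sharded on a hypothetical `channel_id`, `select(channel_id=2)` visits one shard, while `select(user_id=0)` visits all of them. That second shape is the kind of access pattern that has to change when the shard key changes.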


> In the fall of 2016, we were dealing with hundreds of thousands of MySQL queries per second and thousands of sharded MySQL hosts in production.

> Today, we serve 2.3 million QPS at peak. 2M of those queries are reads and 300K are writes.

I think the "today" QPS numbers are still doable with a properly tuned single-writer galera cluster running on machines with TBs of memory. Of course, with Slack workload, there would be too much historical data to fit into a single host, so I can see the reasons to shard into multiple clusters/hosts.

Still, the numbers seem a little off. Let's say back in fall 2016 there were already 200K write QPS at peak, with 200 sharded hosts accepting write. That's just 1K write QPS at peak per host on average, and let's say 20K write QPS at peak for a particularly hot shard. What could be the bottleneck? Replication lag? Data size? I don't think any of the articles from Slack has talked about this.

What Vitess provides is invaluable, especially the very solid implementation of secondary index. But sometimes I feel like it is being used/advocated as a sledgehammer ("just keep sharding") without looking at what could be done better at the lower MySQL/InnoDB level, in exchange for a much more costly cloud bill.


Definitely wasn't expecting the chief architect at Slack to reply to that example. Really appreciate the response, HN is such a blessing in that regard :). The scaling datastores at Slack write-up is a super interesting read as well, thanks. It does make me wonder whether a fully MySQL-compatible version of Yugabyte/Spanner etc. would have shifted the decision.


Random aside, were you at KubeCon a couple years ago chatting with Sugu at the whole conference party in San Diego? If so, hi! I was crazy out of my depth, but listening to folks that know this stuff better than I ever will was one of the highlights of that conference


Something you really hit on here; there is no free lunch with clustered databases. You have to design your application to account for the sharding or you will run into data locality related performance issues.


> your SQL engine like Vitess does but don't require so much attention to sharding

You always need a lot of attention to sharding otherwise you'll have poor performance.


(crdb eng) I'm not sure what "document DB" means here, mind elaborating?


he is probably referring to the docdb document store in yugabyte: https://docs.yugabyte.com/latest/architecture/layered-archit...


Indeed I was referring to Yugabyte, apologies for the clumsy phrasing. I haven't used CRDB, but I guess it is a Postgres frontend layered on a KV store instead of a document store?


Yugabyte uses actual Postgres code for the query layer, on top of data persistence (distribution/replication) handled by DocDB document store, which itself is a layer on top of RocksDB: https://blog.yugabyte.com/how-we-built-a-high-performance-do...

CockroachDB (aka CRDB) is completely custom and compatible with Postgres wire/datatype protocols, which operates directly on its own key/value store called Pebble (but originally was also RocksDB): https://www.cockroachlabs.com/blog/distributed-sql-key-value...

Both systems are foundationally the same SQL-on-KV but implement it very differently.


As opposed to? Postgres is “just” a Postgres front end on top of a kv store.


I don't think that's really true. Which part of postgres would you describe as a KV store, compared to what cockroach does with RocksDB?


Ultimately all databases scale the same way, by splitting up data into shards/partitions/segments and spreading them out over several servers, along with replication for durability. The partitioning is done by a primary/sorting/distribution key on the data for each table.

Implementations vary, but there are two major architectures: systems like Vitess/ProxySQL/Citus/Timescale that act as a proxy layer on top of an existing RDBMS running on multiple servers to make them look like a single database, and entirely custom projects like CockroachDB/TiDB/Yugabyte/Cloud Spanner which have their own native processing and data layers.

OLAP relational data warehouses like Vertica/Greenplum/MemSQL/Redshift/Bigquery are also natively distributed but focus on large-scale analytics with features like column-oriented storage and vectorized processing.


Vitess is an additional layer on top of MySQL which all queries pass through. Among other things, it implements its own query parser which can then do stuff like split a query across shards and join results etc.

I wouldn't say there's too much "magic" in there, but it does a lot of known difficult things (schema management, sharding/resharding, connection pooling, query optimization, DB administration, monitoring, backup/failover) which are generally painful and expensive to do yourself.


I do think it’s funny that whenever these ‘magic’ products appear, it always turns out that they’re just packaging the accepted best method in a way that’s easy to consume.


There are not many new things under the sun in the DBMS field. Ease of use is a killer feature.


Hmm, so in a way, it's kinda like a CDN for a database?


IME the answer to "how did they make [hard to scale thing] easily scalable?" is usually that they introduce limitations in how you can use [hard to scale thing] so you can't use it in ways that are hard to scale, then automate scaling it in well-known ways for use cases that are so-limited. Vitess's site mentions that it relies on horizontal sharding, so right off the bat, my guess is that you can't use it in ways that are sharding-unfriendly, or if you can then you'll be met with restrictions on much of the "magic" of it if you do.

Rarely is it the case that someone's actually discovered e.g. novel math or something to make the hard part easier. Better tools (to do well-understood things more easily for this use case) and restrictions (so you don't use it in ways the tools can't handle) are the usual way.


The "ease" we used to refer to in Vitess primarily relates to its interaction with the application side, where it basically presents itself as "one big MySQL datastore". It uses a standard MySQL connector, and in general, once you have the infrastructure up and running with a compatible schema design, there's not too much to worry about from a coding standpoint. Sharding happens transparently to the application code, which generally translates to fewer code changes required.

Admittedly, that view left out the considerable challenge of actually deploying and running the infrastructure, designing and optimizing that schema, along with all the joys of managing large cluster environments.

That's what PlanetScale, the product, aims to solve. Dealing with clustering infrastructure IS a hurdle for most teams to overcome, and though Vitess' feature set and compatibility has expanded greatly to accommodate some of the most demanding use cases on the web today, a lot of its functionality can still be out of reach for a developer just trying to merge some code and a schema change.

Abstracting as much of that complexity away from the end user is the goal, as well as making their lives easier with a ton of the functionality we've always wanted to see built with Vitess. I can confirm that that is not an "easy" job on our end. :)


> once you have the infrastructure up and running with a compatible schema design, there's not too much to worry about from a coding standpoint

Isn’t that the same for a normal sharded MySQL database though?


Depends. Have any examples of "normal" sharded MySQL databases?

EDIT: To clarify, sharding is not a standard feature included in Community Edition MySQL. Over the years, there have been various Oracle-initiated attempts at providing it as an enterprise scaling strategy through MySQL (NDB) Cluster, MySQL Fabric, etc., but these have ended up having limited applicability outside very specific use cases and are not widely in use.

Most large MySQL users (e.g. Facebook or YouTube) ended up rolling their own frameworks, like Vitess, which has since been open sourced and adapted to more diverse environments. Until that became more accessible, though, the rest of the world mostly made do with wobbly multi-master setups relying on circular replication, behind some kind of proxy, or had to implement the sharding logic itself into their application code.


Damn, I never knew this. I think the first time I ever needed sharding it was just available in whatever version of MySQL we were using at the time (through some plugin, presumably). I never needed it again and ever since just assumed it was the default.

Thanks for the correction!


I have been using the beta version of PlanetScale for a while, and it is extremely cool. It's using the mature technology that powers YouTube to provide a developer experience for databases similar to what Vercel and Netlify provide for hosting: it will give you a database branch for each Git branch and help you manage the workflows around it.

And it is the first truly serverless relational database offering that I am aware of. The cost scales to 0, so it is perfect for small projects, but that same instance will scale to support massive load when you need it.


> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema. It has been our goal to give both, not compromising on the power of your datastore but making changes feel as easy as deploying code.

So is this a wrapper around managing schemas, powered by Vitess? (Btw, had to go to your GitHub to figure that out)

If you want to be the database for developers you should know that developers do care about how you do this scaling.


The boulder-sized caveat for all this admittedly really neat stuff being:

> PlanetScale's Non-Blocking Schema Changes' workflow doesn't support FOREIGN KEYs in users' databases.

> PlanetScale determined that the production safety that Non-Blocking Schema Changes provide are worth this technical tradeoff. Learn more.

https://docs.planetscale.com/concepts/nonblocking-schema-cha...


To be clear, this is not a Vitess/PlanetScale-specific opinion or choice. Foreign key constraints are a bit of a controversial topic in large-scale MySQL environments in general, which is the greater context in which this design decision was made by the Vitess team.

PlanetScale's (and Vitess') non-blocking schema changes rely on open source tools for MySQL like pt-online-schema-change and gh-ost, which are widely used in production environments everywhere, and neither of them are too comfortable supporting FK's, though pt-osc does accommodate them to some extent (https://www.percona.com/doc/percona-toolkit/3.0/pt-online-sc...). gh-ost's lack of support was discussed on HN previously here: https://news.ycombinator.com/item?id=16983620

A good collection of resources on why they're considered problematic and many companies designing large-scale MySQL schemas tend to drop them can also be found here: https://federico-razzoli.com/foreign-key-bugs-in-mysql-and-m...


Do you know if foreign keys tend to be a problem on postgresql as well?


I don't have nearly the experience in Postgres environments to have seen the same level of real-world impact there, but a quick search presents me with the following documentation, which seems to indicate mostly similar performance challenges related to the use of Foreign Key Constraints: https://www.postgresql.org/docs/13/populate.html#POPULATE-RM...


Am I the only person reading this who doesn’t think that the marketing line in the middle about not managing a schema is super scary?

> Developers want the durability, stability, and scalability of a SQL database but do not want to be constrained by managing a schema. It has been our goal to give both, not compromising on the power of your datastore but making changes feel as easy as deploying code.

Sorry. There’s always a schema, and learning to manage it is a _good_ thing. If you’re not ready to manage and migrate, then you’re not in production. You’re in something else.


There is always a schema, and PlanetScale, in particular, promotes the use of schemas. That is, behind the scenes we operate Vitess, which is an infrastructure and sharding framework on top of MySQL. You will create schemas in PlanetScale just as you would in MySQL. Managing the schema, though (operating schema changes in code and in production, mitigating schema conflicts, scheduling migrations, running online migrations, experimenting with schema changes, etc.) is something for which relational databases don't give developers the right (or any) tooling, and this is one of the things PlanetScale aims to bring to developers.

In my past experience as a production engineer, and in our many discussions within the community, we've seen many approaches developers take just to avoid the hassle of running yet another ALTER TABLE; whether because it requires going through a different team, or because this poses an availability risk, etc. PlanetScale offers to bring ownership back in the developer's hands.

Hope this clarifies. We'll publish more soon, and I'll be able to provide more links.

(I'm an engineer at PlanetScale)

(Edit for grammar/typos)


I’m not sure what to say to that. In the former (database changes belong to a different team), there’s not much that can be helped.

Given the number of people—on HN, no less—who have war stories about having accidentally dropped the production database because of insufficient protections and the desire of many businesses to know that their vendors are following SOC2 and similar safety and security protocols, I completely get that.

In the latter case (availability risk), there’s no sympathy I can provide. Developers need to be aware of what their changes mean. Sometimes this means that you make a separate table. Sometimes it means you have a downtime event which means an overnight deploy. Sometimes it means you have to figure out how to do a zero-downtime series of migrations.

Step 0: make sure you need the migration.

Step 1: modify the app to conditionally query or update the new column based on whether the change is present or not, and without requiring a value in the column. (Since it’s not, this will work as an `if false` case for now.) Deploy.

Step 2: modify the database to add the column with no constraints (safe and trivial on most SQL databases). Deploy.

Step 3: modify the app to start filling in fields that should not be null. (That is, provide constraint validation at the application level; or, if dropping a column, start setting the value to a least damaging value or NULL if you could have made a required field no longer required.) Deploy.

Step 4: After some time, run a backfill to fill in new required values (or clear old values) that haven’t been modified yet. If you’ve done this right, it should be a small number. Optionally, do this periodically for small subsets of data.

Step 5: Run a final backfill and deploy a schema migration that adds your constraint/drops your column (dropping a column is optional if you’ve got a hard zero downtime requirement; just make sure no one tries to use it and make sure it’s not part of the application schema; schema comments are your friend).

Step 6: Deploy a version of the app that doesn’t act conditionally.
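The steps above can be sketched roughly as follows, using Python's built-in sqlite3 module so it runs anywhere. The `users` table, the `email` column and the placeholder address are hypothetical. The final constraint (step 5) and the removal of the conditional code (step 6) are left out because how you add a constraint online is database-specific.

```python
import sqlite3

def has_column(conn, table, column):
    # PRAGMA table_info returns one row per column; the name is at index 1.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))

def create_user(conn, name, email=None):
    # Steps 1 and 3: the app only touches the new column when it exists,
    # and enforces "not null" itself by always supplying a value.
    if has_column(conn, "users", "email"):
        conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
                     (name, email or f"{name}@example.invalid"))
    else:
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

def backfill(conn, batch_size=2):
    # Steps 4 and 5: fill old rows in small batches so each transaction stays short.
    total = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE email IS NULL LIMIT ?", (batch_size,))]
        if not ids:
            return total
        for user_id in ids:
            conn.execute(
                "UPDATE users SET email = name || '@example.invalid' WHERE id = ?",
                (user_id,))
        conn.commit()
        total += len(ids)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
for name in ("ann", "bob", "cy"):
    create_user(conn, name)                               # column absent: the "if false" path
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")   # step 2: nullable, no constraint
create_user(conn, "dee", "dee@example.com")               # step 3: app now fills the column
backfilled = backfill(conn)                               # steps 4-5: fix up the old rows
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]
```

Only once `remaining_nulls` is zero would you add the real constraint and then ship the version of the app without the `has_column` check.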

Yes, it takes longer. I’ve run migrations this way for the last seven years and we’ve had essentially _no_ downtime because of migrations. I think we’ve had ~4 hours of preventative downtime across ~10 databases in that time. Usually, when we’re ready for the piece that can cause downtime…there’s about a five minute hiccup.

Developers who can’t reason through safe table alterations shouldn’t be making those changes, and PlanetScale is going to make some business people very unhappy when their developer changes something that they didn’t actually understand and causes downtime regardless of PS’s “guarantees”.


Schrödinger's production environment.


So just wanted to add a few random thoughts about some of this stuff.

With sharding, where this stuff eventually gets you is when your assumptions about shard keys no longer hold.

Eg, you might have user initiated traffic to start with, so you can easily and automatically shard everything by user id or whatever. Then one day, those assumptions change because you might have to accommodate event driven traffic, ie not request/response, and the user’s id can’t be assumed to always be present. For example let’s say something in the real world causes an event to be pushed onto your queue. That event could correspond to a real user, but since it originated somewhere in the real world, there’s probably a separate id for how that user is represented. So you can’t rely on that user ID being present to shard things by.
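A small sketch of that situation, with made-up names throughout (`external_to_user`, the `device-...` ids, four shards). It assumes one possible workaround, a lookup index from the external id to the internal user id, which is an illustration rather than anything a particular product prescribes. When the mapping is missing, the only safe fallback is to ask every shard.

```python
import zlib

NUM_SHARDS = 4

def shard_for(user_id):
    # Deterministic hash so the same user always lands on the same shard.
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS

# Hypothetical lookup index: external (real-world) id -> internal user id.
external_to_user = {"device-9f3": 42, "device-a17": 7}

def route_request(user_id):
    # Request/response traffic carries the user id, so routing is direct.
    return [shard_for(user_id)]

def route_event(event):
    # Event traffic may carry only an external id.
    user_id = event.get("user_id")
    if user_id is None:
        user_id = external_to_user.get(event.get("external_id"))
    if user_id is None:
        # No way to derive the shard key: fan out to every shard.
        return list(range(NUM_SHARDS))
    return [shard_for(user_id)]
```

The lookup index is itself state that has to be stored, kept consistent and scaled, which is part of why changing shard-key assumptions late is painful.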

Not sure if that makes sense, but sharding can be hard. It’s not free, and I still think it’s important for engineers to understand the mental model they’re using, even with a tool like Vitess.

Also, I saw a claim either on PlanetScale or Vitess that MySQL has no native support for horizontal scaling with automatic sharding, but I think they do? I think you just have to pay for that though.

Also, cross-shard transactions were mentioned as another difficulty with sharding. They can be done with either sagas (depends on the context but it’s a design pattern), or 2PC, which I believe is available in MySQL in the form of XA transactions.
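For anyone who hasn't seen the saga pattern, here is a minimal sketch in Python. Two dicts stand in for two shards that cannot share a transaction, and all names (`run_saga`, `adjust`, `transfer`) are invented for this example. Each step is paired with a compensating action, and when a later step fails the completed steps are undone in reverse order.

```python
class SagaFailed(Exception):
    """Raised after a failed step has been compensated."""

def run_saga(steps):
    # steps: list of (action, compensation) callables, run in order.
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception as exc:
            # Undo the steps that already succeeded, newest first.
            for undo in reversed(completed):
                undo()
            raise SagaFailed(str(exc)) from exc
        completed.append(compensation)

# Two dicts stand in for two independent shards.
shard_a = {"alice": 100}
shard_b = {"bob": 0}

def adjust(shard, account, delta):
    if shard[account] + delta < 0:
        raise ValueError("insufficient funds")
    shard[account] += delta

def transfer(amount, credit_fails=False):
    def credit():
        if credit_fails:
            raise RuntimeError("shard B unavailable")  # simulated outage
        adjust(shard_b, "bob", amount)

    run_saga([
        (lambda: adjust(shard_a, "alice", -amount),
         lambda: adjust(shard_a, "alice", amount)),
        (credit,
         lambda: adjust(shard_b, "bob", -amount)),
    ])
```

Unlike 2PC, a saga gives no isolation: between the debit and its compensation, other readers can see the intermediate state, so it only fits workflows that can tolerate that.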


> The site is experiencing higher than normal traffic and we have temporarily halted database creation.

Ironic coming from the infinitely scalable database, isn't it?


Well that restriction will probably disappear if you just give them your credit card number


ironic indeed...took me a few seconds to read the white text on a yellow banner at the top why I couldn't create a db!


Am I the only one finding it extremely light on useful information? And after reading all the comments here those questions are still not answered.

Basically a Hosted Vitess with MySQL? Where is DC? Backup included? Redundancy? Spending Cap? Own Infrastructure or on top of other Cloud ? Support Level? Uptime Guarantee? etc etc.

All of these are basic info required from a SaaS, and they are missing. Not even a FAQ.


DC seems to be Amazon East-1 based on their status page [https://planetscale.freshstatus.io/].

As for the rest? Very good questions. Just coming out of beta, so perhaps they're still filling out the website.


>Just coming out of beta

Thanks for the info on Amazon. I guess I will have to check it out again when they announce it is out of beta.


I have to admit that with the previous mocking of "web scale", I initially assumed this was satire.


I wonder what the latency would be to have a Heroku instance pointed at a PlanetScale instance.

Doesn’t look like there’s Postgres compatibility.


Can I install this locally? Will it work without an internet connection? Is it fully open source? From what I can tell, the answer to all three questions is no.

Is it yet another example of vendor lock-in? Yes.


The underlying technology (https://vitess.io/) is fully open source, so I imagine there will be very little vendor lock in.


There is still value in proprietary solutions that offer significant time/money savings vs open source. Not all of us have infinite resources to develop and maintain every solution in-house, whether that's an auto-scaling database or email or Google Docs or whatever.

For small/medium businesses, sometimes it's better just to let the experts do their thing and build on top of it...


If it is MySQL compatible it's not vendor lock in.

It's more like a clever trap, first hit is free etc.

Will be interesting to follow this. I'm not db savvy so I don't mind leaving that job to someone else.


Vendor lock-in isn't just about data formats, it's about the overall makeup of your infrastructure and development processes. If an org builds around the particular way PlanetScale manages DB branches and other minutiae of their service, they can't simply replace that with an in-house DB server overnight.


True enough, but by that standard, every service would constitute lock-in. A cleaning company isn't lock-in just because you don't have in-house janitors.

"Vendor lock-in" means that you're tied at a business-critical level to some proprietary technology _that you can't get anywhere else_.

If you rent Linux servers, Postgres databases, or even k8s setups from Azure, there's no shortage of alternative vendors willing to sell you compatible product should you tire of MS, and ideally you won't need to change much more than a few endpoints. However, if your entire user base lives in Azure AD, it's a very different story.


Sounds interesting but there is not enough information for me to get a picture of what this actually means. Thought I was gonna be clever and install it, run it and see what it is all about but I can't figure out how to get it. One link is "sign up" and the other is "contact sales" but I'm not interested in either, I'm interested in "Download" but I cannot find it anywhere.

Am I stupid or misunderstanding something fundamentally here? Is it just a hosted DB or something like that?


It's a hosted DB, and as far as I can tell the Killer Feature is that it makes schema updates less painful. How it does that without significant performance trade-offs or caps is unclear to me.

[EDIT] on reading further in their docs, my suspicion is that their "branching" concept is a hell of a lot more limited than I believed at first. I initially took it to mean you could have multiple active schemas working on your data at once—instead, I think it's more like exporting just the schema of your DB and importing it to a fresh DB, which is nothing new and doesn't run into all kinds of operational and security issues the other workflow would. I'm fairly sure all the actual magic is in the schema diffing, and the docs make me think even that isn't as fully-magical as one might hope.
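To show what I mean by the schema-only reading, here is a rough sketch with Python's built-in sqlite3 module. It illustrates my guess only and is not PlanetScale's mechanism; `branch_schema` and the `users` table are made up. The "branch" gets the DDL of the main database but none of its rows, and changes to it don't touch main.

```python
import sqlite3

def branch_schema(source):
    """Create a fresh in-memory database with the source's schema but none of its rows."""
    branch = sqlite3.connect(":memory:")
    # sqlite_master stores the original CREATE statements; tables go first
    # so that indexes are created after the tables they refer to.
    ddl = source.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY CASE type WHEN 'table' THEN 0 ELSE 1 END"
    ).fetchall()
    for (statement,) in ddl:
        branch.execute(statement)
    return branch

def columns(conn, table):
    # PRAGMA table_info returns one row per column; the name is at index 1.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

main = sqlite3.connect(":memory:")
main.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX users_name ON users (name);
    INSERT INTO users VALUES (1, 'ann');
""")

branch = branch_schema(main)
branch.execute("ALTER TABLE users ADD COLUMN email TEXT")  # change only the branch
```

Comparing `columns(main, "users")` with `columns(branch, "users")` afterwards is the crude version of a schema diff; the hard part is everything this leaves out, namely data and applying the diff safely to a live database.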


It's closer to your original assumption than your second. Branching relies on vreplication, and it does actually allow you to develop multiple versions of your schema against your full production dataset. The "magic" is a powerful materialization engine that allows every branch to maintain its own version, and it _does_ allow you to work with your actual data. This is just scratching the surface of what that can do, though. We have loads more features cooking. :)


Really great finally seeing a “serverless” SQL database based on MySQL and Vitess that scales from 0 to n, even with a free tier!

Where are you hosted? What latency should we expect from AWS, GCP, DigitalOcean and fly.io?

And also what degree of compatibility should we expect with MySQL? The doc is quite sparse on this.


Vitess' compatibility with MySQL has made major leaps in the past couple of versions and the team has started focusing on locking in ongoing compatibility with various popular development frameworks. You can find those here, and more are getting added regularly: https://github.com/planetscale/vitess-framework-testing/

The basics of MySQL compatibility are described in here, though it's important to keep in mind that just because something "works" doesn't always mean it's the best way to do things in a sharded environment: https://vitess.io/docs/reference/compatibility/mysql-compati...


Thanks! Not having window functions and CTEs is a significant limitation for any kind of data analysis. But I guess the main use case is pure OLTP where it is – a bit – less relevant.



It sounds like the database branch has a copy of the production schema, but what about the data itself? We've been using AWS Aurora clones to quickly give developers copies of production. Can zero-copy clones be made as part of the branch?


Wondering if we will see something like this for Postgres?

I like the cool things but I can’t migrate to MySQL just because of this.


I've been following Citus Data, it might be interesting for you. https://www.citusdata.com/


Ed: obviously this does not solve the bigger service of providing "magic" scale up/down, though.

It looks like the "branching schema" is essentially:

CREATE DATABASE branch_name WITH TEMPLATE main;

I might actually have to try this - I hadn't really thought template dbs would be all that useful in pg - but this branch-and-test-for-dev use case is interesting.

Then there's COPY (for data) and rename to "promote" a branch with data (though one would probably want to run the DDL on the main db):

ALTER DATABASE branch_name RENAME TO main; -- would have to move old main out of the way first - but might be possible in same transaction?
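
A rough sketch of that branch/promote flow, written as a small Python helper that only builds the SQL strings (it does not connect to anything). The function names, the `main_old` name for the retired database, and the identifier check are my own assumptions for illustration. Two Postgres facts worth keeping in mind: CREATE DATABASE cannot run inside a transaction block, and you cannot rename the database you are currently connected to, so these statements would have to be issued from a session on some other database. The template copy also fails while other sessions are connected to the source database.

```python
import re

# Only allow plain lowercase identifiers so the generated SQL needs no quoting.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(name):
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported database name: {name!r}")
    return name


def create_branch_sql(branch, source="main"):
    """SQL that copies `source` (schema and data) into a new database."""
    return f"CREATE DATABASE {_check(branch)} WITH TEMPLATE {_check(source)};"


def promote_branch_sql(branch, main="main", retired="main_old"):
    """SQL that moves the old main out of the way, then renames the branch.

    `retired` is a hypothetical name for the parked copy of the old main.
    """
    _check(branch), _check(main), _check(retired)
    return [
        f"ALTER DATABASE {main} RENAME TO {retired};",
        f"ALTER DATABASE {branch} RENAME TO {main};",
    ]


if __name__ == "__main__":
    print(create_branch_sql("feature_x"))
    for statement in promote_branch_sql("feature_x"):
        print(statement)
```

Whether both renames can share one transaction is something I have not verified, so treat the promote step as two separate statements until tested.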


Sure you can! and we will make it worth it.


I can't decide whether this joke is marketing folly or strategic genius.

On the one hand, making light of how big an undertaking a _database engine migration_ would be makes you come off sounding like the "mongodb is web scale" guy. That's a pretty terrible look for a company proposing to take on critical production infrastructure.

On the other hand, adding postgresql support to vitess is probably a _massive_ undertaking. Sure, it might make sense for you to contribute to in the long run to open up that customer segment, but at your current stage postgresql customers are probably more of a product-distraction than anything. In that light, driving us away for now is probably the best policy.

I'm gonna give you the benefit of the doubt. Well played, sir. Well played.

EDIT: nevermind, all your other comments so far in this thread are variations on "Yes, we can do that!". You're promising the moon, and that's fishy.

Bit of unsolicited advice (you know what they say about unsolicited advice, but here goes...): You'd be better off admitting some weaknesses and discussing tradeoffs. For examples, take a look at other HN threads where execs have hopped on. What qualities do you see in posts that get the most positive responses? Now compare that to the confusion and skepticism in this thread. Your messaging has landed off-target.

You're proposing to take over a piece of people's critical production infrastructure. As a potential buyer considering an infrastructure provider, seeing "buy our product and we'll solve all your problems" just makes me skeptical. It makes you sound like a used car salesman. Your landing page delivers that basic message, and your comments in this thread reinforce the resulting impression.

Conversely, if in these comments you soberly acknowledged your limitations, transitioned to a detailed treatment about tradeoffs, and _then_ used that save to toot your horn some more, that would give the impression you've considered this problem deeply, help me figure out how closely your value prop is aligned to my goals, and make me more inclined to trust you, your company, and your product.

As it is...

I see your comment and it makes me think your company isn't ready to take on my infrastructure (this is my database we're talking about here... if you're not taking something as basic as migration costs seriously, what other nasty surprises are waiting for me down the line?).

Then, I take a look at your about page (maybe this guy's just an entry-level marketer in the people-pleaser phase), and I see that you, Sam Lambert, are the Chief Product Officer. That gives me some serious doubts about the future viability of your company, because now I'm worried the CPO is fundamentally out of touch with the userbase and can't grapple with difficult details.

You will not be touching my data any time soon, but I'll be watching for changes, and I honestly wish you luck. I like your vision, but your messaging execution needs work and your product execution is yet-unproven.

EDIT: kudos to lizztheblizz for giving more detailed answers that inspire confidence. You've significantly repaired my impressions of your company.


We are definitely not out of touch with our user base. The reason I feel a high degree of confidence is that we currently have a major Postgres user moving to PlanetScale at the moment and they are loving the experience. This is v1 of our product. I have not seen a database product like this, that cares about the daily lives of the users that use it. I believe we can create a fundamental shift in how databases think about their users.

We concern ourselves with the productivity of our users. We also solve hard problems at scale, for many large companies. We have so many exciting developer focussed features that I think people will love, all the while being on a viable database. Vitess is by far the highest scale open source database solution out there.

Migration cost is a serious thing. We believe the immense leap in productivity you will gain over time from our product makes it more than worth it.


Despite the partial retraction this is great advice, hope I’ll get that level of quality feedback on my projects.


looks like it's a ways off... planetscale says they're built on vitess, and vitess has postgres compatibility at the bottom of their "Medium Term" roadmap: https://vitess.io/docs/resources/roadmap/


It has been there for two plus years with very little news and development. I wouldn't expect it to be done in another 3-4 years.


With solutions like CockroachDB, YugabyteDB etc. gaining steam do things like Vitess have a place long term?


If I understand correctly, PlanetScale = hosted Vitess? I assume it was started by the creators/maintainers of the project (similar to Confluent and such)?


Yes!


And if your company fails my company fails with it? How would I ever move out of this?

Very impressive tech! But the upside is not worth the vendor risk for me.

Edit: or maybe this is not an issue at all - but the website copy does not make it clear, so take it as a suggestion please.


The pricing gives me anxiety.

$1.25/mo per 10GB storage

$15/mo per 100 Million rows read

$15/mo per 10 Million rows written

But I won't lie, super excited to give this a try.


It's not entirely clear from the website or the documentation, but this seems to be built exclusively for MySQL?


Yes we are MySQL compatible.


100% compatible? Is there a document with a compatibility matrix? Edit: I guess this is it: https://vitess.io/docs/reference/compatibility/mysql-compati....


At 100,000 writes per second (which is roughly what the last data-driven system I worked on did), this would be $400K per month.

It seems to be the case with all the NewSQL databases that they're still just not economical to run unless you are FAANG. Get a Bigtable / HBase / etc and build a custom system and save $50-350K per month. For a sub 100 person company, it's a no-brainer.
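
For anyone who wants to check the arithmetic, here is a minimal sketch using only the prices quoted upthread ($1.25 per 10GB, $15 per 100 million rows read, $15 per 10 million rows written). It assumes a 30-day month, strictly linear pricing with no free tier or volume discount, and that each write touches exactly one row. The function name and parameters are made up for illustration.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumes a 30-day month


def monthly_cost(storage_gb=0.0, rows_read=0, rows_written=0):
    """Estimated monthly bill in USD from the prices quoted in this thread."""
    storage = storage_gb / 10 * 1.25          # $1.25 per 10 GB stored
    reads = rows_read / 100_000_000 * 15.0    # $15 per 100 million rows read
    writes = rows_written / 10_000_000 * 15.0  # $15 per 10 million rows written
    return storage + reads + writes


# 100,000 single-row writes per second, sustained for the whole month.
rows_written = 100_000 * SECONDS_PER_MONTH
print(monthly_cost(rows_written=rows_written))  # 388800.0
```

So the sustained-write case lands at about $389K per month under those assumptions, which matches the rough $400K figure before counting any reads or storage.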


The amount of buzzwords on the site made me think this was some kind of elaborate joke. I'm not yet convinced it isn't...


This is wonderful. I've been developing with relational databases for 12 years and couldn't be more excited about this.

Scaling down to 0 opens up a world of opportunities when it comes to team development workflows. The idea of quick environments per pull request is within reach.

Not to mention that it scales with you.

The one thing I'm curious about is how it compares with CockroachDB


12+ years of massive scale production use for Vitess and 26+ years of hardening for MySQL and InnoDB. PlanetScale adds some (imho) great features on top of that, but it's standing on the shoulders of giants that have proven themselves over and over.

... Also, I guess it's MySQL-compatible rather than Postgres-compatible? :)


Quick environment per pull request is available with SQLite now :)


Railsers, if you're perplexed by what's going on here this Rails specific documentation helped me: https://docs.planetscale.com/tutorial/connect-rails-app


Is it not funny that something with the word "scale" in its name disabled creation of databases because of being under load?


I'm currently searching for a serverless database for my next project and will look into it.

I already played around with Upstash and Fauna.

Somehow the signup shows a 422 error, then I get a confirmation email that leads to a blank page.


What is a serverless database and why are you searching for one for your next project?


A managed database that automatically scales horizontally and is priced on-demand.

I want it for my next project, because I don't want to pay for what I don't use, but also don't want to provision or maintain additional instances manually.


Explanation at the exact level of detail I wanted - thank you :)


I heard YouTube migrated away from Vitess to Spanner. Does anyone know why?


Aren't schema changes actually a big problem when there is a huge amount of data in the table? Adding a column / index to a table that has 500M rows in it is usually the pita. This is definitely a step in the right direction, but I would love to see it support this with the data itself.


That is exactly what we _do_ support. Our team includes the original developer of gh-ost for MySQL, which has been built to execute these kinds of massive changes at scale at GitHub. We've integrated it tightly with Vitess. Once your branch is ready to be merged with production, it executes that change in a completely non-blocking way.
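
For readers unfamiliar with the general idea behind online schema change tools, here is a heavily simplified sketch of the shadow-table pattern: build a new table with the target schema, copy rows over in small batches, then swap names. This is not PlanetScale's or gh-ost's code. It uses Python's built-in sqlite3 module purely so it runs anywhere, the table and column names are invented, and it skips the hard part: gh-ost also tails the MySQL binary log to replay writes that arrive during the copy, which this sketch does not attempt.

```python
import sqlite3


def add_email_column_online(conn, batch_size=1000):
    """Add an `email` column to `users` via a shadow table (illustrative only).

    Assumes a table users(id INTEGER PRIMARY KEY, name TEXT) and no
    concurrent writers, since change capture is not implemented here.
    """
    cur = conn.cursor()

    # 1. Create the shadow table with the target schema.
    cur.execute(
        "CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )

    # 2. Copy rows in primary-key order, one small batch per transaction,
    #    so no single statement holds a long lock on the original table.
    last_id = 0
    while True:
        rows = cur.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        cur.executemany("INSERT INTO users_new (id, name) VALUES (?, ?)", rows)
        conn.commit()
        last_id = rows[-1][0]

    # 3. Swap the tables; the old one is kept around for a manual rollback.
    cur.execute("ALTER TABLE users RENAME TO users_old")
    cur.execute("ALTER TABLE users_new RENAME TO users")
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])
    db.commit()
    add_email_column_online(db, batch_size=1)
    print(db.execute("SELECT id, name, email FROM users").fetchall())
```

The batch size is the main knob in this pattern: smaller batches mean shorter pauses for other queries at the cost of a longer total copy.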


Ok, but most of the time you would like to test that change on a copy or an almost up-to-date replica, to measure things like how long it takes. If you were copying the data, with things like PII filtering, to the branch development database and allowing folks to test it there, it would be even more amazing. Great start tho!


Give it a shot, because the current implementation specifically side steps the need for that full copy, and does let you test functionally against fully up-to-date production data.

You're right, though, it doesn't cover all of those use cases. Luckily, Vitess does offer most of that out of the box already. Just need them exposed through the PlanetScale UI. :)


What do yall see as the next devx improvements beyond branching and no-fear schema migrations? I'd be really interested in how you might apply the workflows you're defining towards (IMO) bigger problems like data migration and backfills.



What are the geographical locations - can I choose which data center? What kind of redundancy or backups are there?

Does a full table scan count as a read? Is there planetscale specific tooling for managing indexes or primary keys?


Can serverless solutions e.g. AWS Lambda or Google Cloud Run connect to this?


Yes absolutely!


Where is it hosted? Do you have instances in all data centers moving shards to where they are used?

Using databases outside the same availability zone can be slower...


What is this? Reading this post does not explain it, but I figure it's some sort of new DB.


How does branching work? Is it provided by Vitess, or did you build it on top of it?


I cannot find anything on the website about matters like uptime and backups?


It looks great, congrats


Appreciate that. thank you


If you look closely, its logo looks like a planet is dabbing.


Will this support blobs as well, like S3?



