Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's pretty terrible how poorly developers test their database queries. This looks like a great step in the right direction. I think how the ORM story in RegreSQL develops is crucial. The SQLAlchemy integration looks interesting, but at the same time super specific. There are a million ways to generate SQL statements and ORMs are just one of them. A question that comes to mind is how will you handle interactive transactions? I'd say most complexity in queries comes from the back-and-forth between database and server. Is that out of scope?

Would also be fun if you could support PGLite [0], that's what I've been using to write "unit" tests connected to a "real" database.

[0] https://pglite.dev/



My goto for this lately has been ephemeralpg [0] and pgTAP [1]. It’s been pretty great

[0] https://github.com/eradman/ephemeralpg [1] https://github.com/theory/pgtap


OP here - I do agree some of the problems that come with SQL/ORM queries are pretty horrendous and that's exactly where I would like RegreSQL going. For now I can't promise the particular direction, but comments like this are the reason why I pushed myself to release it and keep it beyond my own playground. Thank you!


Just found out about pglite, this library is insanely cool. You can even run Postgres right in the browser.


It's an area where people get conflicted between unit testing in isolation and integration testing. The compromise between those two approaches tend to be slow, flaky, and merely provide the illusion of code coverage which you shoot for with unit testing while not being quite realistic enough to push it towards the side of being a proper integration test. I've never liked tests like that and mostly consider them a typically gigantic waste of time. Code coverage is meaningless for integration tests.

I deal with a lot of complex querying logic with mostly Elasticsearch. My appproach is to either unit test or integration test and just skip everything in between. With queries what I care about is that they work under realistic scenarios against the same version of Elasticsearch that we use in production. Creating test fixtures is expensive. So tests for read only querying shares the same test fixtures. That speeds things up. I don't care about separately testing simple crud operations because most of my scenarios trigger plenty of those. A unit test for that has no value to me. Unit testing whether my API can send queries to some fake Elasticsearch, empty Elasticsearch, etc. has limited value to me. There's some limited value in knowing the queries I'm sending are syntactically correct. But you get that for free with a proper integration test not failing. The unit test is redundant if you have a proper integration test. And a unit test without a proper integration test provided little practical value.

What I actually do care about is all the complicated dashboard and other queries that make lot of assumptions about how data is structured, what fields are there, how they are indexed, whether they can be null, blank, or have invalid values, etc. work as intended. That kind of calls for an integration test. Anything trivial enough that a unit test would be good enough doesn't tend to need a lot of testing. Any scenario that touches enough of that stuff, kind of covers most of that.

I put a lot of effort in ensuring that my integration tests can run quickly, concurrently, and don't interact with each other (data randomization). That allows me to get away with not deleting a lot of data between tests and gets me a lot of realism for free because real users don't have an empty system completely to themselves. So having a lot of tests running against a busy system is integration testing gold. I have close to 300 full API integration tests running in around 30 seconds on my laptop. Close enough to unit testing performance that I run them many times per day.

The same approach applies to database testing. Probably more so because all the interesting bugs usually relate to constraints, transactionality, database locks, etc. If you have flaky tests because of that, it might actually be because your database layer has some issues with the notion of users not being polite enough to queue up one by one.

This is not for everyone, I realize. Do what works for you. I've butted heads with people over this more than a few times. But in my company (of which I'm the CTO), we unit test functions, small classes, regular expressions, parsing logic, etc. We integration test systems and APIs. Testing individual queries without the rest of the system is hard and pointless. Test the API that triggers the query. That 30 second performance for test runs is something I spent a lot of time on getting. It means we can do major changes without fear. If tests pass, our users should be fine.


I agree with all of this and it's what I tend to follow too.

One question, since you seem to have more experience with this approach.

> Test the API that triggers the query.

When you test the API, do you also check the state of the database?

Let's say there's an endpoint which creates an entity, and that operation involves setting a bunch of columns that aren't returned in the API.

When testing, do you check only the expected API response, or also the database state to be sure the operation was fully correct?

I currently do the latter, but this feels a bit iffy as the database internals are being leaked in an otherwise "blackbox" test.

If you learn towards the first option, how then do you verify all the unreturned database fields are correctly set?


> It's pretty terrible how poorly developers test their database queries.

Yes. This becomes especially obvious when you rewrite ORM garbage for something complicated, and are told that they can’t accept it, because they’re not sure how to test it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: