Hacker News | foldU's comments

I don't think a post on a VC blog of all places should be considered "journalism"


You're nitpicking the term "journalism" here, but even in casual conversation: if a friend went on and on about how great a company was, and I later found out that they:

* Got paid a referral fee if I signed up

* Owned shares in the company

* Was even roommates with the founders/CEO, but failed to mention it

I would trust that person less going forward.

If you want to be perceived as trustworthy, then you shouldn't say things that you have a hidden interest in saying.


To be fair, the top of the post says "Portfolio Spotlight".


I don’t mean to suggest it’s not sleazy—I think it’s sleazy by default. VC writing is not to be trusted.


Joran from TigerBeetle!

Our investors (Spark, Amplify, Coil) are different from most, in that our partners are all highly technical.

Engineers, coders, and CTOs who read the same research papers and attend the same technical conferences (CIDR, VLDB, SIGMOD, HYTRADBOI, etc.).

In fact, that's how we met.


This is correct, I appreciate you for putting it so coherently :). I think I didn’t make it clear enough in the piece that I’m coming from a stance of fast access being table stakes, and the question being about how that’s accomplished.


"Caching" is the idea of storing the result of an expensive computation in storage that is faster to read from than redoing the original computation (in very generic terms; the "computation" can simply be fetching from the network or from slower local storage).

What you describe as "caching algorithms" are not really caching algorithms, but cached object lifetime management algorithms (LRU, LFU...).

"Abstraction" is a higher-level, simplified view of a set of concepts, yet caching is a single concept. See e.g. https://en.wikipedia.org/wiki/Abstraction_(computer_science)

It sounds like you are trying to redefine both what "caching" means (tying it to implementations of particular algorithms) and what "abstraction" means.

We should be very deliberate with the language we use, and our main goal should be to make it simpler to understand, not harder — I believe you are doing the latter here.
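To make the distinction concrete, here is a minimal sketch (hypothetical Python, not from either post): the caching part is just a table in front of an expensive function, while the LRU part is a separate lifetime-management policy layered on top of it.

```python
from collections import OrderedDict

def expensive(x):
    # Stand-in for a slow computation, or a network/disk fetch.
    return x * x

class LruCache:
    """Caching (store results for fast reuse), plus a separate
    lifetime-management policy (evict the least recently used)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, x):
        if x in self.entries:
            self.entries.move_to_end(x)   # mark as recently used
            return self.entries[x]        # cache hit: no recomputation
        result = expensive(x)             # cache miss: do the real work
        self.entries[x] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # LRU eviction policy
        return result

cache = LruCache(capacity=2)
cache.get(2); cache.get(3); cache.get(2)
cache.get(4)                 # evicts 3, the least recently used key
print(list(cache.entries))   # [2, 4]
```

Swapping `popitem(last=False)` for a different rule gives LFU, FIFO, and so on; the caching idea itself is untouched by that choice.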


He addresses this: you simply get airline pilots to pick them.


Yes, it’s good


whoops! thank you!


(author of OP) That post of yours was actually what got me tooling around with this stuff again :) it's a really excellent one


Thanks! That was very kind of you to say. Whenever I write stuff like that, I wonder, "Does anyone find this useful?" It helps to hear every once in a while that the answer is sometimes yes.


You also don't really have to worry about convergence, since these are formal power series.
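For instance (a hypothetical Python sketch, not from the article): each coefficient of a formal power series product is a finite sum, so no limit, and hence no convergence question, ever arises.

```python
def ps_mul(a, b, n):
    """Cauchy product of two formal power series given as coefficient
    lists, truncated to the first n coefficients. Every output
    coefficient is a finite sum, so convergence never comes up."""
    return [
        sum(a[i] * b[k - i]
            for i in range(k + 1)
            if i < len(a) and k - i < len(b))
        for k in range(n)
    ]

# 1/(1 - x) = 1 + x + x^2 + ...; its square is 1 + 2x + 3x^2 + ...
geom = [1] * 8
print(ps_mul(geom, geom, 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```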


More deserves to be written on the ILP idea. I haven't actually tried to make it work, but it seems to me like the only real direction for optimizing ("optimize" used in the mathematical sense, rather than the programming sense) queries where you can't just exploit the principle of optimality (see this earlier article I wrote [1] for a bit more exposition). I think some of the egg [2] people may have experimented with this.

Speaking from experience, there are lots of rewrites you'd want to do in a query optimizer that access to efficient DAG-shaped query plans would make tenable; as a specific example, they are an important part of doing full subquery decorrelation [3].

[1] https://buttondown.email/jaffray/archive/why-are-query-plans...

[2] https://egraphs-good.github.io/

[3] https://www.scattered-thoughts.net/writing/materialize-decor...


Also, in general, there's the idea of being at least worst-case optimal: polynomially faster multi-way join-project with respect to the input rows, assuming a fixed query (the polynomials are given by concepts like the "fractional hypertree width" of the query), but with pathologically connected/connecting input rows fed to the join:

The simplest example is a two-column table of a directed graph's edges, e(from, to). Then: Triangle(a, b, c) := e(a, b), e(b, c), e(c, a).

#Triangle <= c * (#e)^1.5, for a small c. (I think c = 3 in this case, but I'm not sure.)

Using binary joins on pathological input will, however, take O((#e)^2). If you do row-granularity lookups using an auxiliary index (actually two, in the simplified presentation, but they can be merged by pushing the post-lookup choice down to index creation), along with indexing e twice (once for each of the two columns), you can do it in O((#e)^1.5):

Create auxiliary indices eaux1 and eaux2: eaux1(from, #e(from, *)) maps each from-value to its edge count (out-degree), and eaux2(to, #e(*, to)) maps each to-value to its edge count (in-degree).

Iterate over e(a, b). Look up eaux1(b, ?countbc) and eaux2(a, ?countca).

If countbc < countca: expand through point lookup e(b, ?c), then filter the resulting e(a, b), e(b, ?c) candidates using an index of e suitable for point membership queries on (c, a).

Else: expand through point lookup e(?c, a), then filter the resulting e(a, b), e(?c, a) candidates using an index of e suitable for point membership queries on (b, c).

Done.

As can be seen, the trick is to choose, at each nesting level of the (generally nested-loop-join-structured) query execution, which table to join the current partial row against next, using premade indices that reveal the iteration count of the upcoming inner loop in O(1) time.
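The procedure above can be sketched in a few lines of Python (names hypothetical; dicts and a set stand in for the real indices):

```python
from collections import defaultdict

def triangles(edges):
    """Enumerate Triangle(a, b, c) := e(a,b), e(b,c), e(c,a),
    choosing the cheaper expansion direction per edge using
    precomputed degree counts, as described above."""
    edge_set = set(edges)                      # point-membership index on e
    out_adj, in_adj = defaultdict(list), defaultdict(list)
    for f, t in edges:
        out_adj[f].append(t)                   # e indexed by `from`
        in_adj[t].append(f)                    # e indexed by `to`

    result = set()
    for a, b in edges:
        count_bc = len(out_adj[b])             # eaux1(b): edges leaving b
        count_ca = len(in_adj[a])              # eaux2(a): edges entering a
        if count_bc < count_ca:
            # Expand e(b, ?c); filter with a membership test on e(c, a).
            for c in out_adj[b]:
                if (c, a) in edge_set:
                    result.add((a, b, c))
        else:
            # Expand e(?c, a); filter with a membership test on e(b, c).
            for c in in_adj[a]:
                if (b, c) in edge_set:
                    result.add((a, b, c))
    return result

print(triangles([(1, 2), (2, 3), (3, 1)]))  # the three rotations of (1, 2, 3)
```

Each edge's inner loop is bounded by min(count_bc, count_ca), which is where the O((#e)^1.5) bound comes from.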


(I'm the author of the original post [1].) And while I think the rules make sense in the end, it's not quite as clear-cut as you describe. Consider:

    SELECT (SELECT sum(1) FROM xx LIMIT 1) FROM aa;
which returns

    3
    3
    3
Personally, I think having the inner aggregation always attach to the nearest `SELECT` would have been an equally valid way of defining how this works; it just so happens that it is not defined that way.

[1]: https://buttondown.email/jaffray/archive/sql-scoping-is-surp...
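The behaviour is easy to reproduce, in SQLite at least (a sketch via Python's `sqlite3`; the table contents are made up): the aggregate binds to the inner `SELECT`, so `sum(1)` counts the rows of `xx`, once per row of `aa`.

```python
import sqlite3

# aa has 3 rows and xx has 3 rows, so sum(1) over xx yields 3,
# emitted once per row of aa.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE aa (a INTEGER);
    CREATE TABLE xx (x INTEGER);
    INSERT INTO aa VALUES (10), (20), (30);
    INSERT INTO xx VALUES (1), (2), (3);
""")
rows = con.execute(
    "SELECT (SELECT sum(1) FROM xx LIMIT 1) FROM aa"
).fetchall()
print(rows)  # [(3,), (3,), (3,)]
```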


Well, the "fun" one to consider is IMO not that, but this:

    SELECT (SELECT sum(a + x) FROM xx LIMIT 1) FROM aa;
Making a pretty diagram where all the behaviours from various DBMSs are listed is left as an exercise to the author. :)


I think this one is obvious actually! There's no choice but to aggregate it at the level of the `xx` `SELECT`, no other level has access to the `x` column.


> Making a pretty diagram where all the behaviours from various DBMSs are listed is left as an exercise to the author. :)

It's there already!

It's in the chart as footnote "b": Outer reference must be the only argument (doesn’t support F441)

That's the one thing SQL Server doesn't eat. Those that are green in the chart work fine in this case.


I agree, I think the original sin here is the fact that whether a `SELECT` is an aggregation is determined by the contents of the scalar expressions at all. I think most of this weirdness comes directly out of wanting to be able to write both `SELECT sum(x) FROM xx` and `SELECT x FROM xx` and have them work.

Not that I have a better solution offhand; in SQL, grouping by a constant value is not actually the same as not writing `GROUP BY` at all, since the behaviour on empty tables is different.
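The empty-table difference is easy to check; in SQLite, for instance (a small sketch): a plain aggregate produces one row even over zero input rows, while grouping by a constant produces zero groups and hence zero rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")  # deliberately left empty

# Plain aggregate: one output row, even with no input rows.
no_group = con.execute("SELECT sum(x) FROM t").fetchall()
print(no_group)  # [(None,)]

# Grouping by a constant: zero input rows means zero groups.
const_group = con.execute("SELECT sum(x) FROM t GROUP BY 'k'").fetchall()
print(const_group)  # []
```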


What's an aggregation, per se? A SQL query is best thought of as an arbitrary generator function. Using functions like UNNEST, you can end up emitting multiple row-tuples for each processed input row-tuple. An aggregation is just a generator function that happens to reduce over all the input rows and then emit one row-tuple when there's no more input. Query-planning engines do not special-case this; it's just a generator node like any other generator node.

Consider: using window functions, you can do partial aggregations over subsets of the input — without even necessarily partitioning the input (i.e. you can compute "running totals" and other wacky state-machine-like outputs.)
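That framing can be sketched directly with Python generators (hypothetical names, not any engine's actual API): an aggregate node consumes everything and emits once, while a window-style running total carries the same reduction state but emits per input tuple.

```python
def sum_agg(rows):
    """Aggregate as a generator node: consume every input tuple,
    then emit a single output tuple at end of input."""
    total = 0
    for (x,) in rows:
        total += x
    yield (total,)

def running_total(rows):
    """Window-function-style node: same reduction state, but one
    output tuple is emitted per input tuple."""
    total = 0
    for (x,) in rows:
        total += x
        yield (x, total)

rows = [(1,), (2,), (3,)]
print(list(sum_agg(rows)))        # [(6,)]
print(list(running_total(rows)))  # [(1, 1), (2, 3), (3, 6)]
```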


Not writing `group by` is the same as writing `group by ()`.

And yeah, the difference between that and grouping by a constant value is one of those really surprising things in SQL that actually make sense and should be this way. Unfortunately, there are many of those.


Ah my apologies, I wasn't familiar with that syntax.


No need to apologize. My post didn't make that clear at all.

