I think that's the core innovation here, smart HTTP block storage. I wonder if t...

westurner · on July 31, 2021

> Methods for remotely accessing/paging data in from a client when a complete download of the dataset is unnecessary:

> - Query e.g. parquet on e.g. GitHub with DuckDB: duckdb/test_parquet_remote.test https://github.com/duckdb/duckdb/blob/6c7c9805fdf1604039ebed...

> - Query sqlite on e.g. GitHub with SQLite: [Hosting SQLite databases on Github Pages - (or any static file hoster) - phiresky's blog](...)

>> The above query should do 10-20 GET requests, fetching a total of 130 - 270KiB, depending on if you ran the above demos as well. Note that it only has to do 20 requests and not 270 (as would be expected when fetching 270 KiB with 1 KiB at a time). That’s because I implemented a pre-fetching system that tries to detect access patterns through three separate virtual read heads and exponentially increases the request size for sequential reads. This means that index scans or table scans reading more than a few KiB of data will only cause a number of requests that is logarithmic in the total byte length of the scan. You can see the effect of this by looking at the “Access pattern” column in the page read log above.

> - bittorrent/sqltorrent https://github.com/bittorrent/sqltorrent

>> Sqltorrent is a custom VFS for sqlite which allows applications to query an sqlite database contained within a torrent. Queries can be processed immediately after the database has been opened, even though the database file is still being downloaded. Pieces of the file which are required to complete a query are prioritized so that queries complete reasonably quickly even if only a small fraction of the whole database has been downloaded.

>> […] Creating torrents: Sqltorrent currently only supports torrents containing a single sqlite database file. For efficiency the piece size of the torrent should be kept fairly small, around 32KB. It is also recommended to set the page size equal to the piece size when creating the sqlite database

Would BitTorrent be faster over HTTP/3 (UDP) or is that already a thing for web seeding?

> - https://web.dev/file-system-access/

> The File System Access API: simplifying access to local files: The File System Access API allows web apps to read or save changes directly to files and folders on the user’s device

Hadn't seen wilsonzlin/edgesearch, thx:

> Serverless full-text search with Cloudflare Workers, WebAssembly, and Roaring Bitmaps https://github.com/wilsonzlin/edgesearch

>> How it works: Edgesearch builds a reverse index by mapping terms to a compressed bit set (using Roaring Bitmaps) of IDs of documents containing the term, and creates a custom worker script and data to upload to Cloudflare Workers

cafxx · on Aug 2, 2021

> Would BitTorrent be faster over HTTP/3 (UDP) or is that already a thing for web seeding?

The BT protocol itself runs on both TCP and UDP, but it has preferred the UDP variant for many years already.

westurner · on Aug 4, 2021

Thanks. There likely are relative advantages to HTTP/3 QUIC. Here's this from Wikipedia:

> Both HTTP/1.1 and HTTP/2 use TCP as their transport. HTTP/3 uses QUIC, a transport layer network protocol which uses user space congestion control over the User Datagram Protocol (UDP). The switch to QUIC aims to fix a major problem of HTTP/2 called "head-of-line blocking": because the parallel nature of HTTP/2's multiplexing is not visible to TCP's loss recovery mechanisms, a lost or reordered packet causes all active transactions to experience a stall regardless of whether that transaction was impacted by the lost packet. Because QUIC provides native multiplexing, lost packets only impact the streams where data has been lost.

And HTTP Pipelining / Multiplexing isn't specified by just UDP or QUIC:

> HTTP/1.1 specification requires servers to respond to pipelined requests correctly, sending back non-pipelined but valid responses even if server does not support HTTP pipelining. Despite this requirement, many legacy HTTP/1.1 servers do not support pipelining correctly, forcing most HTTP clients to not use HTTP pipelining in practice.

> Time diagram of non-pipelined vs. pipelined connection The technique was superseded by multiplexing via HTTP/2,[2] which is supported by most modern browsers.[3]

> In HTTP/3, the multiplexing is accomplished through the new underlying QUIC transport protocol, which replaces TCP. This further reduces loading time, as there is no head-of-line blocking anymore https://en.wikipedia.org/wiki/HTTP_pipelining