Basically, it retains all revisions. The main document index is a keyed trie, much like a hash array mapped trie: an array stored as a tree, with compact page layouts (bitmap pages, four-reference pages, full pages) to reduce the size of pages that aren't full. Since Sirix assigns monotonically increasing, immutable, unique node identifiers, most inner pages are full of references to the next-level pages (checksums of the child pages are stored alongside the references, as in ZFS). The height of the tree grows dynamically. Currently every inner page stores at most 1024 references, so it's a very wide tree, but we should experiment with other sizes.
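To make the addressing concrete, here is a minimal, illustrative sketch (not Sirix's actual code) of how a node key could be split into per-level offsets with a fanout of 1024, i.e. 10 bits per level. The exact bit layout is an assumption for the example.

```java
// Illustrative only: with a fanout of 1024 (2^10), a node key can be
// decomposed into 10-bit slices, each selecting the child reference at one
// level of the trie, from the root down to the leaf page.
final class TrieNavigationSketch {

  static final int FANOUT_BITS = 10;                 // 2^10 = 1024 references per inner page
  static final int FANOUT_MASK = (1 << FANOUT_BITS) - 1;

  /** Returns the child-reference offsets to follow, from root to leaf. */
  static int[] pathFor(long nodeKey, int height) {
    int[] offsets = new int[height];
    for (int level = 0; level < height; level++) {
      int shift = (height - 1 - level) * FANOUT_BITS;
      offsets[level] = (int) ((nodeKey >>> shift) & FANOUT_MASK);
    }
    return offsets;
  }

  public static void main(String[] args) {
    // Node key 1,050,000 in a trie of height 3 -> prints [1, 1, 400]
    System.out.println(java.util.Arrays.toString(pathFor(1_050_000L, 3)));
  }
}
```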
The leaf pages of the trie store either the data itself (the nodes), nodes of the path summary, or nodes of the secondary indexes.
So besides the main document index, a RevisionRootPage also holds references to the tries that store the secondary indexes. The secondary indexes, along with a small path summary, are reconstructed in main memory from the leaf pages of these tries, which are usually small.
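As a rough mental model (the field names are made up for illustration; the real RevisionRootPage differs in detail), you can think of the revision root as a small set of references:

```java
// Simplified mental model, not the actual Sirix class layout.
record PageReference(long offset, long checksum) {}

record RevisionRootPageSketch(
    long revisionNumber,
    long commitTimestamp,
    PageReference documentIndexRoot,    // main document trie
    PageReference pathSummaryRoot,      // small path summary trie
    PageReference[] secondaryIndexRoots // one trie per secondary index
) {}
```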
The data pages are not simply copied on each commit; only nodes that changed, or that fall out of a sliding window, are written. A page may therefore have to be reconstructed in memory from at most a small number N of page fragments in the worst case. Thus, it needs a device suited for fast, small, random parallel reads and sequential writes.
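Here is a hedged sketch of what such a reconstruction might look like under sliding-snapshot-style versioning (slot layout and names are assumptions, not the actual implementation): fragments are merged newest-to-oldest, and each node slot is taken from the most recent fragment that contains it.

```java
// Illustrative page reconstruction from N fragments (newest first).
import java.util.List;

final class PageReconstructionSketch {

  static final int SLOTS_PER_PAGE = 1024; // assumption for the sketch

  /** A fragment holds only the node slots written at its commit; null = absent. */
  record Fragment(byte[][] slots) {}

  static byte[][] reconstruct(List<Fragment> newestToOldest) {
    byte[][] page = new byte[SLOTS_PER_PAGE][];
    for (Fragment fragment : newestToOldest) {
      for (int i = 0; i < SLOTS_PER_PAGE; i++) {
        if (page[i] == null && fragment.slots()[i] != null) {
          page[i] = fragment.slots()[i]; // newest version of this node wins
        }
      }
    }
    return page;
  }
}
```

Since the offsets of all needed fragments are known up front, they can be fetched in parallel, which is why fast, small random reads matter.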
Currently, to get rid of old revisions, you have to copy a resource starting from a given revision and replay all updates up to the most recent revision with intermediate commits, because Sirix uses only one data file per resource (a resource is roughly equivalent to a table in a relational system). The data files are basically logs. A second file stores offsets and timestamps, read into memory, to retrieve a given revision.
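As an illustration of that second file, here is a sketch (layout and names are assumptions, not Sirix's actual format) of how the newest revision committed at or before a given point in time could be located with a binary search over the in-memory entries:

```java
// Illustrative revision lookup: entries are sorted by commit timestamp.
import java.time.Instant;
import java.util.List;

final class RevisionLookupSketch {

  record RevisionEntry(int revision, Instant commitTime, long fileOffset) {}

  /** Returns the newest revision with commitTime <= asOf, or null if none. */
  static RevisionEntry findRevision(List<RevisionEntry> sortedByTime, Instant asOf) {
    int lo = 0, hi = sortedByTime.size() - 1;
    RevisionEntry result = null;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      RevisionEntry entry = sortedByTime.get(mid);
      if (!entry.commitTime().isAfter(asOf)) {
        result = entry;   // candidate; keep looking for a later one
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }
}
```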
https://sirix.io/docs/concepts.html
and
https://sirix.io/docs/jsoniq-tutorial.html
These should help you get a better understanding.
HTH and let me know if you're interested in more details :-)
Thanks for asking