One is the extremely sprawling MarginaliaSearch repo[M1].
Here it did a decent job of capturing the architecture, though, to be fair, that is well documented in the repo itself. It successfully identifies the most important components, which is also good.
But when describing the components, it only really succeeds where the components themselves are very self-contained and easy to grok. It did a decent job with e.g. the buffer pool[M2], but even there it fails to define some concepts that would have made the description easier to follow, e.g. what a pin count is in buffer management. This is standard terminology and something the model should know.
I get the impression it lifts a lot of its facts from the comments and documentation that already exist, which may lead it to propagate outdated falsehoods about the code.
[M1] https://deepwiki.com/MarginaliaSearch/MarginaliaSearch
[M2] https://deepwiki.com/MarginaliaSearch/MarginaliaSearch/5.2-b...
The other is the SlopData[S1] repo, which contains a small library for columnar data serialization.
This one I wasn't very impressed with. It produced more documentation than was necessary, mostly amending what was already there with incorrect statements it seems to have pulled out of its posterior[S2][S3].
The library is very low-abstraction, and there simply isn't a lot of architecture to diagram, but the model seems to insist that there must be a lot of architecture and then produces excessive diagrams as a result.
[S1] https://deepwiki.com/MarginaliaSearch/SlopData
[S2] https://deepwiki.com/MarginaliaSearch/SlopData#storage-types (the performance numbers are completely invented; in practice, reading compressed data is typically faster than reading plain data)
[S3] https://deepwiki.com/MarginaliaSearch/SlopData/6.3-zip-packa... (the overview section is false; all these tables are immutable)
So overall it gives me a bit of a broken clock vibe. When it's right, it's great. When it isn't, it's not very useful. Good at the stuff that is already easy, borderline useless for the stuff that isn't.