I would have liked to know what the Kolmogorov approach has to say about the examples used to deflate frequentism ("which of my friends will start a business?"). I don't see from the article how the "smallest-program" approach could say anything useful about those, either. Maybe that wasn't the point--but after poking holes in the frequentist view, it uses unrelated examples like digits of pi to illustrate the Kolmogorov idea, so I'm left unable to directly compare the kinds of statements the two approaches can make.
Even going back to dice or coins would have helped me compare them. Like, I know frequentists can show that the variance of the observed proportion of heads shrinks as the sample size grows. What can the smallest-program approach say about that? Or was the point that those variant outcomes aren't "real" enough to talk about? Does that mean there's a connection to constructivism in mathematics here?
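To pin down the frequentist half of that comparison, here's a minimal simulation sketch (Python; the 0.25/n column is just p(1-p)/n for a fair coin, and all names here are mine, not from the article):

```python
import random
import statistics

rng = random.Random(0)

def sample_proportion(n):
    """Proportion of heads in n tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

# The frequentist statement: Var(proportion) = p*(1-p)/n, shrinking with n.
for n in (100, 1_000, 10_000):
    props = [sample_proportion(n) for _ in range(500)]
    print(n, statistics.variance(props), 0.25 / n)
```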
It seems like either approach benefits from more data, and there must be a concept related to a "confidence interval" where, as 100, then 200, then 300 digits of pi roll in, your pi-program stays the same size while other programs have to keep growing to accommodate the new data. Like, the ratio of the smallest program's length to the length of the naive encoding ought to say something about how potentially predictive the small program is.
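Something like this toy comparison is what I have in mind (my own sketch, not from the article; the digit generator is Gibbons' unbounded spigot algorithm, and the "naive encoding" is just the digit string itself):

```python
import inspect
from itertools import islice

def pi_digits():
    """Gibbons' unbounded spigot algorithm: yields the decimal digits of pi."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4*q + r - t < n*t:
            yield n
            q, r, t, k, n, l = (10*q, 10*(r - n*t), t, k,
                                (10*(3*q + r)) // t - 10*n, l)
        else:
            q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                (q*(7*k + 2) + r*l) // (t*l), l + 2)

# The program's size is fixed; the raw digit string keeps growing,
# so the ratio falls as more data arrives.
# (Run as a script so inspect.getsource can find the function.)
program_len = len(inspect.getsource(pi_digits))
for n in (100, 200, 300):
    naive_len = len("".join(str(d) for d in islice(pi_digits(), n)))
    print(n, program_len, naive_len, round(program_len / naive_len, 2))
```

That falling ratio is the "confidence"-like quantity I'm gesturing at.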
Thanks for the interesting article. It definitely made me think about the issues, and now I'm curious to know more about the topic.
One aspect of the Kolmogorov approach is that it implicitly models things like biased probabilities through compressibility.
The shortest representation of a sequence of fair dice rolls or coin tosses is just the exact results, but if there's "less randomness" in some way (a biased coin or die, the sum of two dice, which makes the outcomes non-uniform, or some predictable pattern mixed with random noise), then more compact representations of that information exist, and all of that gets captured by the Kolmogorov approach without any explicit handling of the various possibilities.
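A toy illustration (my sketch; zlib's output size is only a crude upper bound on Kolmogorov complexity, but the qualitative effect shows up):

```python
import random
import zlib

def flips(n, p, seed=0):
    """n coin flips with probability p of heads, as a '0'/'1' byte string."""
    rng = random.Random(seed)
    return "".join("1" if rng.random() < p else "0" for _ in range(n)).encode()

# The more biased the coin, the more compressible the record of its flips.
# (Even the fair case drops below n bytes, since each '0'/'1' character
# carries at most one bit of entropy.)
for p in (0.5, 0.7, 0.9, 0.99):
    print(p, len(zlib.compress(flips(100_000, p), 9)))
```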