The methods we found for solving the first problem don’t appear to help with the second problem, and vice versa. In fact, the two isolated solutions appear quite difficult to reconcile. The problem that these two papers leave open is: Can we get one algorithm that satisfies both properties at once?
A few paragraphs up:
The first property is about recognizing patterns about logical relationships between claims — saying “claim A implies claim B, so my probability on B must be at least my probability on A.” By contrast, the second property is about recognizing frequency patterns between similar claims — saying “I lack the resources to tell whether this claim is true, but 90% of similar claims have been true, so the base rate is 90%” (where part of the problem is figuring out what counts as a “similar claim”).
My question would be: isn't the "similar claim" problem itself a relationship between claims, and therefore pertinent to the first property, providing a link between the two problems? (I sketch the two properties concretely below.)
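To make the two properties concrete, here is a minimal sketch of the two kinds of constraint as I understand them (my own illustration; all names are hypothetical, not anything from the papers):

```python
# Property 1: coherence across logical relationships.
def enforce_implication(p, a, b, implications):
    """If A implies B, the probability of B must be at least that of A."""
    if (a, b) in implications and p[b] < p[a]:
        p[b] = p[a]  # raise P(B) so that P(B) >= P(A)
    return p

# Property 2: calibration to frequencies among "similar" claims.
def base_rate(claim, history, similar):
    """Fall back on the observed frequency of truth among claims judged similar."""
    outcomes = [truth for (c, truth) in history if similar(c, claim)]
    return sum(outcomes) / len(outcomes) if outcomes else 0.5  # uninformative default

p = {"A": 0.8, "B": 0.6}
p = enforce_implication(p, "A", "B", {("A", "B")})            # now P(B) = 0.8
history = [("claim1", True), ("claim2", True), ("claim3", False)]
print(base_rate("new claim", history, lambda c, d: True))     # 2/3 base rate
```

My question amounts to asking whether the `similar` relation in the second sketch is itself the kind of logical relationship between claims that the first sketch handles.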
Yep, modern probability theory depends on being able to define a sample space that has an associated sigma algebra. The problem posed by the examples (e.g. “this Turing machine halts”) is that we have no obvious way to construct such a sample space (one that is non-trivial) a priori. The innovation here is that there seem to be at least two ways to construct proxy sample spaces based on some of the surrounding constraints provided in the problem definition (since in many cases the sample space is binary, as the original article notes). The authors' last question is whether these two approaches (which seem incompatible) can be used together.
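For concreteness, the binary proxy space for a single claim can be written out explicitly (my notation, not the authors'):

```latex
% Binary sample space for "this Turing machine halts" (my notation):
\Omega = \{\omega_{\mathrm{halts}},\ \omega_{\neg\mathrm{halts}}\},\qquad
\mathcal{F} = \{\emptyset,\ \{\omega_{\mathrm{halts}}\},\ \{\omega_{\neg\mathrm{halts}}\},\ \Omega\},\qquad
\mathbb{P}(\{\omega_{\mathrm{halts}}\}) = p \in [0,1].
```

As I read it, the space itself is easy to write down; what's non-trivial is justifying any particular value of p, which is exactly where the surrounding constraints in the problem definition have to do the work.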
I think it would be really interesting if you could demonstrate that such 'external' probabilities could improve reasoning to better than chance, because it would suggest that the answer to a logic problem is partially constrained by its formulation. However, I have the sense that it might also be the case that for some questions (e.g. the halting problem example above) no information would be shared between the question and its constraints.
I feel there might be a very interesting thought process going on in your second paragraph, but I cannot parse it. Would you mind explaining the link between articulation and answer in a bit more detail? At face value, the formulation of a problem definitely bears on the answer; i.e., what I ask determines the answer. And I would guess that information is naturally shared between a question and its constraints, given that constraints are constituents of the question. But I feel you are onto a different kind of link here.
Secondly, is there a reason why subjective Bayesian theory isn't mentioned? To me it seems obvious that expert elicitation and assigning probabilities to logically uncertain statements are perfectly fine.
Speaking very roughly, fuzzy logic applies in cases where the degree of truth of a statement is in question. Logical uncertainty (about decidable sentences) applies in cases where the sentences are definitely true or false, but we lack the resources to figure out which.
So, for example, fuzzy logic might help you quantify to what extent someone is "tall," where tallness admits of degrees rather than being binary. Or it might help you quantify to what extent a proof is "long." But it won't tell you how to calculate the subjective probability that there exists a proof of some theorem that is no more than 500 characters long in some fixed language. For that, you either need to find a proof, exhaustively demonstrate that no proof exists, or find a way to reason under logical uncertainty; and we haven't found any ways to use fuzzy logic to make progress on formalizing inference under logical uncertainty.
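A toy contrast, in case it helps (my own example, not something from the post):

```python
# Fuzzy logic: "tall" is a matter of degree; the truth value itself lies in [0, 1].
def tallness_degree(height_cm):
    # Hypothetical membership function: 160cm -> 0.0, 200cm -> 1.0, linear in between.
    return min(1.0, max(0.0, (height_cm - 160) / 40))

# Logical uncertainty: "a proof under 500 characters exists" is definitely
# true or false; this number is a credence about which, not a degree of truth.
credence_short_proof_exists = 0.3  # subjective probability, not a truth value

print(tallness_degree(180))  # 0.5: half-tall, a genuine degree of truth
```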
I agree that fuzzy logic wouldn't work for that purpose. But it addresses a formalism around the foundations of what probabilities are, which, as far as I could tell, is something you guys were doing as well. Just a thought.
As for actually addressing logical uncertainty and asymptotic convergence, I think subjective Bayesianism can be used in both cases. For example, you write "the axioms of probability theory force you to put probability either 0 or 1 on those statements", which I think is simply not true. If I as an "expert" claimed that "in my experience there is a 70% chance of the conjecture being correct", I can set Prior(conjecture) = 0.7.
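To illustrate, here is a minimal sketch of the kind of subjective update I have in mind (all numbers invented):

```python
# Subjective Bayesian treatment of a conjecture (illustrative numbers only).
prior = 0.7  # "in my experience, 70% of conjectures like this are correct"

# Suppose a numerical search verifies the conjecture in many cases, evidence
# we judge 3x more likely if the conjecture is true than if it is false.
likelihood_ratio = 3.0

posterior_odds = (prior / (1 - prior)) * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 3))  # ~0.875
```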
Was it not the case that people would have assigned very low probability to Fermat's Last Theorem too, at least before Wiles's first version of the proof? But then with Wiles's first version, they would have assigned a very large probability (1 − epsilon, maybe).
I am confused as to what benefits we derive from assigning such probabilities. I am certainly missing something big here.
"However, it can’t represent reasoners’ logical uncertainty, their uncertainty about statements like “this Turing machine halts” or “the twin prime conjecture has a proof that is less than a gigabyte long.”"