Oh, absolutely, they don't really know what internal cognition generated the scratchpad (and the subsequent output that was trained on). But we _do_ know that the model's outputs were _well-predicted by the hypothesis they were testing_, and the scratchpad incidentally supports that interpretation too. You could start inventing reasons why the model's external behavior merely looks like exploration hacking while actually being driven by completely different internal cognition that just happens to have the side-effect of producing exploration-hacking behavior. But it would be a suspicious coincidence for such unrelated cognition to express exactly that behavior in precisely the situation where theory predicted you might see exploration hacking in sufficiently capable and situationally-aware models.