A smaller model has less capacity and is therefore less prone to overfitting, which is one cause of hallucination. Overfitting means the model memorizes its training data rather than learning patterns that generalize, so it cannot adjust its output appropriately for inputs unseen during training.
Yes. A smaller model isn't necessarily worse by default: it may have been trained for less time or on less data, which in some cases can even be beneficial. Also, the differences here are small and may not be statistically significant. And since a model is doing the evaluation, even though its judgments are highly correlated with human ones, a gap this small doesn't necessarily mean the 7B model is better.
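One way to sanity-check whether a small evaluation gap is just noise is to bootstrap a confidence interval for the score difference. Here is a minimal sketch; the scores and model names (`scores_7b`, `scores_13b`) are made-up placeholders, not results from the discussion above:

```python
import random

# Hypothetical per-prompt evaluation outcomes (1 = judged good, 0 = judged bad)
# for a 7B and a 13B model; these numbers are fabricated for illustration only.
random.seed(0)
scores_7b = [random.random() < 0.52 for _ in range(200)]
scores_13b = [random.random() < 0.50 for _ in range(200)]

def bootstrap_diff(a, b, n_resamples=10_000, seed=1):
    """Bootstrap the difference in mean score and return a 95% interval."""
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] for i in idx) / n - sum(b[i] for i in idx) / n)
    diffs.sort()
    return diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples)]

lo, hi = bootstrap_diff(scores_7b, scores_13b)
# If the interval straddles 0, the observed gap could easily be noise.
print(f"95% CI for score difference: [{lo:.3f}, {hi:.3f}]")
```

If the resulting interval contains 0, you can't conclude from these scores alone that one model is really better than the other.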