The original data we used was not annotated with sources, only with where the overall data came from; most of it was news articles. Length doesn't seem to matter much, since we saw plenty of errors even when summarizing a single sentence (sometimes the model seemed compelled to elaborate with extra information). Usually the hallucinations were common-sense inferences, such as assuming the plant was a cannabis plant in the example cited in the NYT article. Other times the LLM would invert things. E.g., if you asked any of the Google LLMs to summarize an article about a famous boxer, where the article stated that Wahlberg was a fan of said boxer, the PaLM models would flip it to say the boxer was a fan of Wahlberg's. Even the latest Bard model still does that; I tested it this weekend. It's a subtle and small error, but it's still factually incorrect.