There's a LOT of literature about how common large datasets are inherently biased towards some ideas/concepts and away from others, in ways that imply negative things.
That's not a very scientific stance. It would be far more informative to look at the system prompt and confirm whether or not the bias was coming from it. In my experience, when responses were exceptionally biased, the source of the bias was my own prompts.
The OP is claiming that an LLM assumes a meeting between two women is about childcare. I've worked with LLMs enough to know that current-gen LLMs wouldn't make that assumption by default. There is no way that whatever calendar-related data was used to train LLMs would show the majority of women-only 1:1s being childcare-focused. That seems extremely unlikely.
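To make the system-prompt check concrete, here's a rough sketch of the kind of A/B test I mean: run the same user prompt with and without the suspect system prompt and compare the outputs. This assumes the OpenAI Python client; the model name, system prompt, and calendar example are placeholders I made up, not anything from the OP's setup.

    # Compare the model's answer with and without a suspect system prompt.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    USER_PROMPT = "Summarise this calendar event: 1:1 between Alice and Dana, 30 min."
    SYSTEM_PROMPT = "You are a helpful scheduling assistant."  # suspect prompt under test

    def ask(messages):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=messages,
            temperature=0,  # reduce run-to-run variance so differences reflect the prompt
        )
        return resp.choices[0].message.content

    # With the system prompt
    with_system = ask([
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ])

    # Without the system prompt
    without_system = ask([
        {"role": "user", "content": USER_PROMPT},
    ])

    print("WITH system prompt:\n", with_system)
    print("\nWITHOUT system prompt:\n", without_system)

If the biased framing only shows up in the first case, the system prompt is the likely culprit; if it shows up in both, you're looking at the model itself (or the user prompt).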
Not to "let me Google that for you"... but there are a LOT of scientific papers that specifically analyse bias in LLM output and reference the datasets they are trained on.