This would be easy to patch the models to fix. Just gather a small amount of training data for these cases, eg. "change the clock hands to 5:30" with the corresponding edit.
Three tuple: (original image, text edit instruction, final image).
Easy to patch for editing models, anyway. Maybe not text to image models.
Three tuple: (original image, text edit instruction, final image).
Easy to patch for editing models, anyway. Maybe not text to image models.