This is not strictly a lightning question. I hope that’s all right.
I do not understand all the excitement about creating or using synthetic training data. Yes, I get it: in some cases there may not be enough ‘real’ data to train on. But the trend is not limited to those cases. Given that models already hallucinate and leak training data, training on synthetic data seems to me like begging for far worse problems in production than a rude response from a model. Why is no one I’ve read on this subject concerned about this? Thx.