
Your definition is one of them, but the definition the OP is using is overfitting to the training data.


That’s exactly my point: by that definition any incorrect answer can be explained by “overfitting to training data”.

Where do you draw the line between “overfitting to training data” and “incorrect data”?


> That’s exactly my point: by that definition any incorrect answer can be explained by “overfitting to training data”.

Not really. Getting 94381294*123 = ... wrong, but close to the actual answer, cannot be overfitting, since that exact product wasn't in the training data.
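For reference, the exact product is easy to check:

```python
# The product from the comment above, computed exactly
product = 94381294 * 123
print(product)  # → 11608899162
```

A model answering something close to this value, but not equal to it, is interpolating rather than recalling a memorized string.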


> [By] that definition any incorrect answer can be explained by “overfitting to training data”.

No, it can't: some errors are caused by underfitting, for instance. The data could also be correct while your hyperparameters (such as the learning rate or dropout rate) cause the model to overfit.

> Where do you draw the line between “overfitting to training data” and “incorrect data” ?

There's no need to draw a line between two explanations that aren't mutually exclusive. They can (as in this case) both be true. Overfitting is the symptom; dirty data is the cause.
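The symptom-vs-cause distinction can be seen in a minimal sketch (a toy polynomial-fit setup I'm assuming for illustration, not the OP's model): a high-degree polynomial fit to a few noisy samples of a line drives training error toward zero while generalizing worse, which is overfitting regardless of whether the underlying data is clean or dirty.

```python
# Minimal overfitting demo: fit degree-1 vs degree-15 polynomials to
# noisy samples of the line y = 2x, then compare training error against
# error on held-out points. The high-degree model fits the training
# noise, not the underlying function.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=x_train.size)  # noisy line
x_test = np.linspace(0.025, 0.975, 20)  # held-out points between samples
y_test = 2.0 * x_test                   # noise-free ground truth

errs = {}
for degree in (1, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errs[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.5f}, test MSE {test_mse:.5f}")
```

The degree-15 fit achieves lower training error than the degree-1 fit by construction (more parameters, same least-squares objective), which is exactly why low training error alone can't distinguish a good model from an overfit one.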



