Hacker News

This may be a bit elementary for this crowd, but on the question of balancing data cost against capturing the most significant features: we use a simple decision tree as a significance cluster and optimize our data munging around these clusters.
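To make that a bit more concrete, here is a rough sketch of one way to read the idea, not our actual pipeline. It scores each feature by the best single-threshold split (a one-split decision stump standing in for the tree, using Gini impurity) and divides that gain by a relative cost. The function names, the toy rows, and the cost numbers are all made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest impurity reduction from one threshold split on a numeric feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Every distinct value except the largest is a candidate threshold;
    # the largest would leave the right-hand side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(rows, labels, costs):
    """Rank features by stump gain per unit of (hypothetical) acquisition cost."""
    scores = {}
    for name, cost in costs.items():
        gain = best_split_gain([r[name] for r in rows], labels)
        scores[name] = gain / cost
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration only.
rows = [
    {"age": 25, "visits": 1, "noise": 7},
    {"age": 30, "visits": 2, "noise": 3},
    {"age": 45, "visits": 8, "noise": 6},
    {"age": 50, "visits": 9, "noise": 2},
]
labels = [0, 0, 1, 1]
costs = {"age": 1.0, "visits": 5.0, "noise": 1.0}  # hypothetical relative costs

ranking = rank_features(rows, labels, costs)
print(ranking)
```

In this toy example "visits" separates the classes as cleanly as "age" does, but its assumed cost pushes it to the bottom of the ranking, so the munging effort would go to the cheaper features first.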

On some levels it is anti-diversity, but given real-world constraints it has yielded the best results. Any thoughts or links on this topic would be appreciated.


