
I often scope Google searches for product reviews or food/recipe/diet/fitness knowledge to Reddit. It's not the only research I do, but the data tends to be pretty high quality.

I would not be surprised if Reddit is an ML/AI dataset gold mine.



Reddit's public-facing data is very useful for ML/AI.

On the data science end, you can do a lot to gauge important topics and user behavior: https://minimaxir.com/2018/09/modeling-link-aggregators/

On the silly AI end, I made a subreddit consisting only of text-generating RNNs: https://www.reddit.com/r/SubredditNN/

Reddit's internal data is even more robust.


>Reddit's public-facing data is very useful for ML/AI.

Assuming that the users generating the data are humans and not bots...


I take the same approach. Compared to something like Amazon reviews, the information tends to be more reliable.

I'm sure there are plenty of paid shills, but my sense is that most people who post on Reddit aren't trying to sell you something.



