Declaring that somebody is falsely claiming to use ML without looking at their product is a bit irresponsible ;)
The positivity, subjectivity, and politeness calculations are all outputs from neural networks, and the overall likelihood is calculated using a decision forest so that we can explain the results to people. There are plenty of emails with a high likelihood of getting a response, even though each calculation may score poorly.
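For anyone curious how scores like these can combine non-linearly, here is a minimal sketch of the general idea: a few small decision trees each vote on a response likelihood, the votes are averaged, and the path each tree took doubles as the explanation. This is not the actual product code. The tree functions, thresholds, probabilities, and the `response_likelihood` helper are all made up for illustration, and the three input scores stand in for the neural-network outputs.

```python
from statistics import mean

# Hypothetical scores in [0, 1], standing in for the neural-network outputs.
# Every threshold and probability below is invented for illustration.

def tree_tone(s):
    if s["positivity"] > 0.4:
        if s["politeness"] > 0.4:
            return 0.8, "moderately positive and polite"
        return 0.5, "positive but curt"
    return 0.3, "flat or negative tone"

def tree_opinion(s):
    if s["subjectivity"] > 0.3:
        return 0.7, "contains some opinion"
    return 0.4, "reads as purely factual"

def tree_polite(s):
    if s["politeness"] > 0.9:
        return 0.4, "so polite it buries the ask"
    if s["politeness"] > 0.4:
        return 0.7, "reasonably polite"
    return 0.3, "impolite"

FOREST = [tree_tone, tree_opinion, tree_polite]

def response_likelihood(scores):
    """Average the per-tree votes and keep each tree's reason."""
    votes = [tree(scores) for tree in FOREST]
    likelihood = mean(p for p, _ in votes)
    reasons = [reason for _, reason in votes]
    return likelihood, reasons

# No single score is high here, yet every tree votes favorably.
middling = {"positivity": 0.5, "subjectivity": 0.35, "politeness": 0.5}
# One score is near the maximum, yet the overall likelihood is lower.
extreme = {"positivity": 0.3, "subjectivity": 0.1, "politeness": 0.95}

for name, scores in [("middling", middling), ("extreme", extreme)]:
    likelihood, reasons = response_likelihood(scores)
    print(name, round(likelihood, 2), reasons)
```

The reasons list is the part a user can actually read, which is the appeal of a forest over a single opaque score.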
> They may have arrived at the guidelines using ML, but it's possible that their guidelines wouldn't be right for the types of emails you are sending out.
This is a great point, and it's something that users ought to consider with nearly every application of machine learning that ends with a definite recommendation to the user. Machine learning can be used to solve many different types of problems. When it comes to problems related to human interaction, its insights will tend to function more like the rules for running an effective business-focused popularity contest than the rules for crafting meaningful emails to every possible audience. That said, if you happen to be sending a business email and want nothing more than to improve the likelihood of a response, this seems like a great tool for the job.
Sort of true, in the sense that the product isn't separately trained for sales emails vs personal emails vs internal business emails, etc.
But the calculations we chose don't provide a lot of constraints, and the variances were not as high as you'd likely expect. So I'd be comfortable saying that the recommendations generalize well to a vast majority of situations.
While the tool is definitely useful, I agree with your comment. Too often, tech that is quite simple is misrepresented as AI because it makes for better headlines or, in the case of startups, a larger investment round.
As a newcomer to machine learning, how does one build a training set of "tens of millions of emails"? This is one of the things I have struggled with. It's probably much harder to build a training set than to work with the libraries and the algorithms themselves.
Are there companies that build training sets for you, or something else?
We have the Enron data set loaded into a VM, ready to use if you use Python. It's on our blog somewhere. We also used the Jeb Bush data set and the Sony data set from WikiLeaks. If you'd like, we can help you with setting them up. You can email me at moah@boomerangapp.com
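If you would rather poke at a corpus like this yourself, here is a rough sketch using only Python's standard library. It assumes the emails sit on disk as one raw RFC 822 message per file under some root directory (the publicly distributed Enron corpus is laid out this way); the `load_records` and `to_record` helpers and the `maildir` path are hypothetical names chosen for this example, not anything from the VM mentioned above.

```python
from email import message_from_bytes
from email.policy import default
from itertools import islice
from pathlib import Path

def to_record(msg):
    """Flatten a parsed message into a plain dict of the fields we care about."""
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        # get_content() can raise on unknown charsets; real-world corpora
        # may need a try/except around this line.
        "body": body.get_content() if body is not None else "",
    }

def load_records(root, limit=None):
    """Yield one record per message file found anywhere under root."""
    files = (p for p in sorted(Path(root).rglob("*")) if p.is_file())
    for path in islice(files, limit):
        msg = message_from_bytes(path.read_bytes(), policy=default)
        yield to_record(msg)

# Example usage, assuming the corpus was extracted to ./maildir:
#     for record in load_records("maildir", limit=5):
#         print(record["subject"])
```

Turning records like these into a labeled training set (for example, whether a message got a reply) is a separate step and depends on what the corpus lets you infer.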
Yes, that is one of the biggest challenges for any machine learning system to be effective. And the more specific the data set, the harder it is to collect (or get access to) and probably the more valuable.
Someone in this thread cites the Enron emails. Odds are that if you're building a system to analyze energy trading or (maybe) financial trading in general in American English, that data set will work. The problem is that this is a small segment.
The main way to collect this data is to effectively do a Trojan Horse by building something else in a complementary area that is useful that lets you start collecting the data. Once you have it, machine learning is useful. This is what Boomerang has done for email and what Google has done with Google Voice.
They may have arrived at the guidelines using ML, but it's possible that their guidelines wouldn't be right for the types of emails you are sending out.