Take a two-pronged approach: let an algorithm do the first pass, then have humans clean up whatever it cannot resolve.
1) Run an algorithmic prescreen using hierarchical clustering, and attach a confidence rating to each entity. Accept the high-confidence assignments automatically, and push any element whose confidence is sufficiently low to us. This cuts down the number of items that need human assessment.
2) If you can produce a ruleset for a human to follow when making categorization decisions, you can do this with MobileWorks for much less than the cost of MTurk, probably about a penny each. The per-item cost is lower because workers are incentivized properly. You can contact me directly (anand@mobileworks).
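To make the prescreen in step 1 concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration, not something prescribed above: the entities are taken to be short strings, similarity comes from the standard library's difflib, the clustering uses average linkage, and the confidence rating is a simple margin (fit to the item's own cluster minus fit to the best competing cluster). The function names and both thresholds are hypothetical and would need tuning on your data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a domain-specific measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def average_link(c1, c2, items):
    """Mean pairwise similarity between two clusters (lists of item indices)."""
    total = sum(similarity(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(items, merge_threshold=0.6):
    """Bottom-up hierarchical clustering with average linkage.

    Repeatedly merges the two most similar clusters and stops once no
    pair is at least merge_threshold similar (assumed cut-off).
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        (a, b), best = max(
            (((a, b), average_link(clusters[a], clusters[b], items))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < merge_threshold:
            break
        clusters[a] += clusters[b]  # a < b, so deleting b leaves a in place
        del clusters[b]
    return clusters


def confidence(idx, own, clusters, items):
    """Margin between fit to the item's own cluster and the best other one.

    Singletons have no peers, so they score at most 0 and go to review.
    """
    peers = [j for j in own if j != idx]
    own_fit = average_link([idx], peers, items) if peers else 0.0
    others = [average_link([idx], c, items) for c in clusters if c is not own]
    return own_fit - max(others, default=0.0)


def prescreen(items, merge_threshold=0.6, min_confidence=0.3):
    """Split items into auto-accepted and human-review lists.

    Each entry is (item, cluster_label, confidence_score).
    """
    clusters = cluster(items, merge_threshold)
    auto, review = [], []
    for label, members in enumerate(clusters):
        for idx in members:
            score = confidence(idx, members, clusters, items)
            target = auto if score >= min_confidence else review
            target.append((items[idx], label, score))
    return auto, review
```

Calling prescreen(names) returns two lists: the first can be accepted as-is, and the second is what you would send out for human judgment under your ruleset. This naive version recomputes similarities on every merge, so it only suits small batches; for larger sets a library implementation such as scipy.cluster.hierarchy would be the more practical choice.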