A thorny problem in my case is that the same item is named three different ways across the three supermarkets, which makes a proper comparison very hard and annoying.
I have built a similar system for myself, but since it's small scale I just have "groups" of similar items that I manually populate.
I have the additional problem that I want to compare products across France and Belgium (Dutch-speaking side), so there is no hope at all of grouping products automatically. My manual system lets me put together, say, the 250g and 500g packagings of the same butter, or of two of the butters that I like to buy, so I can always see easily which one I should get (it's often the 250g that's cheaper by weight these days).
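A minimal sketch of what such a manually populated group could look like, assuming each entry records a store, a label, a weight in grams and a price. The `Offer` class, the store names and the prices are all made up for illustration; they are not taken from anyone's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One packaging of a product at one store (hypothetical structure)."""
    store: str
    name: str
    grams: float
    price: float  # in euros

    @property
    def price_per_kg(self) -> float:
        # Normalise to price per kilogram so different pack sizes compare fairly.
        return self.price / self.grams * 1000


def cheapest_by_weight(group):
    """Return the offer with the lowest price per kilogram in a manual group."""
    return min(group, key=lambda offer: offer.price_per_kg)


# A hand-maintained group: same butter, two pack sizes, two countries.
# Names and prices are invented examples.
butter_group = [
    Offer("store_fr", "Beurre doux 250g", 250, 2.40),
    Offer("store_be", "Roomboter 500g", 500, 5.10),
]

best = cheapest_by_weight(butter_group)
print(best.store, best.name, round(best.price_per_kg, 2))
```

Because the grouping is done by hand, nothing here depends on the product names matching across languages; only the weight and price are compared.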
Also the 42000 or so different packagings for Head and Shoulders shampoo. 250ml, 270ml, 285ml, 480ml, 500ml (285ml is usually cheapest). I'm pretty sure they do it on purpose so each store doesn't have to match price with the others because it's a "different product".
Absolutely! It’s made it difficult to implement some of the cross-retailer comparison features I would like to add. For my charts I’ve just manually selected some products, but I’ve also been trying to get a “good enough but not perfect” string comparison algorithm working.
Would maintaining a map of products, product_x[supermarket], with 2-3 values each work? I don't suspect that supermarkets would be very keen to change the names (but they might play other dirty games).
I am thinking of doing the same thing for Linux packages in Debian and Fedora.
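For what it's worth, the map idea could be sketched like this, assuming a hand-maintained dictionary from a canonical product id to the name each supermarket uses. The ids, store keys and product names below are hypothetical placeholders.

```python
# Canonical product id -> {supermarket: name used by that supermarket}.
# All ids, stores and names here are invented examples.
PRODUCT_NAMES = {
    "butter_250g": {
        "store_a": "Unsalted Butter 250g",
        "store_b": "Butter Unsalted 250 g",
        "store_c": "BUTTER UNSLTD 250G",
    },
}


def build_reverse_index(product_names):
    """Map (store, store-specific name) back to the canonical product id."""
    index = {}
    for canonical, per_store in product_names.items():
        for store, name in per_store.items():
            index[(store, name)] = canonical
    return index


def canonical_id(index, store, name):
    """Look up the canonical id for a scraped name, or None if unmapped."""
    return index.get((store, name))


index = build_reverse_index(PRODUCT_NAMES)
print(canonical_id(index, "store_b", "Butter Unsalted 250 g"))
```

A lookup that returns None is the useful signal here: it means a store renamed the product or added a new one, and the map needs a manual update.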
My approach so far has been to first extract the brand names (which are also not written the same way for some fcking reason!), normalise them in the strings, and then compare what remains.
If two names have a high similarity (e.g. >95%) then they could be merged automatically, and anything between 75% and 95% can be reviewed manually.
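A rough sketch of that two-step approach, using Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever string comparison is actually in use. The alias table and the function names are assumptions for illustration; the 95% and 75% thresholds are the ones mentioned above.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: spelling variant -> canonical brand name.
BRAND_ALIASES = {
    "head & shoulders": "head and shoulders",
    "head and shoulders": "head and shoulders",
    "h&s": "head and shoulders",
}


def split_brand(name):
    """Return (canonical brand or None, rest of the name), lowercased."""
    lowered = " ".join(name.lower().split())  # collapse whitespace
    # Try longer aliases first so the most specific spelling wins.
    for alias in sorted(BRAND_ALIASES, key=len, reverse=True):
        if lowered.startswith(alias):
            return BRAND_ALIASES[alias], lowered[len(alias):].strip()
    return None, lowered


def similarity(a, b):
    """Similarity in [0, 1] between two strings (difflib's ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def classify(name_a, name_b, auto=0.95, review=0.75):
    """Return 'merge', 'review' or 'different' for a pair of product names."""
    brand_a, rest_a = split_brand(name_a)
    brand_b, rest_b = split_brand(name_b)
    if brand_a != brand_b:
        return "different"
    score = similarity(rest_a, rest_b)
    if score > auto:
        return "merge"
    if score >= review:
        return "review"
    return "different"


print(classify("Head & Shoulders Classic Clean 250ml", "H&S classic clean 250ml"))
```

One caveat with any character-level similarity: names that differ only in pack size (250ml vs 500ml) can still score quite high, so the size probably deserves the same separate extraction as the brand.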
Did you have a similar problem?