<Your Siri requests —"Show me how to get to PF Chang's," or "What year was Steve Jobs born?" go back to Apple — but it uses a random identifier to mask your identity. So a Siri search for the closest Chipotle restaurant will only tell Apple that a user requested the data, but not associate it with me.>
I find that comforting. In fact, if I were a large organization that processed a ton of user data, I'd want to store that data anonymously too, given the sheer risk of holding that personal data.
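To make the "random identifier" idea concrete, here is a minimal sketch in Python. It is not Apple's implementation; the `Device` class, its fields, and the request shape are all hypothetical and exist only to illustrate the pattern of tagging requests with a random value instead of an account.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical client that tags requests with a random identifier."""

    account_email: str  # stays on the device, never placed in a request
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def build_request(self, query: str) -> dict:
        # Only the random identifier and the query leave the device.
        return {"id": self.request_id, "query": query}

    def rotate_id(self) -> None:
        # A fresh identifier breaks the link to earlier requests.
        self.request_id = uuid.uuid4().hex


device = Device(account_email="me@example.com")
request = device.build_request("closest Chipotle")
```

The server in this sketch sees that some identifier asked for the closest Chipotle, but nothing in the request names the account. Whether the identifier is ever rotated, and how often, is a design choice this sketch leaves open.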
And then you have users who always go to the same PF Changs but use navigation in case there is traffic and they should take an alternate route. Your competitor's app will learn this and adapt, and yours won't. The average customer will have no idea you aren't storing their data and likely doesn't even care unless they happen to be reading an article about privacy. While I personally wish everyone would take your approach, it also means tying one arm behind your back.
Edit: I once worked with an ex-Googler who told me to always store all the data you have, because you never know what you can use it for, you can't get it back if you change your mind, and storage is cheap. Hard to argue with this if it's "just" about competitive advantage and monetization.
> always store all the data you have [...] storage is cheap.
Here's the flip side: You can't lose data to a breach that you don't store. You can't have a rogue employee crow on social media about how they have access to data you don't store. You can't be liable for GDPR violations about PII you don't store.
Data is certainly an asset, but it's also a huge liability - and laws are starting to catch up in order to enforce how big of a liability it really is.
> Here's the flip side: You can't lose data to a breach that you don't store.
That means NO storage, not even "anonymous" storage (which Apple clearly does).
> You can't have a rogue employee crow on social media about how they have access to data you don't store.
That also requires NO storage, and they clearly do store data.
> You can't be liable for GDPR violations about PII you don't store.
If you store it "anonymous" you can, since the only requirement is that it be personal data, and there is zero chance it can't be traced back to the person. Anyone working with those 'unique' identifiers can tell you they most likely aren't that anonymous and that the data can be used to trace a single person.
There's a difference between storing only the data required for conducting your business, and storing all the data you possibly can for some imaginary future use. I'm suggesting the former is a better practice.
> If you store it "anonymous" you can
Anonymity via random-but-unique IDs is a tenuous protection at best when you store everything you possibly can. Take, for example, the fact that a gender, a zip code, and a date of birth are enough to uniquely identify around 85% of people [0]. None of those are traditionally considered to be PII, and it's pretty likely that these are the kinds of things stored "anonymously" by the "store it all for later" kinds of companies.
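A rough way to see this on your own data is to count how many rows are unique on those three fields alone. The sketch below uses made-up records and a hypothetical helper, `unique_fraction`; it illustrates the idea and is not the method behind the figure cited above.

```python
from collections import Counter

# Made-up records: no names, only a random ID plus three "harmless" fields.
records = [
    {"id": "a1", "gender": "F", "zip": "94110", "dob": "1985-03-02"},
    {"id": "b2", "gender": "M", "zip": "94110", "dob": "1985-03-02"},
    {"id": "c3", "gender": "F", "zip": "94110", "dob": "1990-07-19"},
    {"id": "d4", "gender": "M", "zip": "10001", "dob": "1978-11-30"},
    {"id": "e5", "gender": "M", "zip": "10001", "dob": "1978-11-30"},
]

QUASI_IDENTIFIERS = ("gender", "zip", "dob")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    if not rows:
        return 0.0
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


fraction = unique_fraction(records)
```

In this toy data set, three of the five rows are one of a kind on (gender, zip, dob), so anyone holding an outside list with the same three fields could re-attach a name to them despite the random IDs.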
> And then you have users who always go to the same PF Changs but use navigation in case there is traffic and they should use an alternate route. Your competitor app will learn this and adapt and yours won't.
Your conclusion doesn't follow from your premise.
Once a user navigates to a specific PF Chang's from a specific location, the navigation app can suggest an alternate route that takes into account traffic, regardless of whether that user has a history or not.
In your scenario, the history of the user data would not affect the ability of the app to suggest routes that are less trafficked.
His point was to store that data because you might want it in the future, not that it's necessarily useful now, in this situation.
(E.g., two years from now, when Apple gets into the restaurant business, they might want to analyze which users (user IDs) search for which types/distances of restaurants.)
Storing lots of data indefinitely is not cheap: there is a large fixed cost to develop and maintain the storage system, even though incremental storage is often cheap.
For a company like Google that may have a reasonable need to store a lot of stuff (multiple versions of the web corpus, Gmail, Drive), it may be cheap to also store search queries forever and who knows what else. For a company without an intrinsic need to store large amounts of data for long periods, it's not cheap to add.
If you collect information you don't plan to use and don't know how you will use, then when you finally do want it, you will likely find you didn't collect it in a suitable form, so you may not be able to use it anyway. In the meantime it's a privacy liability with no value.