The name lens comes from the property that they let you "focus" on parts of a larger data structure. The prism nomenclature is more tenuous - prisms evoke the imagery of a beam of light splitting into its constituent parts, and with a prism you can see a particular facet of the whole (sum types).
Yes, I used to throw all my random experiments on GitHub. Now I host my own code repos, block bots as much as possible, and keep most repositories private. Those repos that I do open source (because I want to share with people I know), I release under the AGPL.
In a nutshell, first-class effects and a built-in set of patterns for composing them get rid of boilerplate code. Combine that with type safety and you can churn out relatively bug-free code very fast.
I always maintain that this is just familiarity; Haskell is in truth quite a simple language. It's just that the way it works isn't similar to the languages most people have started with.
I believe there's a strange boundary around the idea of simple vs. easy (to quote Rich Hickey), and I don't know what to call it (or whether somebody has named it before).
Functional and logic languages are indeed very simple: a small core, very general laws (logic, recursion, some types). But grokking this requires unplugging from a certain kind of reality.
Most people live in the land of tools, syntax, and features. Those languages paradoxically look simpler than SML/Haskell, so people are seduced by them, yet they're more complex at the same time (class systems are often large and full of exceptions); but that also makes it feel like they're learning something advanced (and familiar, unlike single-letter Greek variables and categ-oids :).
People intuitively expect things to happen imperatively (and eagerly). Imperativeness is deeply ingrained in our daily experience, due to how we interact with the world. While gaining familiarity helps, I’m not convinced that having imperative code as the non-default case that needs to be marked specially in the code and necessitates higher-order types is good ergonomics for a general-purpose programming language.
> People intuitively expect things to happen imperatively (and eagerly).
Eagerly? Yes. Imperatively? Not as much as SW devs tend to think.
When the teacher tells you to sort the papers alphabetically, he's communicating functionally, not imperatively.
When the teacher tells you to separate the list of papers by section, he's communicating functionally, not imperatively.
When he tells you to sum up the scores on all the exams, and partition by thresholds (90% and above is an A, 80% and above is a B, etc.), he's communicating functionally, not imperatively.
No one expects to be told to do it in a "for loop" style:
"Take a paper, add up the scores, and if it is more than 90%, put it in this pile. If it is between 80-90%, put it in this pile, ... Then go and do the same to the next paper."
Nope. The fact that he's telling you a high-level command is irrelevant. (If you didn't know what “sort the papers” means, he'd have to tell you in more detail; it's just the difference between calling your built-in sort routine or coding it.)
Anyway: He's telling you to do something, and you do it. It doesn't get more imperative than that.
You're talking about what vs. how, but both imperative and pure-functional are about the how, not the what.
When you're explaining to someone how to sort physical objects, they will think in terms of "okay, I'll do x [a physical mutable state change], and then I'll have achieved physical state y, and then I'll do z", etc.
> Understanding the map signature in Haskell is more difficult than any C construct.
This is obviously false. The map type signature is significantly easier to understand than pointers, referencing and dereferencing.
I am an educator in computer science - the former takes about 30-60 seconds to grok (even in Haskell, though it translates to most languages, and even the fully generalised fmap), but it is a rare student that fully understands the latter within a full term of teaching.
Are the students who failed the pointer class the same ones in the fmap class?
I didn’t say “using map” I said understanding the type signature. For example, after introducing map can you write its type signature? That’s abstract reasoning.
Pointers are a problem in Haskell too. They exist in any random access memory system.
Whether pointers exist is irrelevant. What matters is if they're exposed to the programmer. And even then it mostly only matters if they're mutable or if you have to free them manually.
Sure, IORef is a thing, but it's hardly comparable to the prevalence of pointers in C. I use pointers constantly. I don't think I've ever used an IORef.
If you have an array and an index, you have all the complexity of pointers. The only difference is that Haskell will bounds check every array access, which is also a debug option for pointer deref.
Hard to believe that "learners ... get confused over mutability" more than over functional programming, when millions of middle-schoolers grokked the idea of "mutability" in the form of variables in BASIC, while I (and, at a guess, at least thousands of other experienced programmers) have no fucking idea about pretty much all the stuff in most of the tens or hundreds of articles and discussions like this that we've seen over the years. Just plain stating that "mutability is more difficult" without a shred of evidence ain't gonna fly.
That’s an unfair comparison because these are two unrelated concepts. In many languages, pointers are abstracted away anyway. Something more analogous would be map vs a range loop.
And I'd say the average React or Java developer these days understands both pretty well. It's the default way to render a list of things in React. Java streams are also adopted quite well in my experience.
I wouldn't say one is more difficult than the other.
IMO `map` is a really bad example for the point that OP is trying to make, since it's almost everywhere these days.
`flatMap` might be a better example, but people call `.then` on Promises all the time.
I think it might just be familiarity at this point. Generally, programming has sort of become more `small f` functional. I'd call purely functional languages like Haskell Capital F Functional, which are still quite obscure.
I suppose an absolute beginner would need someone to explain that Haskell type signatures can be read by slicing at any of the top level arrows, so that becomes either:
> Given a function from `a` to `b`, return a function from a `list of as` to a `list of bs`.
or:
> Given a function from `a` to `b` and a `list of as`, return a `list of bs`.
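For reference, the signature being sliced is the standard list map:

map :: (a -> b) -> [a] -> [b]
-- reading 1, slicing after the first arrow:  (a -> b) -> ([a] -> [b])
-- reading 2, taking both arguments at once:  a function and a list of as, returning a list of bs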
I find the first to be the more intuitive one: it turns a normal function into a function that acts on lists.
Anecdotally, I've actually found `map` to be one of the most intuitive concepts in all of programming. It was only weird until I'd played around with it for about 10 minutes, and since then I've yet to be surprised by its behavior in any circumstance. (Although I suppose I haven't tried using it over tricky stuff like `Set`.)
`fmap` is admittedly a bit worse...
fmap :: Functor f => (a -> b) -> f a -> f b
But having learned about `map` above, the two look awfully similar. Sure enough, the same two definitions above still work fine if you replace `list` with this new weird `Functor` thing. Then you look up `Functor` and learn that it's just "a thing that you can map over", and the magic is mostly gone. Then you go to actually use the thing and find that in Haskell pretty much everything is a `Functor` that you can `fmap` over, and it starts feeling magical again.
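A quick sketch of that, using only standard Functor instances (the comments show what GHCi prints):

fmap (+ 1) [1, 2, 3]                        -- [2,3,4]     (lists)
fmap (+ 1) (Just 41)                        -- Just 42     (Maybe)
fmap (+ 1) Nothing                          -- Nothing
fmap show (Right 42 :: Either String Int)   -- Right "42"  (Either e maps over the Right side)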
You and I have a math part of our brain that appreciates the elegance of the algebraic structure.
I'm saying that the thing you did, where you start representing concepts by letters which can be populated by concrete objects, is not a skill most people have.
Maybe at its core, but Haskell in the wild is monstrously complex because of all the language extensions. Many different people use different sets of extensions so you have to learn them to understand what’s going on!
Not really: the vast majority of extensions just relax unnecessary restrictions. And these days it's easy to just enable GHC2021 or GHC2024 and be happy.
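(For example, assuming a reasonably recent GHC, one way is a single pragma at the top of a module, or default-language in the .cabal file; the module name here is just a placeholder:)

{-# LANGUAGE GHC2021 #-}   -- or GHC2024 with a new enough compiler
module Example where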
Accessibility is not an issue. It takes only a little bit of effort to get productive with a Haskell codebase. I think it's more of a mental block because the language is different from what one might be used to. What Haskell needs, and doesn't have, is a compelling reason for people to make that small effort (i.e. the killer usecase).
"Relatively bug free code very fast" sounds like a killer use case to me.
So why hasn't it happened? Some possibilities:
1. People are just ignorant/unenlightened.
2. Haskell is too hard to use for most people. I think that different programmers think in different ways, and therefore find different languages to be "natural". To those whom Haskell fits, it really fits, and they have a hard time understanding why it isn't that way for everyone, so they wind up at 1. But for those who it doesn't fit, it's this brick wall that never makes sense. (Yes, this is about the same as 1, just seen from the other side. It says the problem is the language, not the people - the language really doesn't fit most people very well, and we can change languages easier than we can change people.)
3. Haskell isn't a good fit for many kinds of programming. The kind of programs where it fits, it's like a superpower. The kinds where it doesn't, though, it's like picking your nose with boxing gloves on. (Shout out to Michael Pavlinch, from whom I stole that phrase.)
What kinds of programs fit? "If you can think of your program like a pipe" is the best explanation I've seen - if data flows in, gets transformed, flows out. What kind of program doesn't fit? One with lots of persistent mutable state. Especially, one where the persistent mutable state is due to the problem, not just to the implementation.
The reasons are going to vary depending on who you ask. I personally don't agree with any of your reasons. In my opinion, as a long time user of Haskell, the practical reasons are the following -
1. Tooling has historically been a mess, though it's rapidly getting better.
2. Error messages are opaque. They make sense to someone familiar with Haskell, but others cannot make the leap from an error message to the fix easily.
3. It's a jack of all trades, master of none. The resulting binaries are not small. Performance can be very good but can be unpredictable. It doesn't compile nicely to the web. Doesn't embed well. There is basically no compelling reason to get into it.
4. The ecosystem is aging. You can find a library for almost any obscure usecase, but it would be many years old, and possibly require tweaking before it even compiles.
Off the top of my head: memory safety challenges for junior Haskellers (laziness footguns), the State monad being fundamentally flawed (there is no way to get at and log your application state just before a crash), bloated tooling, and GHC frequently breaking existing code. Laziness and monadic code make debugging painfully difficult.
I acknowledge that those things can be challenging, however I'd like to respond to some of the specific issues:
- Space leaks due to laziness are a solved problem. I explain the technique to solve it at: https://h2.jaguarpaw.co.uk/posts/make-invalid-laziness-unrep... This technique has not completely percolated throughout the community, but I am confident that it does actually resolve the "laziness causes space leaks" issue. (There's a rough sketch of the idea after this list.)
- Flawed State monad: well, you point out the analysis of its flaws from the effectful documentation. That's correct. The solution is: just use effectful (or another similar effect system; I recommend my own, Bluefin).
- GHC breakage: I've been keeping an inventory of breakage caused by new GHC versions, since GHC 9.8: https://github.com/tomjaguarpaw/tilapia/ There has been very little! The Haskell Foundation Stability Working Group has had a massive effect in removing breakage from the ecosystem.
- Laziness and monadic code make debugging painfully difficult: I mean, sort of, but if you're using monadic code in the style of a decent effect system like effectful or Bluefin this is a non-problem. It's hardly different from programming in, say, Python from the point of view of introducing debugging printfs or logging statements.
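Coming back to the laziness point above: as a rough sketch (not the post's own example), the gist is to make long-lived state strict so unevaluated thunks can't pile up inside it:

-- lazy fields can silently accumulate unevaluated thunks in long-lived state
data StatsLazy = StatsLazy Int Int

-- strict fields make that invalid (thunked) state unrepresentable:
-- both counters are forced whenever a new Stats value is built
data Stats = Stats !Int !Int

step :: Stats -> Int -> Stats
step (Stats total count) x = Stats (total + x) (count + 1)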
Thanks, I've followed along with a lot of your posting on the Haskell discourse. One thing regarding this matter:
>well, you point out the analysis of its flaws from the effectful documentation. That's correct.
Thinking more deeply about this: there is essentially no way to fix this issue with StateT, because the type choice, the monad, and the composability requirement all conspire together and can't be undone. Does that signal something deeper that is wrong with the flexibility of Haskell, that we can progressively paint ourselves into a corner like this? Could it not happen again, but with another late-breaking requirement, for effectful or Bluefin?
> I've followed along with a lot of your posting on the Haskell discourse
Ah, that's great to know, thanks. It's rarely clear to me whether people read or are interested in what I say!
Well, yes: in principle, even a design that is perfect according to some spec could be completely wrong if the spec needs to change, and impossible to tweak to match the new spec. This is true of any language or any system. This raises a few important questions:
1. How easy does a language make it to "unpaint" yourself from a corner?
In Haskell it's easier than in any other language I've experienced, due to its legendary refactoring experience. For example, if you "incorrectly" used the State monad and got stuck, you can wrap it up in an abstract type, change all the use sites, check that it still compiles and passes the tests, then change the definition to use the new "uncornered" implementation, again check it compiles and passes the tests, then unwrap the abstract type (if you like; this stage is probably less important), then add the new feature supported by the new implementation. (A hypothetical sketch of the wrapping step follows after point 3.)
2. How likely is it to paint yourself into a corner in the first place?
In Haskell, again, less likely than in any other language I've experienced, because the constructs are so general. There is far more opportunity to tweak a design when you have general constructs to work with. (That said, I've met many Haskell behemoths that couldn't be easily tweaked, particularly contorted type class hierarchies. I recommend not designing those.)
3. Why won't effectful or Bluefin lead to "corners"?
Because they're just Haskell's IO, wrapped up in a type system that gives fine-grained control over effect tracking. Anything you can do in Bluefin and effectful you can do in IO, and vice versa. So to really paint yourself into a corner with IO-based effect systems it would have to be something that you can't do in IO either, and at that point we're talking about something that can't be done in Haskell at all. So there's no real downside to using IO-based effect systems in that regard.
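Coming back to point 1, here's a hypothetical sketch of the "wrap it in an abstract type" step (the Counter name and its operations are invented for illustration):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Counter (Counter, tick) where   -- constructor not exported, so the type stays abstract

import Control.Monad.State (State, modify)

-- step 1: hide the "cornered" State-based implementation behind an abstract type
newtype Counter a = Counter (State Int a)
  deriving (Functor, Applicative, Monad)

tick :: Counter ()
tick = Counter (modify (+ 1))

-- step 2: move all use sites onto Counter/tick, check everything still compiles and
-- passes the tests; step 3: change only this module (say, to StateT Int IO) and the
-- use sites stay untouched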
Basically StateT uses Either to model either the application state or an error. So if your application throws, you lose the current state forever. They sort of painted themselves into a corner with this choice of type, there's no real way out now.
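(A sketch of why, using the standard transformers definition of StateT with Either as the base monad: unwrapping the types shows the error branch simply has no slot for the state.)

-- StateT s (Either e) a   unwraps to   s -> Either e (a, s)
-- the happy path returns Right (result, finalState);
-- the error path returns Left err, and the state at the point of failure is gone
type App state err result = state -> Either err (result, state)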
I agree loosely with what haskman above says about creating relatively bug-free applications; the guard rails are so robust. But those same guard rails mean you can paint yourself into a corner that is harder to get out of without imperative state, case in point above.
Most functional languages give you so much more choice in managing persistent mutable state than the typical imperative languages. In the latter everything can be made into persistent mutable state so you have both those that are due to the problem and those that are due to the implementation.
Haskell gives you a wide range of tools, from simulated state like the State monad, to real ones like the ST monad and IORef inside the IO monad. For synchronization between threads you have atomic IORef, MVar, and TVar.
If your problem requires you to have persistent mutable state, Haskell helps you manage it so that you can truly separate the persistent mutable state that's due to the problem from the state that's due to the implementation.
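As a minimal example of the "real" end of that range (these are the standard Data.IORef functions):

import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)   -- genuinely mutable state, but the type says so
  modifyIORef' ref (+ 1)       -- strict update, so no thunk sits in the ref
  readIORef ref >>= print      -- prints 1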
I generally don't like Haskell, but the view of non-Haskellers (myself included) regarding state does remind me a bit of resisting structured programming because gotos are easier to program.
That's a good analogy and I like it! If someone is really used to simply using goto for all kinds of flow control purposes, there could be some resistance if there's a coding style guide that enforces using if/else/while/do and such. It reminds me of similar arguments that recursion is too powerful in Haskell and similar languages and that good style means using map/filter/foldr and the like; or the argument that call-with-current-continuation is too powerful. When a single tool is too powerful, it can be used for a large number of purposes, so it ends up being the code reader's job to figure out which purpose the code writer has intended.
Good question. I haven't revisited it in over 10 years. I think it was just too much "new" stuff to learn at once, making it feel too difficult to do anything in it at the time, especially considering I wasn't learning it full time.
Maybe now that I'm older and wiser (debatable) a revisit is in order, but lately I prefer dynamically typed languages (like Clojure and Elixir) to statically typed ones. I'll probably add it to my TODO list, but the list is long and time is short.
Well if you like Clojure you probably also appreciate how it doesn't just give you mutable variables everywhere but instead gives you different tools for different purposes. You have transients, atoms, agents, volatiles etc for different use cases.
Oracle influenced / bought academia into teaching Java for a generation. See Dijkstra’s criticisms[1] from the time, from when his department was forced to stop teaching Haskell to undergrads for political reasons. Note that Haskell had not been too hard for Dijkstra’s undergrads.
Later, Python took its place, since people realized the Java ecosystem was way too complicated and was turning off would-be CS students. Python directly targeted the academic use case by having similarities to C, Java, and Bash: it was not a better language, it just made existing imperative and object-oriented assignments easier for classroom environments. Believe it or not, a lot of programmers and even academics sort of give up on exploring significantly unfamiliar directions after graduating.
He's asking for something that is really inappropriate. He lost in the CS department, and he wants the Budget Council to decide on what languages should be taught? Like they know anything about it!
He lost a political battle, and he's appealing it to the only place he can, and he's buttering them up to do it, but the people that actually know something about the topic decided against him already.
And, you're quoting only one side of the battle. One vocal and eloquent side, but only one side. Maybe look into why the UT CS department made that change? (And not why Dijkstra says they did.)
He mentions the expensive promotional campaign that was paid towards Java.
> the people that actually know something about the topic decided against him already
Matters of pedagogy often involve value judgements and trade-offs, with knowledge alone being unable to provide definitive answers.
However, Sun/Oracle did know that more money would flow their way if undergraduates were to learn Java. The letter suggests that one or both of them decided to act accordingly.
It's questionable to assert that every knowledgeable faculty member thought that pivoting to Java was the best option. (Dijkstra himself is a counter-example?) From the looks of it, just a single department chair, not the full CS department, had the decision-making authority on this curriculum change.
> inappropriate
Would inappropriateness make his arguments any less true?
Why in an academic context would it be inappropriate to request input from additional stakeholders? If the letter was unconventional, remember that a purpose of the tenure system is to protect unconventionality.
The Budget Council is not a stakeholder in the CS curriculum. The Budget Council does not have the knowledge or expertise to say anything relevant about the matter - and Dijkstra should know that. That's why it's inappropriate.
I mean, look, if you had a letter signed by the majority of the department, complaining about the chair's decision, then the Budget Council might consider reversing the chair, on the authority of the expertise of the majority of the department. But overrule the chair on the basis of disagreement by one professor? No way. You can't run a university that way, because there's always at least one professor who disagrees with a decision.
An academic knows well that grants and other forms of financing are their lifeblood. It’s also how the government chooses its priorities within academia.
Admittedly, I don’t know anything about who was on the budget committee to which Dijkstra wrote this letter. But it is just ordinary for academics to write proposals outlining their priorities in hopes that the grant/budget/financing committee will bite.
I don't think that's it. I know plenty of people who were taught Lisp first thing at university, and as soon as someone handed them an imperative language, they never looked at Lisp again. And Lisp is way easier than Haskell IMO, as IO is just a function and not a philosophical concept.
I wouldn’t assume that your colleagues were less capable as undergrads than the students that Dijkstra encountered at UT Austin.
Imperative languages do offer many advantages over Haskell, in that most coursework and industry jobs use them and that, consequently, their ecosystems are much further developed. These advantages are a consequence of university programs' alignment with the imperative and object-oriented programming paradigms, to Oracle's benefit.
Your colleagues having never looked back at lisp is hardly evidence that Haskell would have been too difficult for them or that Oracle didn’t have a hand in this.
I don't think that holds water. We've had functional programming for longer than Oracle or Java have existed, and for far longer than Oracle has owned Java. Haskell itself has been around for longer than Java or Oracle-owned Java.
Functional programming just seems harder for people to get into. Perhaps it's bad for everyone that people don't make that effort, but it doesn't seem like a conspiracy
My mistake, at the time, Java was being promoted by Sun Microsystems, which only more recently became a part of Oracle.
The promotional campaign that Dijkstra mentions was perhaps orchestrated by Sun Microsystems, though perhaps not since Oracle was indirectly strategically aligned with Java, as the eventual acquisition shows.
Yes, it is more difficult to get into FP. However, asking the question why it became more difficult, when historically the opposite was true, is certainly worthwhile. Surely there was some cause.
Nah man, Oracle… here is a personal story. I was teaching at a Uni, introductory programming course. Dean hits me up and asks if I can teach introduction to web development as then current professor was going on maternity leave. I was like “heck yea, that sounds like fun.”
Before the first class I get an email from one student asking if they must purchase the book for the class, since it cost $275 (this is years ago), and I was taken aback: what kind of book costs $275? Even for a college textbook that was nuts. I told him not to purchase it until we met for the first class. I go to the office and see my copy of the book: it is programming the web with Oracle Forms, from Oracle Press!!!! I talked to the Dean and he was like "yea, that is what we need to teach!" Needless to say, none of the kids bought the book, I did NOT teach Oracle Forms, and I was never given that class again :)
My first CS class was in Scheme (R6, iirc), and the year after they switched to Python. Then came a thousand cries of failure to understand Python metaclasses. They are garbage at the REPL, and you have a distinct set of folks that edit their editors.
Most Python programmers don't really have to understand metaclasses or other advanced concepts like descriptors. The main metaclass they'd use would be to create abstract classes, and these days you can just subclass ABC.
I wish I'd have had either of those. C++ was the fad when I was going through, so freshmen learning how to write linked lists got a fast introduction to (and a hatred of) the STL.
Haskell is superb at handling persistent mutable state. The only problem is that you may get analysis paralysis from choosing among all the options (many of them excellent).
4. History. In those types of discussions, there are always "rational" arguments presented, but this one is missing.
> One with lots of persistent mutable state.
You mean like a database? I don't see a problem here. In fact, there is a group of programs, large enough that it cannot be 3, that Haskell fits nicely: REST/HTTP APIs. This is pretty much your data goes in, data goes out.
No, I mean like a routing switcher for a TV station. You have a set of inputs and a set of outputs, and you have various sources of control, and you have commands to switch outputs to different inputs. And when one source of control makes a change, you have to update all the other sources of control about the change, so that they have a current view of the world. The state of the current connections is the fundamental thing in the program - more even than controlling the hardware is.
Thanks. This does sound like a state machine, though, but the devil is probably in the details. Yes, here Haskell is probably a bad choice, and something where direct memory manipulation is bread and butter should do better. Which is completely fine; Haskell is a high level language.
But in your example, PHP is also a bad choice, and alas, it dwarfs Haskell in popularity. I can't really think of where PHP is a great fit, but Haskell isn't.
Haskell can be taught as recursive programming, learning about accumulator parameters and higher-order functions. A background in logic (which most programmers have to some degree) is more useful than a math background in that regard.
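For instance, the accumulator-parameter style mentioned here looks like this (a tiny made-up example):

sumAcc :: [Int] -> Int
sumAcc = go 0
  where
    go acc []       = acc             -- the accumulator carries the running total
    go acc (x : xs) = go (acc + x) xs -- tail-recursive step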
Okay, it might be definitional, but when I think of 'first class', I think of something baked into the language. So in Haskell's case, in the Prelude, I suppose.
That's kind of the opposite of what functional programmers mean by "first class", or at least orthogonal. "Functions are first class" means that they don't have any special treatment, relative to other entities. They're not really a special "built-in" thing. You can just pass them around and operate on them like any other sort of value.
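A tiny illustration (names invented here): functions passed as arguments, returned as results, and stored in data structures like any other value.

twice :: (a -> a) -> (a -> a)
twice f = f . f                                -- takes a function, returns a new function

handlers :: [Int -> Int]
handlers = [(+ 1), (* 2), twice (subtract 3)]  -- functions stored in an ordinary list

-- e.g. twice (+ 3) 1 == 7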
That's true, but also true is the fact that a large part of the reason for using alternatives is to avoid this kind of data collection. So it's reasonable to expect to lose users with a decision like this.
It is still much less data, and does not allow them to identify you AFAIK. Even if they go with opt-in (not yet decided - it seems to be being debated and they are asking for feedback) it is still far better than proprietary OSes.
There's this deep sense of entitlement coming from software devs and vendors, that's completely unjustified. Comparisons on the amount and type of data collected is missing the point. It doesn't matter whether Manjaro is sending more or less telemetry than MacOS - neither of them should be doing it in the first place.
They have no actual right to that data, no matter how much having it makes the devs' jobs easier. What they should do is ask for it, honestly and convincingly, like asking users for a favor, because it's exactly what it is (and it's not like anyone is considering compensating user for the service).
That's not nearly as useful though. What devs want is to know how their users are interacting with the software, so they can make improvements to it. Opt-in gives a much smaller sample size, and a strong selection bias. I don't know enough to say that it's completely useless, but I wouldn't be surprised to hear that it is.
> Like any at all?
No, don't sidestep the question, actually answer it. What data are they collecting and how is it harmful? The devs feel this information is useful to make their software better. If you think you are harmed by this, please explain how.
If you're collecting data, you need to prove it's not harmful - not the other way around.
- But how is collecting data harmful?
The problem isn't any single data point. It's that historically, seemingly innocent data collection has repeatedly enabled serious harm when contexts change. (And yes, I'm aware of Godwin's Law[1], and/but the historical examples are directly relevant here.)
- Surely one more app collecting data isn't the end of the world?
No, but it's death by a thousand cuts. We're at a point where young tech professionals are already resigned to total surveillance. Each new data collection might seem minor, but they're all contributing to a flood of personal data leaking from our devices. We need to start turning off the taps, not adding new ones.
GGP said avoiding data collection is a reason to use linux. GP asked what data collection. The answer was "any at all". That is not "sidestepping the question". GGP didn't state they think they are harmed by data collection, they only stated they don't want their data to be collected.
Right, so we're back to the OP of this thread--open source software doesn't have access to a useful tool, and you can't explain why you are refusing to let them have this tool. This results in lower quality software, to no one's benefit.
I disagree with your attempt to frame this like it is an issue that needs to be resolved at all costs. Yes, I don't give developers access to my data which would be useful for them. No, I won't explain why I'm refusing this. Yes, it might result in some lower quality software. I am completely fine with that situation and wish it will stay that way.
That's totally fine and they have an opt-out mechanism for people who feel like that. I don't think anyone is behaving badly here. They want to collect data to make their software better; opt-in has significant downsides; and you have an option to turn off the data collection. What are we complaining about?
The problem is that right now I only know about this in the first place because I just happened to open hacker news at this hour of the day. You seem to agree that it is totally fine if I don't want my data collected, but how could I even prevent it if I don't know about it (since it is opt-out only)?
This is a fair point! I think for people who feel so strongly about this, it's perhaps the best compromise that you have to go digging into the settings for it, since opt-in is basically the same as not having it at all. It seems unlikely to me that a project like Manjaro would go out of their way (as Google etc do) to use dark patterns and disrespect your wishes here.
"Opt-Out" is a dark pattern per definition. If everyone does it (and on some platforms many people do), it leads to an impossible eternal whack-a-mole situation where the user is constantly monitoring their system while still being unable to ever be 100% certain that every leak is closed.
This is why some users opt for a system that enforce Opt-In or even Opt-Never by default. The sheer peace of mind is worth a lot.
And it's not even such a strange stance. Consider eg Enterprise or National security. Why shouldn't a regular user have such security by default?
> If you think you are harmed by this, please explain how.
I expect my computer to do what I order it to do and not to do shady things behind my back. Imagine if you were a business owner and your new hire sold your commercial secrets to competitors. Would you like it?
As for improving software, users should contribute voluntarily, not mandatorily; otherwise it looks like a form of non-monetary tax.
If you want to some actual examples of how optimization based on data can be harmful, I suggest reading Seeing Like a State. If more people that made decisions based on data read this book, the world would be a better place.
The TL;DR is that data about a system does not reflect the underlying system perfectly, and thus is a distortion of the real system. Decisions based on this distorted data can be equally distorted, sometimes dangerously so.
For software telemetry, for instance, telemetry only gives you the "what", not the "why".
E.g. feature X is not used.
Possible explanations:
- Not useful to users -> Probably should be removed.
- Not discoverable -> Probably should be kept and made more discoverable.
- Difficult to use -> Probably should be kept and made easier to use.
Most times (I'm looking at you here Mozilla and every commercial software provider ever) people take the shortcut of assuming the first explanation and removing it prematurely.
Features A and B may be equally important, but B may be applicable only in specific circumstances. If you'd compare A and B on the metric of "how often it's used", you may see B being used much, much less than A, but that's not reflective of the feature, but of the job being done.
> That's not nearly as useful though. What devs want is to know how their users are interacting with the software, so they can make improvements to it. Opt-in gives a much smaller sample size, and a strong selection bias. I don't know enough to say that it's completely useless, but I wouldn't be surprised to hear that it is.
So? Crime being profitable doesn't make it legal.
> No, don't sidestep the question, actually answer it. What data are they collecting and how is it harmful? The devs feel this information is useful to make their software better. If you think you are harmed by this, please explain how.
So if I enter your house, will you also enter a discussion of what I stole and whether you really needed it, before you are allowed to kick me out, even though I never had permission to enter your house in the first place?
Let's try another analogy, someone breaks into your computer and copies all of its content, including saved passwords in an unlikely case you save them, and installs a keylogger. It is not harmful by itself, right?
No, you are not breaking into my home and stealing stuff. Nevertheless, an analogy can be made between breaking into my home and stealing stuff, and taking my data without consent. "Analogy is a comparison or correspondence between two things (or two groups of things) because of a third element that they are considered to share." - try again to think what the third element could be in this case - I'm sure you can do it!
It's theirs, not yours. Fundamentally, it's not about harm - it's about you getting stuff you have no (moral, cultural, and in many places legal) right to.
As for harm: there is possibility of it, a lot of software does collect data for it to be used against users' interests, and I have no reason to believe yours isn't one of them.
But that kind of data leads to dev-centric practices like A/B testing that's just being used to confirm their own assumptions and is tailored towards their own goals, not the users'.
Asking the users what they like and why is much more useful.
I frequently wonder what breed of human sincerely disagrees. I sometimes think the realm of software, through detachment (remoteness, distance from the users), encourages a sense of liberty for the id. If this shit was attempted physically, in person, there'd be a lot of missing teeth.
What? It happens all the time. Retail stores count the rate at which people enter their doors to help determine how to staff the store. Traffic engineers count how many vehicles and pedestrians use certain roadways so they know which modes to optimize for. Your ISP gathers aggregate statistics about how much bandwidth is being used across regions to decide where to upgrade their network.
Data collection can be harmful, but it's also extremely useful to know how people are using products and infrastructure. There's a balance, and if you're on the "zero data collection" side, I think you need to justify making the devs' lives harder by explaining what harms will come from the proposed collection.
> I think you need to justify making the devs' lives harder by explaining what harms will come from the proposed collection.
I disagree. I don't need to show actual harm to reasonably object to being spied on. At least Manjaro isn't talking about making this mandatory, but opt-out is still a very poor look that would make me avoid using it as long as there are other options that are more respectful.
Please explain what specifically Manjaro is proposing to do that you classify as being "spied on." Don't handwave this away, actually answer the question.
"espionage: The act or process of learning secret information through clandestine means."
That is, the specific information does not matter; the fact that someone wants to keep it hidden (which is their stated preference), and someone else wants to collect it through clandestine means (which is how we could interpret a sneaky opt-out mechanism) is enough to define it as being spied on.
1. Your hardware specs are secret information? How many times you clicked on i3wm's settings panel is secret information? I mean OK, you might really want to keep the latter to yourself, sure, but calling it secret information is reaching.
2. It very much matters what the specific information is. I too wouldn't want my Linux distro scanning my GMail inbox through their distro-bundled browser, of course. But how many times I started Kitty is something I don't quite enjoy being shared but I also wouldn't be outraged if it was.
Nuance matters; just doing extremist takes does not help anyone.
I think a good example in support of your statement is the superfluous metrics wantonly spewed by, e.g., Firefox. A cursory perusal of about:config will list many, many default settings which are completely unnecessary for normal browser function, e.g. dom-battery, general telemetry, dubious DNS, and dozens (maybe many dozens) of other better examples I've seen but don't immediately remember. The privacy holes here are mostly by design. Clearly more than necessary hardware info.
There are endless examples of data flowing where one wouldn't expect. Doesn't IPv6 wrap the MAC address into the IP? This alone is pretty significant. It goes on and on, but I don't see this as an excuse to go full-nudist in a fit of futility with all data.
And another thing I frequently wonder: who benefits? I honestly don't see things functionally improving in a way that I can't live without as a result of all this telemetry. I don't see that many people clamoring for the kinds of improvements this telemetry is supposed to enable. I know technology does improve, but I just can't remember where things were so bad I needed to mass-email my dossier to the world. Generally, I just made a forum post or bug report.
Of course, that's your right. That's why I vet my software on a per-piece basis. It can be exhausting but I at least know that stuff that I'd be very not okay with being shared, is not in fact shared.
As said in another comment of mine posted just minutes ago -- practice shows that anonymous telemetry is the only viable way of getting some usage data. Almost nobody fills out surveys.
Do most software need those stats? I'd say they don't, but I worked on pieces of software that absolutely needed to know which parts are most used and which are almost not used because the extra features cluttered the UI and confused people, leading to less buys / subs.
I had trouble finding exactly what MDD collects, but my assumption is that it collects data about the hardware in use and what packages are installed, at a minimum.
Okay. So you can't explain how you are harmed by this data collection, and you have an opt-out mechanism you can use to disable it anyway. What are we complaining about?
I'm not saying I can't explain harm, I'm saying that the presence or absence of harm is orthogonal to the issue.
What I'm complaining about is the evasion of having to get informed consent to collect personal data. Opt-out is a way to try to cover your ass while at the same time being able to avoid asking for consent.
The argument for it is always the same: if we make it opt-in, then not enough people will opt in. Which is another way of saying "if people won't give us permission to collect data about them, then we need to stop asking permission."
Well, yeah. If opt-in doesn't lead to useful results, then you may as well not have the feature at all. But they want the feature, because it helps them improve their software. So, "collect data in a way that preserves as much privacy as possible by default, and provide a mechanism to opt-out entirely" is the least-bad option. It gives them the data they want, and it provides an opt-out mechanism for people who don't trust them with the collected data. It seems like the best compromise to me.
It's not really a compromise. It's devs declaring that they deserve access to this data regardless of what users want, and trying to make it less objectionable. It remains the case that this is a back door method of extracting data from users that they don't really want to give.
If users didn't mind giving it, then enough would say "yes" to the opt-in screen that it wouldn't matter. But they don't, so these devs are trying to impose the very thing users don't want as forcefully as they can get away with.
What spying on, dude? Have you ever written telemetry handlers even once in your software?
I've done so, no less than 15 times in the last ~9 years. We always took special care to never include anything personally identifiable; it was a hard requirement and was enforced in code reviews. Because of that we ended up hashing user IDs: we still wanted to do flame graphs and various distribution statistics of API endpoint usage, and user IDs were one of the axes (two others were hour of day and day of week), but we didn't care who the user was.
Seriously, a little less extremism helps. I am a programmer, likely just like you. We are trying to get some data to improve our software. In several of my previous gigs even the CTOs barely cared about the telemetry graphs and aggregation dashboards and only looked at them at the middle of the quarter to make sure we're not spending too much on Grafana so the executives won't bite their heads off. And the CEO / marketing? Forget it, they don't care.
Of course there are some very predatory companies out there, no doubt. But I think we would be very hard-pressed to put the team of an open Linux distribution among them.
> We always took special care to never include anything personally identifiable
Sure, but that's not really the point. First, in every company I've worked at that has dealt with PII, their definition of "PII" excludes quite a lot of data that should count.
But even if all PII is properly excluded and everything is actually anonymized, that still doesn't address the point. The point is all about consent. Consent seems like it should be table stakes, no?
> Consent seems like it should be table stakes, no?
I agreed for most of my career, but not anymore. Truth is, everywhere I worked, the voluntary user surveys had an extremely low engagement rate -- which was frustrating for the dev team, who wanted to make sure their users like the product. Sometimes that means deprecating / removing parts of the software.
I get your idea and I don't generally disagree. It's just that practice has shown that collecting anonymous telemetry is the only really viable way of getting information about what's being used, how much, and whether it performs well (I used telemetry stats to optimize a hot code path on a number of occasions), both in terms of hardware efficiency and in business terms, among other things.
It's one of those things that I solved for myself by trusting or not trusting each piece of software individually. That's why I am currently slowly migrating back to Linux (from macOS); Apple overdid the telemetry to downright complete spying and sometimes censorship so I am no longer okay with them.
> It's just that practice has shown that collecting anonymous telemetry is the only really viable way of getting information about what's being used, how much, and whether it performs well
Again, we come back around to "if users don't want to willingly give us this data, then we're just going to take it." That's what I think is ethically objectionable. Sure, the data is useful -- but if people don't want to give it, that usefulness does not justify taking it anyway.
Opt-out is better than not being able to even do that much, but in my view, it's still unethical. And, practically, it means that I have to treat all software as suspicious and can't really be comfortable with any of it.
I'm used to that with smartphones and Windows, and deal with that by avoiding installing any software unless I absolutely have to. I'm just trying to avoid having to take the same stance with OSS. But perhaps that's a lost cause, and trust in any software at all is not supportable.
I don't, but I can't speak for everybody else. In my case the telemetry was on the backend so the users had no say at all -- though my teams made sure for there to be zero personally identifiable information (plus our API endpoints never got even one piece of information about the customer's devices / desktop browsers; I code-reviewed those PRs and enforced it).
Don't look for boogeymen on HN, they are not on this forum. ;)
I'll again agree that opt-out by default is not the most privacy-friendly approach, but voluntary user surveys had an almost non-existent response rate. So some companies took a more aggressive approach. Those I don't like. But a Linux distro? Dunno, seems like an overreaction in this particular case.
First, yes, data is extraordinarily valuable. No doubt.
While it may be commonly accepted by most, I don't want my personal computer crawling with telemetry. I despise the idea.
The harm is, in my opinion, partly in creep, where just a little more, here and there, leads to a festering, unchecked data brothel. And regarding 'harm' as a necessary parameter for maintaining privacy, dignity, etc; it would cause absolutely no harm to me if I was watched every time I used the bathroom, provided responsible handling of the acquired video. But I don't want this and would object to any effort otherwise. I don't think harm is the only factor.
When it gets to Microsoft-level telemetry, yes, I'd say monsters. This situation? Less so. But how it so easily approaches such levels needs consideration. There's simply a prevailing view with data where "if it exists and we can access it, it are belong to us" and collectively it is monstrous.
I'd rather people become overly (even unreasonably) sensitive to it than keep going with the flow. It's too easy to start with innocent bits, then more and more until real-time surveillance style Windows Recall shittery.
Fair question and I'm not certain. If I guessed, I'd say you're generally right, for now. I think it's important to keep that crap out altogether and maintain a refuge somewhere, where one can doff the coat, sit down and work alone. Yet a time where this is impossible is foreseeable without much imagination.
I hear they tried, many times, and less than 0.1% of users responded.
I personally don't think it's such a monster move to send some anonymous usage data, especially if you present a box with a choice once the program starts for the first time. (Granted that's not what Manjaro is doing here.)
Great work! As a nitpick, I noticed that there is some contrived indirection going on inside `Coreutils.Util`. Instead of the Utility class they could have just used a datatype. The current way could be confusing to newcomers looking to learn from this repo (if that was the goal).
I'll admit it's mostly this way because I thought ExistentialQuantification sounded cool and wanted to give it a try with classes - this could definitely be tidied up.
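(Purely as a hypothetical illustration of the two shapes being discussed; this is not the repo's actual code:)

{-# LANGUAGE ExistentialQuantification #-}

-- the class-plus-existential-wrapper shape
class Utility u where
  utilityName :: u -> String
  runUtility  :: u -> [String] -> IO ()

data SomeUtility = forall u. Utility u => SomeUtility u

-- versus a plain record, which carries the same information directly
data Util = Util
  { utilName :: String
  , utilRun  :: [String] -> IO ()
  }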
It should be noted that the build quality of ThinkPads is much higher than that of IdeaPads. I have both, and the IdeaPad is more or less on par with other cheap consumer laptops.
>It should be noted that the build quality of ThinkPads is much higher than that of IdeaPads.
And still, they've fallen so low in recent years I don't see it being drastically better nowadays. Had a T495 for a while. Worst laptop I've had in a decade.
ThinkPad P14s series with AMD; they make sure it is fully supported on Linux.
I would take the just-arrived Gen 5 AMD because of the new Zen 5 cores. Same performance as with Zen 4, but much lower power consumption.
I know lots of developers (me included) who want something solid and stable and Linux, i.e. definitely not a MacBook.