Your math is too simple. Consider the alternative scenario where you invest your down payment, invest what you save on the mortgage (generally slightly more expensive than rent), and invest what you save on extra repairs and maintenance.
After those 30 years it's entirely possible that as a renter I have:
- been able to take advantage of moving freely for personal or economic reasons
- been able to move to a newer, completely renovated apartment a number of times at no cost other than moving my stuff, instead of either paying for renovations or living in a house with a 20-year-old kitchen
- been able to live without stress about all the responsibilities that come with owning a house, being in debt and tied down
- come out with enough money from the S&P 500 position I started 30 years ago to decide to settle down somewhere and buy a house, or continue as is with a bunch of money saved up
I am not saying that this is what will always happen or that owning a house is strictly worse, but the "rent is money down the drain, mortgage payments are money you keep in the end" angle is way too naive.
You must be a much better investor than I am. On property I have to date never lost a dime, and made pretty good money every time I bought something and held on to it for at least five years. But on other investments the returns have been all over the place, from tripling my money to losing it all and everything in between.
If you don't have a home that you own yet and your only options are mortgage or renting I'd pick the mortgage any time, but I'd always make sure to buy in a market that is active.
It sounds like I might be, in this case. We're talking decades. You are talking about various investments, and even about timing the mortgage to beat the market.
I am talking about putting the money in S&P 500 and similar without trying to time, beat or track anything.
You always bought property in or near city centers, then. My mom's house lost 10k (~5%) in value over 10 years. Not an issue at all, but rural houses do not appreciate as much as even the worst ETFs.
> been able to live without stress about all the responsibilities that come with owning a house
A lot of renters report that being forced to move house regularly is a huge source of stress.
> rent is money down the drain, mortgage payments is money you keep in the end
A mortgage is paying rent on a house-sized chunk of money, which you have to spend on a house. Economically they're similar: on a 300k loan at 4%, for example, the first year's interest is about 12k, i.e. roughly 1,000/month of "rent" on the money. Except the US has a big tax advantage which is only available to mortgagors. (Does it also have the home capital gains tax exemption?)
> I am not saying that this is what will always happen or that owning a house is strictly worse…
In my adult life I have lived in ten rental properties and have purchased two homes (currently living in one of them). In my cases of home ownership, my property values increased enough over the time I lived in those properties that you could theoretically view the monthly outlay of my mortgage and taxes as nearly a wash. However, rent was always a pure expense, and the investment income I would have made on a theoretical "down payment" amount across the terms of those leases would not come close to erasing the rent I paid.
I realize this is anecdotal and specific to my location and frankly luck, but for me it makes having a mortgage make sense financially.
Things must be really different in the US then.
The down payment here is really not an amount that you can make any serious investment with. Even if I saved it up, I could not afford to put it in high-risk/high-reward investments, and the low-risk ones are not even worth the cost of tying up the money.
- I can still move freely for whatever reason I want. I simply sell the house and buy a new one.
- The cost of replacing a kitchen once every 20 years (as in your example), or even once every 10, is insignificant, and the fact that I choose it according to my needs instead of getting whatever I find more than makes up for the cost.
- I have no stress about needing to move (generally, the more flexible rental agreements that allow you to move out whenever you want also come with the risk of getting kicked out or the rent changing significantly).
But again, it all comes down to the difference between rent and mortgage, and here in Europe, for most people, the difference is not significant. Let's talk specifics.
Let's say rent is 700 EUR/month and the mortgage is 800 EUR/month. The difference is less than you would spend on frivolous, spur-of-the-moment things.
But this ignores taxes. Also, most renters don't save and invest; they consume whatever they have left each month, sometimes more. Owning a house with a mortgage forces you to save. Given real estate price trends throughout history across the world, do you want to be long or short it?
The explanation that made the most sense to me is that, for people who, for whatever reason, don't tend to invest their budget surplus, buying a home leads to better outcomes on average. It keeps lifestyle creep at bay with minimal mental load.
How do you calculate the monthly savings of renting vs. a mortgage? E.g., how do you decide how much more to put in the S&P 500 than would have been possible with a mortgage?
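For what it's worth, a minimal sketch of one way to run that comparison, using purely illustrative assumptions (the 700 vs. 800 EUR from upthread, a hypothetical 40k down payment, a 7% nominal annual return):

```python
# All inputs are illustrative assumptions, not claims about any real market.
down_payment = 40_000      # invested up front instead of handed to the bank
monthly_diff = 800 - 700   # assumed mortgage payment minus rent
annual_return = 0.07       # assumed nominal index-fund return
months = 30 * 12

balance = down_payment
for _ in range(months):
    balance *= (1 + annual_return) ** (1 / 12)  # one month of compounding
    balance += monthly_diff                     # invest what renting saved

print(f"Renter's hypothetical portfolio after 30 years: {balance:,.0f} EUR")
```

Of course rents rise, mortgages amortize, and owners build equity, so a real comparison needs all of those too; this only shows the mechanics of "invest the difference".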
The Kinesis Freestyle is pretty cheap. I bought the mechanical one, which is more expensive, but don't use it. I'm too comfortable with the Microsoft ergonomic keyboard with light membrane keys https://www.microsoft.com/en/accessories/products/keyboards/... of which I have three.
The only issue with the Microsoft one is that it has to fit you, since it is split but a single piece. With a fully two-piece split keyboard you can adjust it to your needs.
Fertility is definitely not below replacement rate globally. Maybe it is in Western countries.
You find it hard to say it's unfair, but you seem to hint at it. I don't think it is in the slightest.
Also you sound like a software developer from the US so you likely can afford to work say 80% and "recoup" some of that free time.
Yes, I meant in the West, where much of the high-tech work the article is about takes place.
Well of course I feel it's unfair at some level! That is indeed why I made the comment.
Assuming we want society to continue, someone has to have kids. And while of course we don't want to force individual people to have kids, we should (again assuming we do want humanity to continue) probably try to achieve some kind of balance to where large swathes of people don't self-select out of having kids due to economic/career reasons.
And if the working spouse tries to shed home responsibilities to help them focus on their career, this simply promotes inequity, both sex-wise (since more women are stay-at-home spouses than men) and class-wise (wealthier people will be able to hire additional help, while poorer people will not).
I have had plenty of math classes both in high school and later at university but I don't recall any significant distinction that would leave me with some concrete idea of "algebra" vs. "calculus" vs. "whatever" years later.
In the US we'd reserve "mathematical analysis" (or, more specifically, real analysis) for the college-level classes which involve writing proofs about the continuity of functions between sets of real numbers. You'd probably end up with a lecture on the mean value theorem here, and leave with the ability to prove it, among other things.
"Calculus" is the application of that theory without argument. It's an advanced high school class or an early college one. There you'll integrate or differentiate real valued functions for use in optimisation problems or for determining qualitative features of such a function (e.g. where is it flat, where is it defined, etc).
In the US, you can probably pass calculus without writing a proof, but you can't pass mathematical analysis without at least understanding epsilon/delta proofs.
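For concreteness, here is the kind of statement at issue, sketched in LaTeX (the standard definition, not anything course-specific):

```latex
% The epsilon/delta definition of a limit: calculus has you use it,
% analysis has you prove things with it.
\lim_{x \to a} f(x) = L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0 \;\; \exists \delta > 0 :\;
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
```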
I'm pretty sure we did proofs in high school. But that was a while ago, don't know what they do now.
Hey, now that I think of it, "mathematical analysis" had continuity, limits, some integrals. And then every mathematically inclined uni specialization had "integral and differential calculations" (let's shorten it to calculus), which was more advanced use of integrals :)
A rose by any other name would smell as sweet, but it may be called a thorn in a different locale.
I agree with you about the ideal approach to tests and specs, but sometimes the "specification coming from elsewhere" is pretty vague, incomplete, or simply lost to time.
In that case, if I have code that satisfies that specification, as demonstrated by, say, acceptance testing or years of use, I will happily opt for tests that take the code as the specification. I am not using those tests to prove the code "correct" but to prevent regressions during refactor or new feature development later on. Worst case, they keep the program exactly as broken as it is now or has been for years.
> I am not using those tests to prove the code "correct" but to prevent regressions during refactor or new feature development later on.
Then you don't really need to have the tests written; what you need to do is compare the output and behavior of the two code bases before and after refactoring. I mean, why use a low-fidelity approximation of the original (i.e. tests that only work in some cases) if you already have the original at hand? (After all, how can you predict ahead of time which features of the original version will need to be tested as preserved in the new version? Isn't writing regression tests in advance a form of premature optimization?)
From the viewpoint of testing, you're comparing two implementations. So again, there needs to be some understanding of which one is the correct specification: the original one or the new one?
It would be more useful if the industry actually treated the two cases - i.e. testing against a specification and regression testing of refactored code - as completely separate, instead of trying to push unit tests for everything. Because in the latter case, ideally you can prove that the refactored code does the identical thing, which is stronger than testing.
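A minimal sketch of that compare-the-two-implementations idea as a characterization test; all names and bodies here are hypothetical stand-ins, not anyone's real code:

```python
import random

# Hypothetical stand-ins: in practice old_process would be the pre-refactor
# code (imported from an old tag, a vendored copy, etc.) and new_process
# the rewrite.
def old_process(value: int) -> int:
    return value * 2 if value >= 0 else 0   # warts and all, this IS the spec

def new_process(value: int) -> int:
    return max(value, 0) * 2                # the refactored version

def test_refactor_preserves_behavior():
    random.seed(42)  # reproducible inputs
    for _ in range(1000):
        value = random.randint(-10_000, 10_000)
        # Whatever the old code did, broken or not, the new code must match.
        assert new_process(value) == old_process(value)

test_refactor_preserves_behavior()
```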
Those error pages are actually infuriating. Maybe I'm just getting older, but seeing "You broke reddit" and all the other shit annoys the hell out of me. It's their website; the error should be "we fucked something up, sorry" and be done with it.
I agree, I really long for the days of simple "there was an error" messages. This trend of "you bwoke us" or "OOPSIE WOOPSIE!! Uwu We made a fucky wucky!! A wittle fucko boingo! The code monkeys at our headquarters are working VEWY HAWD to fix this!" messages is infuriating to me even more than a simple error.
Following the trend, trend-setting, been like that since forever, or something else: name it what you will. Whatever it is, I'd certainly call out the presence of a (less than welcome, imo) trend.
If Reddit didn't follow this trend, they created it.
The tongue-in-cheek communication and cute illustrations on error pages started with Web 2.0 and are now so prevalent that a regular system error message is a refreshing experience.
Imagine being a non-tech savvy user or someone who understands the world more literally than figuratively. That sort of message could be pretty confusing for them.
It depends on the type of user, even among the non-tech savvy.
A message as simple as "Sorry, reddit is receiving too much traffic right now and can't handle your request. Try loading this page later." will be understood by 99% of people.
But there's that 1% of people that aren't just non-savvy, they're willfully non-savvy to the point where words stop having meaning to them just because they're referring to something related to a computer. The type where if you ask them something as simple as "Is the computer turned on?", they say they don't know. Meanwhile, the screen is showing their desktop.
So they actually made the intentional decision to change it back to the much more accusatory "you broke it". I'm not sure exactly when that occurred, but I suspect it was roughly when spez took over.
Similarly, the self-important prima donnas of Mozilla who think the fate of the universe depends on forcing their shitty “zarro boogs found” in-joke on the general population — or rather, the subset of users who are only there because they’re already frustrated by a bug … that they now can’t find an existing ticket for.
seeing "You broke reddit" and all the other shit annoys the hell out of me.
Could be getting older indeed, but this one, and "guru meditation error", almost make me act like younger me again, feeling a sudden urge to smash my monitor into pieces.
I guess that's the good thing about getting older (I'm 46): I need larger text than I did 20 years ago, so that issue isn't one I deal with. Still, at standard OS sizes I can't detect pixelation, even when I compare my 32-inch display next to the laptops I have (16" MBP, 13" MBA).
> The PR was bigger than what I felt I could sensibly review and, in honesty, my desire to go through the hours of work I could tell this would take for a project I no longer used was not stellar.
This doesn't look like a particularly big PR to me, judging solely by the amount of code changed and the nature of the changes at first glance.
Are most of your PRs at work tiny, a couple lines of code at most? Am I sloppy for not even considering reviewing this for "hours"? Are all the code bases I have worked on sloppy because features often require changing more code than this?
When I wrote this, almost all my engineering experience was in OSS database development. That environment has several forces pushing towards very detailed reviews and clean commit histories, like others have hinted at in the thread here.
The PR review is in public and heavily scrutinized by paying customers and passionate community members. APIs cannot be broken, and even with automated tooling it's very easy to accidentally introduce a change that breaks tens of thousands of deployments. And the code itself is really sensitive. If a bug gets in and released, it can be several days of grind to get a patch out, and after that many months of new tickets for the bug from customers that won't move to the latest patches.
Now I work somewhere where the code I write runs in-house. If a bug sneaks in, it's usually a 5-minute redeploy to resolve and the cost is borne primarily by my own team.
So I think the answer to your question is: it really depends on the environment you're writing code in. In some setups the cost of introducing mistakes is very high, so it makes sense to pay a lot at the review stage; in others the correct balance is less strict review and fast fixes/rollbacks instead.
The PR isn't very large, diff-wise, but IMO it isn't well made, which makes it hard to review.
Half of the commits are merges from other fork branches into the contributor's master, and the PR name and description don't mirror that in the least.
Then (eyeballing) 90% of the diff is whitespace changes, which would be fine in its own PR ("Formatting changes") because it's easy to eyeball that it's just that, but when you mix it with other changes, it's hard again.
It's a mistake to filter out whitespace changes on Python diffs, since indentation there is significant. These days it's best to use `black --check` or similar in CI to make sure contributors have matched your whitespace settings, minimising these changes.
I am working on a GitHub pull request viewer that displays changes using a semantic diff, and therefore has some more advanced whitespace handling behavior than just ignoring leading or trailing whitespace. I tried it with this PR:
It doesn't make a huge difference, but it filters out changes like the added line break in "if value: value = str(value)" nicely. I haven't announced the project yet, but maybe someone will find it useful :-)
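A toy illustration of the underlying idea (not the viewer's actual implementation): two Python snippets that differ only in layout parse to the same syntax tree, so a semantic diff can drop the change entirely.

```python
import ast

old = "if value: value = str(value)"
new = "if value:\n    value = str(value)"

# Both parse to the same AST, so the added line break is purely cosmetic.
print(ast.dump(ast.parse(old)) == ast.dump(ast.parse(new)))  # True
```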
Also, Git has had whitespace-ignoring diffs (`git diff -w`) since forever, and since GitHub's diff view isn't particularly convenient for browsing multi-commit PRs, you usually review the changes using Git anyway.
That said, I'd ask the contributor to tidy up the branch first. It's kinda disrespectful to ask others to review branches in such a state.
After a brief scan I’d call the full change reviewable enough I could do it in a sitting. Most of it looks reviewable on my phone. But seeing >30 commits, I’d pause. Partly because I’ve become a lot more sensitive to the impact of commit history itself, partly because the quick scan of such a small change set doesn’t seem to line up with so many commits, but mostly because it implies much more context exists than the attention I’d pay if it came pre-squashed.
That kind of implication stops me in my tracks to learn more. I’ve spent literal days tracking down the meaning of single line code changes through multiple dozens of commits, sometimes across repo boundaries (ahem the original author’s suggestion of deprecating in favor of a fork comes to mind).
The size of this particular PR only becomes a factor when any one of those numerous commits can become that rabbit hole. How many humans’ days are going to be spent tracing history through this particular merge? For how many different reasons? I didn’t even look at the changes midway, but how many nuances are buried in there and lost unless this weird bundle of changes is preserved?
I bet you're overthinking it in this case. In general, the expectation on GitHub is that PR commit history doesn't matter and owners should simply squash on acceptance. I think most contributors don't even consider that all their commits are visible or would be of interest, and only think about the final product.
It's certainly simpler for the contributor to do the squashing, but since GitHub makes it so simple, in practice it doesn't matter.
Imagine writing a highly performant and featureful relational database, successfully using it with large projects for a while without the database itself becoming particularly popular, and then having a company come along and popularise your database by telling lots of people how good it is as a flat key-value store.
Then people are really confused and annoyed as to why their key-value store has this complicated and confusing relational database attached to it, so they write lots of guides skimming over the details to help people get better at using the database to just store keys and values in one table.
That story strikes home for me. I worked at a company years ago where I designed and implemented a custom event-queue-like architecture using Lua scripts in Redis. The data would eventually end up in another database, but it would start its journey by being sent to Redis. Once it was in Redis, the application servers considered it "committed".
Of course, sometimes bugs showed up in our system. One of the engineering leads would often say "oh, let's just clear the Redis cache". Every time, I told him no and once again slowly explained how we weren't using Redis as a cache, and how deleting everything in Redis would delete user data and be a terrible idea. He would get this faraway look in his eye while nodding along and pretending he understood. I guess in his mind he was just thinking: why on earth would it be unsafe to clear the "redis cache"?
Months later I went on holidays. They ran into some bug. He reacted by wiping everything in Redis. And, predictably, all hell broke loose. User data rollbacks happened, which caused cascading failures in the UI (which assumed that rollbacks would never happen). The team ended up reconstructing some lost data from some JSON which accidentally ended up in web request logs. Users had downtime as the whole app broke. It was a disaster.
When I got back to the office, I was hit with some strange combination of "why weren't you here" and "why didn't you tell us". Ugh. I still think about it sometimes. I have no idea how I could have handled that better. But I can tell you one thing for sure: I lost a lot of respect for that engineer.
As far as I understand, Linus doesn’t write code. He reviews code other people have written, and even other deputies review the code before it gets to him.
Additionally, many people are paid to work on his project by other companies. Linus doesn’t pay them, yet he’s their boss.
All of this is to say that Linus is very insulated from externalities. He can insist on his platonic ideal of a commit and SCM, if it makes his life easier. He’s like the editor at a publishing house, rejecting countless manuscripts yet never writing a word himself. And that’s fine.
However, most people do not use an SCM like Linus does. If you’re maintaining an open source project on GitHub you’re probably working for free, as are the people submitting PRs. The more difficult you make their lives, the fewer people will be willing to submit PRs and the more work you’ll have to do eventually.
It's really irrelevant what Linus does now or what your personal opinion of version control is. The point is, git does certain things well, and a large portion of its users don't seem to need or want them. Now, it might be because those users don't understand what they're missing out on, or whatever, but git was written by Linus from endless experience dealing with both contributing and maintaining code. It's simply wrong to suggest that Linus has no idea what he's talking about from a contributor's perspective. Moreover, there are still plenty of people, who actually understand git, who are happy with the workflow it was designed around. Yes, it's a complex tool, but it's not as nonsensical or confusing as the people insisting on not using it to its full extent always claim it is.
I disagree pretty hard with this – for instance I've recently needed to dig into the code for the Gradio library, and when PRs are like https://github.com/gradio-app/gradio/pull/3300 (and the merge commit's message is what it is) it's hard to understand why some decisions have been made when doing `git annotate` later on.
I don't think you're in disagreement. Parent is saying that it is customary nowadays to have crap commit history. You're saying that crap commit history is problematic from a s/w engineering perspective. Both are true.
I’ve worked on a project with devs that write crap commit messages. It’s far easier to just commit to squash merging and requiring good PR titles and descriptions (which become the commit message following the squash).
Especially when you have multiple people working on a shared branch, rebasing can be quite painful. The most common example of this is when sharing a branch with QA.
My favourite tool for this is the blame command in Magit (the Git client for Emacs). You can cycle between styles; one shows the commit message as a kind of header in-line with the code for instance. Another just shows a faint rule between lines that were changed in different commits. Then, one can press return to show the specific commit that the line was changed in. Well worth a try if you haven't already!
One sitting is a lot for a project you've forgotten about, lost context on, and are no longer excited by, especially assuming you have a lot of other stuff going on in your life. And then ya, everything else you said. And there are no tests.
This is it entirely - imagine suddenly being asked to clean and organize a house you haven’t lived in for ten years. It may take you a day just to get back up to speed on the code.
(Interestingly enough Knuth praised literate programming for TeX as the reason he could get back in and fix bugs after an almost ten year hiatus - where parts of the code he had not looked at in 40 years.)
Some professional cleaners do indeed do their job on properties they might never have seen before; maybe the programming equivalent is the external consultant who is expected to quickly identify bugs in unfamiliar codebases.
> I’ve become a lot more sensitive to the impact of commit history itself
This was something @whitequark taught me. Having a clean commit history is very important when looking back, and we look back more often than most people think. Self-contained commits with good messages are important. I'm still not great at concise descriptions, but I'm trying.
+1 to this. I let my team pick one: very small PRs that get squashed when merging, or bigger PRs with well-separated commits and linear history that get merged normally.
Having a bunch of “fix” and merge commits in the main branch history is terrible.
So with libraries that are consumed by third parties, it's rarely about the number of lines of code. Since you don't really have a definitive list of all usages of your code, things like the changes to `_money_from_dict` making things nullable mean you have to consider all ways in which that might blow up.
And like "the public API is the public API, the private API everything goes" is easy to say, but it's so easy to just break other projects with these kinds of changes. This makes it very hard to move forward with certain kinds of "bug fixes" in projects.
That being said, it's not that the PR is "hard", but it's hard to say "merging this is A-OK" instantly. Hours? I dunno, but I would definitely add some tests and try to figure out how to write code that breaks with those changes.
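A minimal sketch of the kind of breakage I mean; the names and bodies are hypothetical (only the `Optional` return mirrors the actual change), not django-money's real code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Money:
    amount: float
    currency: str

# Hypothetical sketch of the *shape* of the change: this helper used to
# always return Money; after the change, a missing amount yields None.
def _money_from_dict(data: dict) -> Optional[Money]:
    if data.get("amount") is None:
        return None
    return Money(data["amount"], data.get("currency", "USD"))

# Downstream code written against the old always-non-None contract:
money = _money_from_dict({"currency": "EUR"})
money.amount  # AttributeError: 'NoneType' object has no attribute 'amount'
```

Nothing in the library's own test suite has to fail for that downstream crash to happen, which is exactly why you have to consider all the ways it might blow up.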
If I were managing that project at the time, and had motivation, I'd definitely do a lot of cherrypicking, get all the "obviously won't break anything" changes merged in, to leave the problematic bits in for a focused review. Again, this all might be under an hour, but sometimes you look at a thing and are like "I don't really want to deal with this, I have my own life to deal with."
At a higher level, the biggest problem with these kinds of libraries is having the single person who can merge things in, who can hit the "release" button. There's a lot of projects that interact with Django that have heavy usage, and survive mainly thanks to random people making forks and adding patches that properly implement support for newer Django. At $OLD_JOB we used a fork of django-money (including a lot of patches to fix stuff like "USD could be ordered with JPY", pure bug generators). It was very easy to add patches because, well, we had our usage and our test suite and no external users. It's great, but it's also important to try and get patches upstreamed when possible (and we did for a lot of projects).
Echoing your experience, I too worked on a project that was stuck on angular.js 1.3 when 1.6 was about to be released. We took a look at some migration guides and didn't see that many changes we had to make, so we tried updating it.
Everything broke.
Investigating, I realised that one of my long-departed predecessors had forked angular-bootstrap and made a few small changes to it. The problem was that the library was tied to angular.js 1.3. To update angular.js, we had to update the library. To update the library, we had to remove all the changes in it, which would break large parts of our UI. The project was already in maintenance mode by that time and we decided to just leave it as is. I spent the next month converting it from CoffeeScript to ES6.
I had exactly the same thing last year with a previous long departed staff member having forked django-flex-fields into their personal GitHub and having made substantial changes. Porting to Django 3.2 then became a huge and costly project in itself as a result.
>So with libraries that are consumed by third parties, it's rarely about the number of lines of code. Since you don't really have a definitive list of all usages of your code, things like the changes to `_money_from_dict` making things nullable mean you have to consider all ways in which that might blow up. And "the public API is the public API, with the private API anything goes" is easy to say, but it's so easy to just break other projects with these kinds of changes.
Strongly typed languages, a well-designed API, and semantic versioning can make this problem largely disappear.
If you fix a bug, this can break existing code. This is a fact of life. Changing performance characteristics can generate downstream problems! You have to consider this stuff seriously.
Here the change introduced nullability. In another universe the function would already be nullable but the conditions in which a value becomes None changes. A spec can be changed, for the better, and still be bug generating if people just upgrade. That is not captured by most type systems, and there aren’t that many great production web apps running on Idris.
But ultimately the reality is that people have a lot of flexibility with Python projects in general. It’s great, and libraries that are aware of this, well… they write it in release notes. They also have open repositories to enable actual code diffs. It’s non-zero amounts of work but it’s there.
There is a theoretical universe in which a static language with well-designed libs provides good aesthetics that make developing certain software easy. Meanwhile, even as a big functional programming lover, I still reach for Python because I can get work done: the libraries are in fact well designed, the code is easy to work with, and I can fix issues quickly. As a user it's great; as a library maintainer I've got to apply some more care. Could be better, but it's alright.
Largely, but not quite all. E.g. there's only so much information about runtime behavior you can capture in types, and then almost no one captures even half of it in practice. Doing this right requires a lot of experience, plus good knowledge of the problem domain, and it tends to both bloat and ossify the code.
It's easier if you're doing version 2 of a well-tested, scope-limited library, and thus can afford to do some holistic design. But the trend in our industry is to develop iteratively, small releases done often, everything kept at 0.x.y perpetual beta, and on the off chance your project grows old enough to warrant 1.x.y version, it has so much evolutionary baggage that you'd have to rewrite it from scratch to get proper typing in (which, of course, is against the zeitgeist).
How so? The API still breaks if it's strongly typed, the breakage is just easier to spot by downstream consumers.
And note that saying "release as a new major version" is not making the problem disappear, it is simply choosing not to solve the problem, but with a warning on top.
That has not been my experience. Nor Hyrum Wright's (of Hyrum's Law fame [1]), nor Randall Munroe's [2].
Note how things like performance characteristics leak through strong-typing, well designed APIs, and semantic versioning, in spite of non-guarantees around such characteristics.
One of the stories shared with me while working on OpenJ9 was of a customer that used stack overflows as a scheduling mechanism, and at some point they were compelled to update their JVM to a version that incorporated tail-call optimization, which caused their Rube Goldberg scheduler to stop working (by spinning indefinitely).
I would, however, argue that the existence of a user relying on behaviour that has explicitly been reserved as subject to change must not preclude development from delivering improvements to the product. At some point the consumer needs to be on the hook for relying on a positive externality that they do not have a right to.
According to the article, the author no longer used this project. It was no longer front of mind. There is a massive difference between reviewing a PR when you are actively involved in a project and reviewing one when the project is in your past.
In that case, it’s perfectly reasonable to spend a couple of hours getting back into code you wrote a long time ago. If anything, taking that time is a big win for overall project stability.
It's not too bad, but in the context of reviewing for a project in your spare time, it can be draining. I totally understand why he didn't feel like doing it.
I've committed bigger just with a "minor changes" -message :D
But seriously: that code seems to be touching bits that really should have automated tests attached. If the tests pass, then I would feel more comfortable accepting the PR.
It depends on the bits they don't do. I get contributions from time to time. They can look small, but I need to add tests, documentation, and so on. It can easily take an hour for just one added method, or whatever.
If the code is just fire and forget, then fine. If it's part of a bigger system with rigorous standards then 300 lines can take a day or more to "merge" in.
It's certainly not hours of work to review it... or maybe it is since it's financial? Either way, "it was in the script" as my wife and I say about corny movie moments. It made for a good article.
For this PR in particular, seems like a lot of it is formatting changes, so you’re right, may not have been a big deal in practice. But I wouldn’t take the statement so literally. For an open source project, every PR can contain unbounded toil for no pay.
It isn't that big at all, but when you maintain something you really don't want to maintain, you're over anything but the tiniest changes such as updating the copy. Ask me how I know. lol
If I were doing a proper review of a PR that big, making sure I understood how everything works, it would easily take multiple days and generate tens to hundreds of comments.
In reality I would flat out refuse to review it. Even 1k is very large without being broken down.
The only time I see PRs that big at work are cleanups (E.g. deleting whole directories), automated linting changes across the whole codebase and large structural refactors (E.g. changing directory structure).
Almost every time I’ve reviewed a 1k+ LOC PR, even if it’s from a really experienced and good engineer, it has introduced a bug. Obviously I’m not gonna say it at work, but changes that big are too hard to properly review and consider all the side effects and gotchas.
To use an analogy: if you wanted to reupholster my car seats, or spice up the radio panel, sure knock yourself out - I'll come check when it's done. But if you were to as much as think about tightening or loosening a single screw anywhere near the engine block, believe me I will be paying very close attention to what you're changing.
Best team I've ever worked with. 5k was on the larger side, sure, but technically difficult tasks often just can't be split into a series of smaller pull requests.
I am always amazed that many people with significant experience are so resistant to this idea. For what it's worth, my experience matches yours entirely: many significant changes can't be meaningfully committed in small chunks, they only make sense as an all in one.
And even more so, I've often seen the opposite problem: people committing small chunks of a big feature that individually look good, but end up being a huge mess when the whole feature is available. I hate seeing PRs that add a field or method here and there (backwards compatible!) without actually using them for now, only to later find out that they've dispersed state for what should have been one operation over 5 different objects or something.
What changes can't be committed in <5k LOC? That's a shit ton of code. If you can't break that down into smaller shippable chunks there's probably something wrong, or you're building something extraordinarily complex.
It's definitely overall quicker to ship like this, but there are tradeoffs. You are effectively working independently from the rest of your team, there is no context sharing and everything is delivered at once after a longer period of time.
We're performing atomistic simulations. The first edition of the code stored each atom on the heap and had a vector full of pointers to the individual atoms. Obviously this would obliterate the cache, so I crafted a PR to simply store all the atoms in a single vector. On its own, that was a one line change, but it was also a very fundamental change to the type system. Everything as simple as
```cpp
Atom* linker = atoms[index];
linker->x += 1.57;
```
Suddenly had to be
```cpp
Atom& linker = atoms[index];
linker.x += 1.57;
```
If I didn't make those corresponding changes, the code wouldn't type check and the build would fail. I think the final PR came out to about 17 kLOC.
Obviously if you make a change to something like your type system it's going to generate a very large diff, but you also aren't going to review the full diff.
You're just going to make the change with find+replace or some other automation then write in the PR description "I made this change to the type system". No one is actually reviewing 17k LOC.
Actually the example is pretty good for the kind of problem I'm talking about.
Let's imagine that instead of optimizing the pointers to in-place structs, we were taking the optimized program and adding support for dynamically allocated atoms because of some new feature for dynamically adding/removing atoms.
We could of course split the value->pointer 17k-line change into its own PR. But that PR would only be a pessimization of the code. On its own, it makes no sense and should be rejected. It only makes sense if I know it will be followed by the other feature, and even then, I would have to see the specific changes being made to know if this pessimization is worth it.
And if it got committed to the main branch, in preparation for other PRs that depend on it, the main branch is no longer in a releasable state, since it would be crazy to release with a performance penalty and no feature gain.
So, the right way to push this is as a single PR with a 17k-changed-LoC commit + the commits that actually use it. Of course, people would only manually review the other changes, but that's easy to do even if it's all in a single PR. And anyone looking back at history would clearly see that the pessimization was only a part of the dynamically-allocated atom feature, not a crazy change that someone did.
You specifically mention "big feature" in your original comment so it's confusing that this is a good example of the kind of problem you're talking about.
This is a very different situation than large PRs containing feature code. I think most people would agree that one large PR is the correct approach for this kind of situation.
Usually new features require modifications of old code, at least in my experience. If I came across as claiming it's likely for a feature to require 5k new lines of code, then I clearly communicated badly. But a feature coming with 5k lines of modified code, while rare, still happens several times a year in a large project in my experience.
This isn't what the op is talking about. They're talking about a net new feature that would span 5k lines. Your change is trivial compared to it, and frankly would earn approvals immediately without much thought (assuming the changes were already planned and talked about)
A new feature can easily involve the kind of modifications that they are mentioning. It's pretty rare for a new feature to exclusively involve new code, in my experience. And when it needs modifications in a deep part of the stack, the new feature will easily spiral into small modifications to thousands of lines of code.
In that case I also see "large" PRs, but I point the reviewer to "file X and then 4999 copies of file Y." When doing large no-change refactors, I submit standalone PRs and merge them quickly because no one will review 1k identical changed lines (e.g. if I change the location of /lib/ imported 600 times in the project)
What if the refactoring makes the code harder to understand when looked at in isolation, but is necessary for the rest of the feature to work? Why submit it in a separate PR without the context of why it's necessary?
I've experienced this changing the API for a primary memory allocator in a frequently updated code-base. Each location updated was perilous and it needed to be changed in bulk to avoid an endless war of attrition.
We have a small feature that was put together in a rush. It "works", but only correctly for some simple cases; otherwise there are bugs everywhere. There were very few tests. We must redo the whole thing, update the UX, and add tests. Tell me how we can achieve that without replacing all the existing code and adding new tests that cover every use case in the same change.
Why do you have to do it all at once? You can't improve the codebase piece by piece?
Shipping all the changes in a monster PR is usually not a good option. One big reason why is that you do not create any value until the whole thing ships. If you ship piece by piece you can create small amounts of value every time you ship.
Also, if it's a "small feature", why is it 5k+ LOC?
We as a team sometimes decided to PR into a PR branch, not as meticulous as a PR to develop but there'd still be eyes on the code entering the branch. Especially useful when there are dependencies and/or different disciplines contributing to the feature.
As if you actually review 5k lines, though. It's either 90% whitespace changes that you completely skim over (probably missing the one bit that actually changed), or you just skim over it looking for a few patterns you don't like, such as using a loop instead of Array.map or something.
When they say 5k lines, I'm assuming they mean 5k lines of meaningful code. 5k lines with whitespace changes, linting etc. would be impossible to meaningfully review
This is not at all a huge PR. Sometimes I make thousand line changes (or way over that), although in most cases it is just me working on a project. Changing a couple files (like in this PR) should be OK.
Do others share the sentiment that your reply is, sad to say, quite arrogant?
What exactly are you trying to achieve by comparing the guy's "I'm busy, this is long" with yours or anybody else's? Moreover, what on earth does his job have to do with the topic?
If they're able to play video games and not pay attention during the meeting then it's a waste of their time to be there in the first place. Stop wasting people's time with your meetings.