A large number of Fossil's positives are related to not having rebase. It feels like this is a huge concern for functionality that many people do not use that often. The last time I used rebase at a job was maybe 5 years ago?
Other than that my bigger gripe is when I read something like this:
> Git strives to record what the development of a project should have looked like had there been no mistakes
Git does not strive to do this. It allows it, to some degree. This is not the same thing at all and is basically FUD. I would say the debate over the value of history rewriting is ongoing. It's probably a tradeoff that some orgs are willing to make, and Fossil is masking the fact that it allows less workflow flexibility by framing it as an obvious advantage, which feels slimy.
Git gets its bias from Linux kernel development.
When you're sharing your source changes with external people, who need to review your code, it just makes sense to present it in a clean, logical progression of changes. To wit, remove unnecessary noise like your development missteps.
And it's only in that context that the emphatic call for history rewriting is born. Meaning, you can use all the power of Git to record all your changes as you proceed through development, and then rebase them into something presentable for the greater world.
It's also useful in code reviews in general - I don't care about your development noise, and every single person in the future does not need to read it to understand the final result either. Rebases solve that: present a coherent story for easy understanding, rather than the messy reality.
When you're purely local, sure - do whatever the heck you want. Nobody cares. But messy merges are rough for collaboration, both present and future.
(rough, not fundamentally wrong, to be clear. It's just a friction tradeoff, dealing with human behavior has next to no absolutes)
My largest problem with rebasing a branch onto master as a single commit is that it becomes harder to establish provenance.
When the tip of my branch gets built and the merge target is not ahead of it, I can actually retag the output of the branch build as what is now on master.
When you do a squash, the default commit message contains the original SHA, but it no longer carries any semantic meaning. It's just a string of text.
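Concretely, that check-and-retag step might look like this - a rough sketch, assuming CI exposes the built commit as a hypothetical $BUILT_SHA variable:

    # if master is an ancestor of the built tip, a fast-forward merge makes
    # that exact SHA the new master, so the branch build IS the master build
    git merge-base --is-ancestor origin/master "$BUILT_SHA" &&
      git tag -f verified-build "$BUILT_SHA"

After a squash merge, no such check is possible: the SHA that was built never lands on master.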
It feels like overindexing on git as a source of truth for the iterative development process itself is just bikeshedding. Do whatever you want to do locally, then squash your commits into a single unit change. Document that comprehensively in the commit message for that squashed change. If there was some profound learning that feels like it needs the rebase history, just explain it narratively.
Perhaps a more contentious take: rebasing doesn’t bring any real value. To the original comment above, I would say a significant percentage of teams never use rebase and drive business value just fine. I do not think there exists any evidence to suggest that teams that use rebase over squash merging are in some way higher performing. Rebase is something that some people’s obsessiveness compels them to care about because they can, and then they retroactively justify their decisions by suggesting value that isn’t there.
Again, Git gets its bias from Linux kernel development. You can not present your entire work as a single unit change to sub-system maintainers for inclusion in the mainline. You can use the power of Git to present it in a digestible way that aids understanding, and allows piecemeal selection of acceptable bits, and rejection of others.
This is clearly not relevant to everyone, but to suggest there is no value at all, is laughable.
> You can not present your entire work as a single unit change to sub-system maintainers for inclusion in the mainline.
If we take away “this is how it’s historically done” or “because I know how these maintainers act and they would not accept it” - why not? From what principles is the Linux kernel special that standalone incremental units of change are inappropriate or undesirable?
> allows piecemeal selection of acceptable bits, and rejection of others
Sure, it is a good thing, but it doesn’t have much to do with git and can be done without rebasing.
> From what principles is the Linux kernel special that standalone incremental units of change are inappropriate or undesirable?
Human limitations. You are simply better able to digest small incremental changes instead of one big blob. This is not controversial and I don't understand why you would even raise this as a point.
> Sure, it is a good thing, but it doesn’t have much to do with git and can be done without rebasing.
It does have something to do with Git. Git has tools to facilitate it. And that is the case because Git was born out of Linux kernel development where it was the norm, and necessary.
If you want to use another tool to achieve the same result, nobody is stopping you. Or if you don't want to do it at all, that's fine too. But the simple fact of the matter is, you must do it today if you want to participate in kernel development.
> Human limitations. You are simply better able to digest small incremental changes instead of one big blob. This is not controversial and I don't understand why you would even raise this as a point.
Because I agree with you on your assessment of human limitations, and rebase doesn’t change this. If I need to merge 500 lines of code over 20 affected files, and none of that can be merged individually while retaining functionality, it doesn’t matter if I split that into 50 10-line commits if I need to merge in the monolith at once, which means understanding the set of changes as a monolith, regardless of git history.
If for some reason any of this can be merged in as standalone, you just do that instead. This is what I mean by bikeshedding. Rebase culture is purely preference. There is nothing it offers that is not solved equally as well by potentially less complicated alternatives. It’s not the wrong way to do things, it’s just not also objectively right, and creates more work than it claims to solve.
> it doesn’t matter if I split that into 50 10-line commits if I need to merge in the monolith at once
That is a misunderstanding of how to use this feature. It is not meant to break changes down into useless divisions. It is meant to allow the grouping of changes into logical units. Logical units that help human comprehension.
This is very important when communicating with people who have never seen your code before. It allows you to include a narrative description (commit message) with each logical group of changes that is directly connected to the source code implementation of just that descriptive piece. It also allows you to connect a chain of those logical units into a progression toward a greater, cohesive goal.
You may dismiss all this as irrelevant to your particular environment, and that is fine. But Git provides tools that are directed toward it, and they're quite powerful and useful for those who understand and use them correctly.
> to suggest there is no value at all, is laughable.
It can have no value at all to common workflows. Usually, this kind of singular change consideration is done at the PR level in another tool (github), which is divorced from git. Being able to present missteps/demonstrate specific commits where something didn't work (without another developer having to write the scenario) has utility that I have leveraged.
> Being able to present missteps/demonstrate specific commits where something didn't work (without another developer having to write the scenario) has utility that I have leveraged
Fortunately, you're able to include anything you think is relevant. If you think a change is worthy of inclusion, include it. But there are clearly things that are just silly mistakes that provide no such value, and cleaning those up as a courtesy for the person who has to review your code, just makes sense.
> But there are clearly things that are just silly mistakes that provide no such value, and cleaning those up as a courtesy for the person who has to review your code, just makes sense.
Maybe nobody cares about your missteps, true. What about less senior developers? Is there a learning opportunity both ways? Yes. The history clutter doesn't matter either way. There's a little value to think about that in the workflow I described, so we don't toss it (not that anyone can make you expose it).
Having a clean history, where every commit is capable of being compiled, is quite nice. This will keep your CI happy and allow you to more easily use git bisect to determine when a bug was introduced to the codebase.
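For anyone who hasn't used it, the bisect loop is short - a minimal sketch, with v1.2.0 standing in for any known-good ref:

    git bisect start
    git bisect bad HEAD        # the current tip is broken
    git bisect good v1.2.0     # hypothetical last-known-good tag
    # git checks out a midpoint commit; build and test it, then report:
    git bisect good            # or: git bisect bad
    # ...repeat until git names the first bad commit, then clean up:
    git bisect reset

If every commit compiles, each midpoint actually builds and the search converges in O(log n) steps; one broken "work in progress" commit in the middle stalls the whole thing.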
> There's a little value to think about that in the workflow I described
Sure. Git is flexible and doesn't require you to follow the workflow for which it was originally designed.
> Perhaps a more contentious take: rebasing doesn’t bring any real value.
I strongly disagree. I rebase feature branches on a daily basis and it's a must-have feature for anyone who works on feature branches they want to keep updated and mergeable as fast-forward merges, ideally peeling off small commits into separate pull requests.
Here's a small example of a very mundane workflow. I was assigned an issue where I needed to enable a component in a legacy project. I cut the feature branch and started with the "lean on the compiler" approach to fix blockers. Each individual blocker I addressed I saved as a local commit. Throughout the process I spotted a couple of bugs in mainline, which I also saved as local commits. Finally I got a working local build, but team members had already merged a few updates onto mainline that created conflicts. I rebased my feature branch onto mainline's HEAD and fixed the conflicts in each local commit. Time to post pull requests for the fixes. I noticed a few of them were related, so it would be preferable they went in before everything else. I did an interactive rebase to reorder local commits and move these bugfix commits to the start of the local branch. I squashed them, cleaned them up, and posted a pull request. The pull request was merged into mainline, and in the meantime other PRs went in as well. I rebased the remaining commits in the local branch, then followed the same process for another bug. Rinse and repeat. Finally all I had left was the fix for the original issue. I rebased the remaining commits onto mainline, cleaned them up, and posted a PR. Done.
One ticket, around 3 PRs, and almost a 1:1 ratio of rebase-to-PR.
And here you are, saying rebasing doesn't bring any real value.
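For reference, the core of that loop in plain git commands - a rough sketch, with placeholder branch and commit names:

    git fetch origin
    git rebase origin/main        # replay local commits onto mainline HEAD
    git rebase -i origin/main     # reorder/squash so related bugfixes come first
    git branch bugfixes <sha-of-last-bugfix>   # peel them off for a small PR
    git push origin bugfixes

Each PR merge just shrinks the local branch; the interactive rebase is what makes the "peel off" step possible at all.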
I think those who complain about rebase are overrepresented by the subset of Git users who barely go beyond the very basic features of checking out branches, pulling changes, and committing stuff. They have no idea how and why other features work, so they complain about things they know nothing about. When pressed about basic use cases, they fall back to filling the holes in their reasoning by arguing that workflows should be different or that other features are similar, while completely ignoring that features like rebase do the job, and do it very well and very easily.
Regarding rebase, it's been my experience that among many developers rebase has a mythical status. You're "supposed to" rebase, but no one knows the benefit of doing so.
It's a big downside of git being treated like some magical difficult spell. Same with exiting Vim, people treat it as way harder than it really is.
I tend to agree. I haven't used Git in a large project, but...why would I want to rewrite history? The project is what it is. What happened, happened. If there are a couple of weird commits, who cares? At most, maybe edit the commit messages to explain.
Because when I am developing in my local repo I have a stream of commits that go “adding xyz because abc is being a pain”. They’re informational for me as I progress through iterating on a feature, but when I’m ready to merge I really don’t want that mess polluting the global commit history. I may also be working on multiple things in parallel and want to isolate them from each other, both to keep a cleaner history but also for code review purposes.
There’s plenty of reason to use rebase, but if you’re fine letting the occasional mess slip into the history then it’s fine to NOT use it as well.
There is one universal case, though: rebase before you submit a patchset for review. I don’t want to fight through merge conflicts to review a change; make sure it’s applied to the current HEAD before you send it.
Rebasing to squash commits or even split commits up before/during making a PR makes sense to me and I do it all the time just to clean up my mess. The order of development and the state of the repo over time isn't faked, this is just labelling and granularity.
What doesn't make sense to me is rebasing instead of merging. If master has a lot of changes and you want to pick those up, you can merge in which case history reflects reality - each commit has an actual state of the repo that you had on your machine.
Or you can rebase, in which case all of the commits on your branch now contain code that no-one ever had on their machine, not tested, never run, maybe it doesn't compile, maybe it's nonsense.
Both result in the same diff from master so are equivalent for submitting a patchset.
It's really hard for me to see the value of trashing your history like that. People like linear history but history actually is not linear sometimes.
Lots of criss-cross merges make it really difficult to follow history, making it less useful.
Note that merge vs rebase is a false dichotomy. After rebasing you still have to merge your branch anyway, either as a fast-forward or with an explicit merge commit.
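In command form, the two endings look like this - a sketch assuming a branch named feature:

    git checkout main
    git merge --ff-only feature   # fast-forward: no merge commit, linear history
    # ...or record an explicit merge commit even though a fast-forward is possible:
    git merge --no-ff feature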
In the end, it's about communicating your changes effectively. The less noise there is, the better you can communicate. That takes effort from both sides, but many people put all the burden on the receiving side.
Usually it's if you committed something you shouldn't have.
You still have to rotate API keys if you ever pushed, because things like GitHub store orphan commits, but if you have something you can't change in there, or you catch it before you push, it will at least go away eventually.
There are a few legitimate cases where you want to rewrite history.
1. Assume you are a user who is cloning the master/main branch and building it. If that branch contains your development missteps, then you are in for a world of pain. We as users assume that the master/main HEAD is always buildable (excluding inadvertent mistakes).
2. If you are sending the commits as patches, then it makes sense to include a complete feature in a single patch. It will otherwise be very hard for the reviewer to make sense of any patch.
> I tend to agree. I haven't used Git in a large project, but...why would I want to rewrite history?
There are plenty of reasons if you're doing non-trivial tasks on local branches within a team. I've mentioned a common use case I have, which is to reorder commits I make in local branches to afterwards peel them off as stand-alone pull requests.
If you're doing trunk based development, with continuous integration, then you're approximately always on a public branch, and rebasing is not very useful.
Generally you merge main into your branch to resolve the conflicts there, then push to make the PR. Sometimes it's easier to rebase, sometimes easier to merge your main. The frequency of one or the other being more useful/easier often influences the accepted workflow.
I keep coming back to fossil again and again, despite git having a huge pull because of the easy publishing and collab on github/gitlab.
Just the other day I was starting an exploratory project, and thought: I'll just use git so I can throw this on github later. Well, silly me, it happened to contain some large binary files, and github rejected it, wanting me to use git-lfs for the big files. After half an hour of not getting it to work, I just thought screw it, I'll drop everything into fossil, and that was it. I have my issue tracker and wiki and everything, though admittedly I'll have some friction later on if I want to share this project. Not having to deal with random git-lfs errors later on when trying to merge commits with these large files is a plus, and if I ever want to, I can fast-export the repo and ingest it into git.
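That escape hatch is pleasantly small - a sketch assuming the repository lives in a file named project.fossil:

    # fossil emits git's fast-import stream format directly
    mkdir project-git && cd project-git && git init
    fossil export --git ../project.fossil | git fast-import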
It is extremely rare that I have a file over 100MB.
I also think it’s one of those situations where if I have a giant binary file in source control “I’m doing it wrong” so git helps me design better.
It’s like in the olden days when you couldn’t put blobs directly in a row so databases made you do your file management yourself instead of just plopping in files.
I like git. I don’t like giant binary files in my commit history. It’s cool that you like fossil, but I don’t see this as a reason for me to use it.
You didn't put blobs directly in the database because of annoying database limitations, not because there's a fundamental reason not to.
It's the same with Git. Don't put large files directly in Git because Git doesn't support that very well, not because it's fundamentally the wrong thing to do.
There should be a name for this common type of confusion: Don't mistake universal workarounds for desirable behaviour.
The fundamental reason had to do with how an RDBMS structured its pages of data: having arbitrarily sized blobs directly in the record broke the storage optimization and made performance tank.
It was a design constraint back in the day.
I haven’t looked at this in decades, but I think now it’s all just pointers to the file system and not actually bytes in the record.
So it was fundamentally the wrong thing to do based on how databases stored data for performant recall.
But that’s back when disks were expensive and distributed nodes were kind of hard.
> I think now it’s all just pointers to the file system
It depends. InnoDB, assuming the DYNAMIC row type, will store TEXT/BLOB on-page up until 40 bytes, at which point it gets sent off-page with a 20 byte pointer on-page. However, it comes with a potentially severe trade-off before MySQL 8.0.13: any queries with those columns that would generate a temporary table (CTEs, GROUP BY with a different ORDER BY predicate, most UNIONS, many more) can’t use in-memory temp tables and instead go to disk. Even after 8.0.13, if the size of the temp table exceeds a setting (default of 16 MiB), it spills to disk.
tl;dr - be very careful with MySQL if storing TEXT or BLOB, and don’t involve those columns in queries unless necessary.
Postgres, in comparison, uses BYTEA as a normal column that gets TOASTed (sent off-page in chunks) after a certain point (I think 2 KiB?), so while you might need to tune the column storage strategy for compression - depending on what you’re storing - it might be fine. There are some various size limits (1 GiB?) and row count limits for TOAST, though. The other option is with the Large Binary Object extension which requires its own syntax for storage and retrieval, but avoids most of the limitations mentioned.
Or, you know, chuck binary objects into object storage and store a pointer or URI in the DB.
In the age of Large Language Models, large blobs will become the rule, not the exception. You’re not going to retrain models costing $100M to build from scratch because of the limitations of your SCM.
I fail to understand people that can't be bothered to empathize with other use cases than their own. Game development usually has a large number of binary assets that need to be in source control, does that sound like a reasonable use, or are they also doing it wrong?
GF is working for a startup doing a game. They were using git and dumped it because it just cannot deal. Also, the content people found it annoying without it providing any value whatsoever.
That's not really true, is it? Surely Git does have an internal concept of diffing changes, specifically so it knows whether two commits can be merged automatically or if they conflict (because they changed the same lines in the same file).
> Surely Git does have an internal concept of diffing changes
Not in the data model. Packing has deltas, but they're not textual diffs, and they would work fine with binary data... to the extent that the binary data doesn't change too much and the delta-ification algorithms are tuned for that (both of which are doubtful).
> specifically so it knows whether two commits can be merged automatically or if they conflict (because they changed the same lines in the same file).
Conflict generation & resolution is performed on the fly.
Most binary files that people want to store in a VCS are stuff like .psd, .xlsx, .docx, and the like - data that's created by people by hand, but not stored as text.
Xlsx and docx are just zipped-up XML text. You can store them as text if you like, and I think there are many git add-ons to handle this. But the XML isn’t really that diffable, so I don’t bother.
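If you do want readable diffs anyway, git's textconv mechanism can route the files through a converter - a sketch assuming some docx-to-text tool (here docx2txt) is on your PATH:

    # diff .docx files as extracted text rather than as opaque binaries
    echo '*.docx diff=docx' >> .gitattributes
    git config diff.docx.textconv docx2txt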
Not in gamedev where you can have hundreds of gigs of art assets (models, textures, audio...), but you still want to version them or even have people working on them at the same time (maps...). But that is a different can of worms entirely.
Indeed I have 3D assets in this case. Would this be done differently in an enterprise that has all kinds of tools to manage specialty workflows? Sure. Do I want to spend my days configuring and maintaining some binary blob / LFS storage system? No.
I’ve migrated a lot of projects from fossil to git eventually, but I dare say they never would have made it that far had I started out with more friction, including fighting VCS tools.
You can equally say that git is for when you want to track changes. And then it's a failing of git.
Besides, what's the difference? It's a file. The contents changed. Git doesn't say anything at all along the lines of "30% or more different means it's not a good fit for git".
That seems like an implementation detail that could change tomorrow, at which point it could be perfectly fine to store large blobs in your repository, yea?
I completely agree Git is bad at this now, to be clear. I've watched single-file repositories bloat to hundreds of gigabytes due to lots of commits to a single 1MB file. But that doesn't seem like a design problem, just implementation.
Not sure if you were agreeing with me or not, BUT I run into this often where people do not use the right tools and try to make one tool fit every CM scenario. SharePoint sucks, but it has its place, along with Artifactory/Nexus.
Although blob storage works well for versioning, you have to make heavy use of the underlying proprietary API to get at these versions, and I am not quite sure you can do more complex operations, like diff and bisect between those versions, the way you could with git.
But that's my point: why can't a version control system be good for this as well? It's the same thing underneath. Why do we have to split these different use cases across different tools and hope a foreign key constraint holds?
That's a ridiculous claim. Can you really not think of a single situation in which it makes sense to keep track of big pieces of data alongside (or even instead of) source code? The fact that many VCS don't handle large binary data nicely doesn't mean there's never a good reason to do so.
It doesn't even matter if they can think of one; assuming your own use cases for software are everyone's is proceeding from false premises and is the sort of thing that leads to (and necessitates) "hacky workarounds" and eventually the adoption of better software we should've had in the first place.
Assume nothing about user's use cases. A VCS should not be imposing arbitrary limitations on the files it's indexing. It's like the old-school filesystems we (surprise, surprise) deprecated.
My problem with Fossil is that it is a "one solution for all problems". Fossil packs all solutions together while the Git ecosystem provides several different solutions for each problem.
When you want to do things that Fossil is not meant to do, then you're in trouble. I have no idea on how to do CI/CD and DevOps with Fossil and how to integrate it with AWS/Azure/GCP.
I find the whole ecosystem of Gitlab/Github, Notion, Jira and stand-alone alternatives like Gitea [1], Gogs [2], Gitprep[3] and others to be more flexible and versatile.
Unfortunately for git alternatives, the momentum behind git is in large part pushed by the "social network" aspect of GitHub.
In the past I used Mercurial, among other things, for my open source work. And various issue trackers of my own choosing. I am not particularly wedded to Git. But I keep getting sucked into GitHub these days.
To get publicity or outside contributions it's hard to avoid the GitHub trap. It's become a discovery service for open source (like Freshmeat.net back in the day), a common reference point for how to do code reviews ("merge requests") and issue tracking (even though it doesn't really do either all that awesomely), but most importantly it's become a place where people network generally.
I don't love that this is the case but it's hard to avoid.
> Unfortunately for git alternatives, the momentum behind git is in large part pushed by the "social network" aspect of GitHub
And there was a time everyone thought facebook wouldn't dethrone myspace, [something.js] wouldn't replace [somethingelse.js], and so on.
First mover doesn't mean a lot in software. The network effect you brought up does, but there'll be plenty of people who don't want to get caught up in that "trap" and git/MS-land to seed a decent alternative. (Why should your code discovery networking site be prescribing your choice in VCS, anyway?)
I agree with all of this, for sure, and I look forward to the situation changing. And I hope when it does, it does so in a way where the system has more than just Git as an SCM option.
I had hopes for bitbucket for a while, but it stagnated, and then Atlassian got their mitts on it.
Git is an absolutely abysmal industry standard and as far as I'm concerned is further proof of my theory that tech is lacking (and actively discourages) much-needed creatives from the field.
With them having more representation we would have replaced it years ago.
If Fossil is so against deleting commits, what do you do if you've accidentally committed sensitive information that cannot live in any form in the repo?
Fossil provides a mechanism called "shunning" for removing content from a repository.
Every Fossil repository maintains a list of the hash names of "shunned" artifacts. Fossil will refuse to push or pull any shunned artifact. Furthermore, all shunned artifacts (but not the shunning list itself) are removed from the repository whenever the repository is reconstructed using the "rebuild" command.
It is a problem in all decentralized systems. Once you publish something, there is no going back. Any one of your peers can decide to leave with your sensitive data. That's also what makes them so resistant to data loss.
Now if you know everyone who has a copy of your repository, you can have them run a bunch of sqlite commands / low level git commands to make sure that the commit is gone.
If you didn't publish anything, as someone else said, your best bet is to make an entirely new clone, transfer the work you did on the original, omitting the sensitive data, then nuke the original.
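For the "make sure the commit is gone" step, the usual low-level git incantation after rewriting history to drop the bad commit is something like this (a sketch; it is destructive, so use with care):

    # expire every reflog entry that still references the old history,
    # then prune the now-unreachable objects from the object database
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive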
The difference seems to be that commits are serious business on fossil, and they encourage you to test before you commit. While on git, commits are more trivial, pushing is where things become serious.
Or you can just rebase to edit the commits and remove the secret file. If you're really paranoid you can run `git gc` to ensure the object file is cleaned up also. If you're super paranoid, then you can do:
git hash-object secretpassword.txt
And check that hash isn't an object in the `.git/objects` directory.
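One way to run that check without poking around .git/objects by hand - `git cat-file -e` exits zero only if the object exists:

    # hash-object without -w only computes the hash, it writes nothing
    git cat-file -e $(git hash-object secretpassword.txt) &&
      echo "secret still present"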
> FURTHER WARNING: This command is a work-in-progress and may yet contain bugs.
Purging and shunning are two entirely different things in fossil. Shunning is for removing "bad" content and purging is very specifically for use with the "bundle" command (a rarely-used option for submitting "drive-by patches" initially conceived as fossil's counterpart to pull-requests).
That's a good point. Delete the repo and start over, I suppose? With git, wouldn't it be possible to find and restore that info anyway? I guess it becomes a question of what you care about most at that point.
I once tried it and ended up losing the history for some weird reason. Maybe it's a fixed bug by now, but if I don't want to use git, I will use Mercurial.
I lost all of my changes the first time I used git, which was the same time I found the error "cannot merge because you have unmerged files" (cut to me yelling "I know, that's why I want to merge!").
I have not yet forgiven git for that, but I'll reluctantly accept that me not knowing how to use the tool is not entirely the tool's fault.
Also: I stand 100% by your alternative solution because Mercurial rocks.
> but I'll reluctantly accept that me not knowing how to use the tool is not entirely the tool's fault
I don't buy this. A good tool should do its job and stay out of your way. The amount of pointless knowledge I now have just to be able to use a version control system for my job still to this day annoys me.
Linus Torvalds isn't some infallible god, and it may be useful for linux kernel development, but we're not all linux kernel developers; and tools like VCSes, when designed well, should be unnoticed until the exact moment you need them, convenient and simple to use, and not get in your way or create problems for you where there weren't any to begin with. (Holy run-on sentences, Batman!)
In contrast, git goes out of its way to throw itself in your face at every opportunity, exacerbate your problems, and create a maze you either have to navigate precisely or just decide "fuck it" and do a copy/replace file trick just to get back on track with what you were actually doing.
The fact that people keep prescribing "just learn all its intricacies" or other band-aids (like the other "use with" software suggestions here) rather than even acknowledging it as a problem points, to me, to the lack of UX expertise in the field, and to Stockholm syndrome.
(Which, funnily enough, is a problem things like git contributes to. It being one of the first things required to learn in CS, I constantly wonder how many of my peers might've switched out of the field given the mess it is, assuming CS wasn't for them. And, in turn, the breakthroughs we might've missed out on having earlier.)
Tools should be simple and usable, not throw up arbitrary barriers to entry.
It's also hard to go through the trouble of onboarding onto a VCS other than Git given a) how ubiquitous it is, even and especially in free hosting services, and b) Git alternatives don't offer any compelling feature other than contrarian bragging rights for not using Git.
Lots of people are saying that having large files in a repo is wrong, bad, bad design, incorrect usage.
Forget that you know git, github, git-lfs, even software engineering for a moment. All you know is that you're developing a general project on a computer, you are using files, and you want version history on everything. What's wrong with that?
The major issue with big files is resources: storage and network bandwidth. But for both of these it is the sum of all object sizes in a repo that matters, not any particular file, so it's weird to be harping on big files being bad design or evil.
I did just over a decade in chip design. Versioning large files in that domain is commonplace and quite sane. It can take wallclock days of processing to produce a layout file that's 100s of MBs. Keeping that asset in your SCC system alongside all the block assets it was built up out of is very desirable.
Perforce handled it all like a champ.
People who think large files don't belong in SCC are...wrong.
I occasionally used to start a sync, go get coffee, chat with colleagues, read and answer my morning email, browse the arxiv, and then wait a few more minutes before I could touch the repo. In retrospect, I should have setup a cron job for it all, but it wasn’t always that slow and I liked the coffee routine. We switched to git. Git is just fast. Even cloning huge repos is barely enough time for grabbing a coffee from down the hall.
I mean "massive resources" is just de rigeur across the chip industry now. The hard in hardware is really no longer about it being a physical product in the end.
> Lots of people are saying that having large files in a repo is wrong, bad, bad design, incorrect usage.
I don't think that is true. You do see people warn that having large files in Git repositories, or any repository that wasn't designed with support for large files in mind, is "wrong", in the sense that there are drawbacks for using a system that was not designed to handle them.
Here's a historical doc of Linus Torvalds commenting on Git's support for large files (or lack thereof)
> Forget that you know git, github, git-lfs, even software engineering for a moment. All you know is that you're developing a general project on a computer, you are using files, and you want version history on everything. What's wrong with that?
How is it not bad design? Let's say you are working in a team. Would you really want your colleagues spending a significant amount of time cloning your artifacts? Your comment also doesn't hold up under its own premise of forgetting that one is a developer. Even if it's my grandma, she's not gonna want to wait an hour to download a giant file from version control, assuming she even knows what a VCS is.
Large blobs can go into versioned object storage like GCS or S3 etc
In Subversion at least, you'd do a partial checkout. If you don't need a particular directory you just don't check it out. If you lay out your repo structure well there's no problem. It was incredibly convenient.
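For example - a sketch assuming the conventional trunk layout:

    # fetch only the subtree you need, not the whole repository
    svn checkout https://svn.example.com/repo/trunk/assets assets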
I've tried many different SCM over the years and I was happy when git took root, but its poor handling of large files was problematic from the beginning. Git being bad at large files turned into this best practice of not storing large files in git, which was shortened to "don't store large files in SCM." I think that's a huge source of our availability and/or supply chain headache.
I have projects from 20 years ago that I can build because all of the dependencies (minus the compiler -- I'm counting on it being backwards compatible) are stored right in the source code. Meanwhile, I can't do that with Ruby projects from several years ago because gems have been removed. I've seen deployments come to a halt because no startup runs its own package server mirror and those servers go offline or a package may get deleted mid-deploy. The infamous leftpad incident broke a good chunk of the web and that wouldn't have happened if that package was fetched once and then added to an appropriate SCM. Every time we fetch the same package repeatedly from a package server we're counting on it having not changed because no one does any sort of verification any longer.
SCC systems that handle big files don't suffer from the "you have to clone all the history and the entire repo all the time" problem that git suffers from. At least Perforce doesn't...
git has its place but it's really broken the world for how to think about SCC. There are other ways to approach it that aren't the ways git approaches it.
When you make a video game you want version control for your graphics assets, audio, compiled binaries of various libraries, etc. You might even want to check in compiler binaries and other things you need to get a reproducible build. Being able to chuck everything in source control is actually good. And being able to partially check out repositories is also good. There is no good technical reason why you shouldn't be able to put a TB of data under version control, and there are many reasons why having that option is great.
The versioned object storage solves nothing. If your colleagues need the files, they're going to have to get them, and it's going to be no quicker getting them from somewhere else. Putting them outside the VCS won't help. (For generated files, you may have options, and the tradeoffs of putting them in the VCS could be not worth it. But for hand-edited files, you're stuck.)
If the files are particularly large, they can be excluded from the clone, depending on discipline and/or department. There are various options here. Most projects I've worked on recently have per-discipline streams, but in the past a custom workspace mapping was common.
> Would you really want your colleagues spending a significant amount of time cloning your artifacts?
Not just the artifacts, but their entire history. That is a problem that Git has out of the box, but there is no reason it needs to work that way by default. LFS should be a first class citizen of a VCS, not an afterthought.
Git is designed with a strong emphasis on text source and patches. It simply isn't designed for projects with large assets like 3D animation, game dev, etc. Having said that, solutions like LFS, Annex and DVC (not git-specific) work really well (IMO). If you don't like that, there are solutions like Restic that can version large files reasonably well (though it's a backup program).
This is an example of a more generic problem. We adopt some principle or practice for rational reasons, and then as a mental shortcut conflate it with taste, aesthetics, cleanliness. But no software or data is 'dirty' or 'ugly', we feel it so because of mental associations, but intuition is unreliable -the original reasons may not apply, or may be less important.
> These additional capabilities are available for Git as 3rd-party add-ons, but with Fossil they are integrated into the design, to the point that it approximates "GitHub-in-a-box".
I’ve not used fossil, but I appreciate this idea.
Sure, it’s not unixy, but maybe a VCS reasonably demands such features, and today, it’s not as though these are crazy advanced or complex features.
To clarify the actual benefit: this means that tickets, etc. are also distributed, i.e. available and backed up locally with every contributor, and not dependent on lock-in to a single vendor like GitHub.
Edit: and yes, of course there are downsides as well. It's up to you to weigh them against each other.
As the git people love parroting of its myriad kitchen sink commands, "if you don't like it you don't have to use it".
I'd rather have a single integrated tool than the shopping list of "just use X and Y and Z" people are prescribing in this thread for getting the so-called git ""ecosystem"" ""working"".
Not that I'll necessarily use all the added features; but the ones I do want being integrated is absolutely relevant to my interests.
Versioning/distributing tickets is indeed useful; but can't this be "implemented" in git already, by defining a file-based format (think, issue/$YYYYMMDDHHMMSS-$title.md, but the variants are endless), and versioning those files?
A first drawback I can think of is that this would probably require an additional layer for non-tech people. I haven't had the opportunity to use Fossil, so I am clueless regarding the kind of UI they propose, but I wouldn't be surprised that they actually solve this, considering the extensive list of features.
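For the curious, the no-frills version really is cheap - a sketch of one hypothetical layout along the lines above:

    # one markdown file per ticket, versioned exactly like code
    mkdir -p issues
    printf '# Fix login crash\nStatus: open\n' \
      > "issues/$(date +%Y%m%d%H%M%S)-fix-login-crash.md"
    git add issues && git commit -m "ticket: fix login crash"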
Ah, I probably wasn't clear enough: my point was that the core of the feature (i.e. omitting non-tech users) can be implemented so cheaply that it seems a bit superfluous to require a built-in implementation.
However, Fossil's documentation[0] has some answers:
Some other distributed bug-tracking systems store tickets as files within the source tree and thereby leverage the syncing and merging capabilities of the versioning system to sync and merge tickets. This approach is rejected in fossil for three reasons:
1. Check-ins in fossil are immutable. So if tickets were part of the check-in, then there would be no way to add new tickets to a check-in as new bugs are discovered.
2. Any project of reasonable size and complexity will generate thousands and thousands of tickets, and we do not want all those ticket files cluttering the source tree.
3. We want tickets to be managed from the web interface and to have a permission system that is distinct from check-in permissions. In other words, we do not want to restrict the creation and editing of tickets to developers with check-in privileges and an installed copy of the fossil executable. Casual passers-by on the internet should be permitted to create tickets.
Point 3. indeed requires additional software;
Point 2. could be solved by storing the tickets in a distinct repository.
Regarding point 1., as far as I understand, a check-in is an atomic series of modifications to the repository database, so it seems there is a strong correlation between the tickets and the "commits"? If so, why not, but I'm not sure why this is necessary either.
Not sure how that's relevant, but I have ~150 repos locally and less than 10 on github. I also like the unix philosophy of having tools that do one thing and do it well.
"One thing" depends on how you squint, though. If you view "version and track text information" then it makes sense to store tickets, wiki, code, yadda under the same tool.
I'm primarily a command line user; I don't use Explorer to view, rename, move, copy files, etc. Like I said, unix-y. If you stretch "one thing" to "manage computer stuff", then the OS should bundle all tools anybody could ever want.
And yet you don't use different file systems for small files vs large files, for text vs movies. One assumes that a filesystem can handle all of that kind of data. But one layer above we don't want all those under a unified interface.
> And yet you don't use different file systems for small files vs large files, for text vs movies.
Um, yes, I do. Not that it matters anyway, filesystems are not optimized to handle source code to begin with, they deal well with sectors/blocks and do that well.
I like the “separation of powers” between git and GitHub functions (and gitlab).
It’s nice to be able to start git repos locally and only push to GitHub when I need to.
I also use gitlab quite a bit and it’s so clean to be able to pull from gitlab and push to GitHub, or vice versa. I don’t need any utilities, I don’t need any special workflows.
Git is sort of like a protocol with gitlab, GitHub, and others built on top.
If GitHub owned git then it would be so tightly coupled and suck.
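The setup behind that pull-from-one, push-to-the-other flow is just plain remotes - a sketch with hypothetical URLs:

    git remote add gitlab git@gitlab.com:me/project.git
    git remote add github git@github.com:me/project.git
    git pull gitlab main     # pull from one host...
    git push github main     # ...push to the other, no utilities needed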
I don’t think a VCS demands features of chat and the others. And the evidence provided by the millions of users seems to support that.
It’s not that chat and wiki are super complicated, it’s just more stuff. Implementing an alarm clock isn’t complicated either; should VCSes have alarm clocks as well?
Just a minor example that irritated me today: try to look at a graph of past commits. In GitLab, it is nicely presented. In GitHub, it is nearly useless.
To me it feels like rare, smart people trying to penetrate giant markets with all-in-one solutions. I'm not even mad, but I find it hard to comprehend who would make such a change. Probably they are testing their patience, expecting a considerable userbase in 30 years, to finally monetize it.
I’m of two minds. I like the unix philosophy, and I feel like an all in one tool (probably written in C) is just a bigger attack vector. But I also like that there are alternatives to the git monoculture. I think the focus on convenience for common dev team functions is rather nice.
> I don't want my version control system to be a wiki. Or a chat app, or any of that.
Then just don't use those features. There is no performance price for it.
One of the nice things about Fossil having a ticket database is that you can use it as a distributed, version-controlled SQLite database to store anything. You can even use it as a JSON database if you want. And the forum, well, it's lovely to have it there even if you rarely use it. Because sometimes if you're working with someone else it comes in handy - some conversations really should be kept with the code.
Didn't know about fossil. It's unfortunate many people are saying they stopped reading at the "fossil is everything in one" part, because it still looks like an interesting way of doing VCS.
> 95% of the code in SQLite comes from just four programmers, and 64% of it is from the lead developer alone. The SQLite developers know each other well and interact daily. Fossil was designed for this development model.
They propose it as a way to do development with few members who all know each other, rather than making an open repository for anyone on the Internet to contribute.
I want to try it now and see what it brings to the table and whether it's a viable alternative to git in this regard.
I haven’t used Fossil, but I have used gittrac and cvstrac in git mode. Cvstrac was what SQLite used before fossil: basically a wiki and issue tracker built on top of SQLite; just add CVS, SVN, or Git. A very small system compared to Gitea or other forges, but incredibly powerful thanks to SQL, and very productive. Fossil is just the logical continuation and shares the same clarity of design and elegant implementation as SQLite.
Can fossil be better than git for a solo personal project or does it add too much stuff on top that's useful once you have a few people working together and it's much simpler to just git commit?
I always find it weird when someone says this. Does Git do only one thing? Sure, for the appropriate definition of "one thing". But then so does Fossil if you define the "one thing" to be managing a software project.
Wouldn’t it be great if my car also brewed coffee?
I drink coffee every day on the way to work.
I’m always in a hurry. It would be convenient to just build that in.
I always drink coffee while driving, I may as well just build it in.
Cut to a month later because my car won’t start because I’m out of coffee grounds.
This is how fossil seems to me.
I probably have hundreds of repos with just versioned files. That’s it. I don’t want an issue tracker and chat and everything else in my git.
And when I do, I can trivially push my repo to GitHub or gitlab or sr.ht or whatever.
Comically, I’d probably use the fossil service if it supported git.
Anyone else remember when Google Code only supported mercurial? And they talked about how much better mercurial was than git. And they were right. And no one cared.
I hope fossil sticks around because I like more services.
But if I was interviewing for a company and they said that I must use fossil, I probably wouldn’t work there. (This is similar to if they said you must use TFS or something else weird)
I’ve used so many version controls over the years (vss, cvs, subversion, clearcase, perforce, bitkeeper, tfs, mercurial, manual file system hacks) and I need a really good reason to use something other than git. Making it more complex is not a reason I accept.
Fossil's source control doesn't break if the issue tracker or wiki is misconfigured or disabled. Just like your imaginary car-with-a-coffee-machine would not fail to drive if it were out of coffee grounds.
Saying "I don't want my version control to break because of misconfigured wiki" would have already been a straw man, adding this BS about cars making coffee just makes it more insane of an argument.
I disagree. I'm sure fossil's designers don’t want the whole thing to break if the wiki breaks, but have you tested it? I don’t know what dependencies it has or how well it’s designed. I'm not worried about it being turned off; I’m worried about a bug in the wiki portion killing the “important” version control.
But my argument is that I don’t want my important thing I care about doing unimportant things I don’t care about.
I love wikis and I love coffee. But I don’t want to have to install a wiki just to use source control.
I have used enough systems where stuff breaks because of a particular feature I didn’t know about and certainly don’t want (Atlassian) to be wary of such things.
And while the car / coffee maker allegory is funny, I wouldn’t be surprised if a car does break in the near future because some dumb feature can’t phone home and validate. Or the battery died from some process that kept running and was completely unnecessary.
This is mistaking the teams that design cars and most software, for the team that designs SQLite and Fossil.
Most software development shops are low quality dysfunctional politics-ridden heavily coupled messes using tools designed for a use-case that is only vaguely similar to the tasks at hand. For example: git, Jira, Confluence, GitHub...
In such an environment having one tool that does everything is a recipe for disaster: everyone knows all tools are shoddy and/or misapplied, and can only barely do their own job, they should not touch any other task.
SQLite and Fossil are in a completely different world of hard-core quality focus...
> And when I do, I can trivially push my repo to GitHub or gitlab or sr.ht or whatever.
And not the wiki, the issues, the releases, the forums, the website, and all that stuff that is probably needed once your project starts to involve a few people.
First, those are bolt-ons and I don’t actually want them tied to the source code management system.
Second, for my repos, I use markdown directly in the repo instead of a wiki. This works better for me because the version history is in the repo and for wikis the author is important context for the value of the information.
Third, I build my website using an ssg that builds off my repo. Typically this is Jekyll scripts that build out fine on GitHub pages or gitlab pages or whatever. And I can move to any host I want. I don’t want to couple my project’s web site to my source code host.
These are “solved” features as far as I’m concerned and these actually bring negative value for me. I don’t want to worry about what kind of forum functionality my scm provides. And I certainly don’t care about pushing my forum from one server to another.
> First, those are bolt-ons and I don’t actually want them tied to the source code management system.
Anecdote: when Richard first proposed the /chat feature in fossil I was highly skeptical about its utility but (as the fossil project's "JS guy") wrote it anyway. Now, almost 3 years later, we've been using chat 24/7 across multiple fossil-hosted projects and can't imagine doing without it. The majority of the sqlite project's coordination happens via fossil's /chat.
Yes, it was bolted on, but it's also become indispensable for us as a feature.
The mistake is assuming that fossil is simply a source code management system, it is not. Fossil is closer to a collaboration system around source code.
If you can live your life without it, good. But say I want a GUI with links, instead of markdown files in a repo (from which I can't click through to the next article), an SSG, a dependency on GitHub/GitLab/whatever host I store my code in, and other tools; then fossil does it all for me right from a single binary that will be the same for everyone.
Right. It’s a careful balance of what to include and what not to include.
I don’t know the exact dividing line for “do one thing” but I think that modularity and composability is important.
I like having an AC in my car. But I don’t want AC to be a dependency for my car.
I like chaining tools together, but work to minimize the required dependency among tools.
I think in principle, it’s good if you can install components separately. So for me git only doing version control is good. If they added a chat feature, I would not want that. If they had a chat module that I could optionally install, maybe.
> Right. It’s a careful balance of what to include and what not to include.
(A long-time fossil dev here...)
The single most important criterion for new features in fossil is, quite simply, "is it useful to fossil's own developers?" Countless times, Richard (the project lead) has found a personal SCM itch in the sqlite project and, a few hours later, committed a feature to fossil to handle it.
Fossil is, and always has been, first and foremost, a tool to manage the sqlite project's source code, and sqlite's development is still a primary driver for new features in fossil. None of the features are sqlite-project-specific, but many of them derive from the needs of that project.
Fossil doesn’t have a dependency on chat, forum, wiki or issues.
They’re included, part of the same product. But if you don’t want them, then you don’t have to use them. Just like you don’t have to turn on the radio or AC in your car. Just like all the email-related stuff in Git.
And, just because they’re included doesn’t mean they detract from the value of the other stuff that is there. A car without an AC isn’t better to drive than a car with an AC.
And, just like a car comes with a radio and AC because you likely want those conveniences, fossil comes with issue tracking and wiki because you’re likely to want that for your project. You don’t have to use it, but it’s there if you do.
> And, just because they’re included doesn’t mean they detract from the value of the other stuff that is there.
I think this is a philosophical preference.
I think they do detract because they make the system more complex. And fossil developers work on that instead of the thing I want. So there’s a cost there.
Having an AC and radio in a car increases its price. Not by much, but by something. If designers didn’t work on the AC, there would be something else they put in. Or the car would be simpler.
I definitely want an AC in every car I buy. Maybe a radio. So I don’t mind that they’re bundled.
I’m sure the people who want wikis and chat in every repo like it too. Good for them, use it. I wish them the best. I don’t like it and think it detracts from their product.
I consider them more like project management or collab software with a custom VCS built in. And so I’d rather prefer some collab stack that uses the best VCS stack I think is out there: git.
So they’re on a tough spot because they’d be better off just selling all their GitHub-competitor features without trying to convince people to switch off of git.
Just like a car coffee maker company would be better off just making coffee makers they fit into popular cars rather than trying to make a literal car.
> And fossil developers work on that instead of the thing I want.
While also inviting you to join the development team and scratch your own personal itches.
The overwhelming majority of fossil features were implemented by someone scratching a personal itch, not someone scratching random internet-goers' itches.
> If your car followed the unix philosophy you’d have to bring your own radio
I've seen plenty of cars in which you have to bring your own radio. AFAIK, there's even a standard connector in the back of the hole where the radio fits (and the size and shape of the hole seems to always be the same, so it's probably standardized too). I've even seen things like DVD players which fit into that same hole (using an articulated screen which retracts into the device's body).
Part of Fossil's reasoning is that they *DO* understand git's logic but disagree with it. See most of the points labelled with links to 2.5, plus the comment about showing what you actually did, i.e. no rewriting and lying about history; the errors you made are often just as important as the correct way of doing things.
I don’t think that’s true. I think the hive mind is that it’s good, or at most “haha, sucks but I use it all the time.”
I work sort of near “data science” and there’s lots of no code/low code people wanting to do data science (or at least bill for it). And I’d say the hive mind among non-coders is that git is bad. But I think that’s more that all coding is bad, and git is like step 1 to coding, so it’s the first hard step they hit.
The whole point that git rejects large blobs is primarily because they don't belong in VCS. But for those who need large blobs there is git-lfs as the author mentioned. I don't see a problem with that approach because I personally don't like my git repos growing large after just a few commits, which then takes up time for huge clones by other devs. This matters even more with monorepos: if going the monorepo route, it's in a team's or project's best interest to keep the repo size small so a new clone by newly onboarded devs or during a CI pipeline doesn't take forever.
Fossil is an all-in-one VCS with wiki, issues, etc., which I don't appreciate because for one it's not feature rich, and for another it bloats the backups and restores. So I prefer git's Unix philosophy of doing one thing but doing it really well.
There are some philosophical and usability differences between fossil and git too, but in the grand scheme of things it doesn't matter when one has been using git for a long time. Fossil doesn't have an ecosystem either, and making it work with CI/CD is a pain, because CD tools like Argo CD or Flux and CI tools like GitLab/GitHub/Circle/Travis CI don't work with fossil out of the box.
> The whole point that git rejects large blobs is primarily because they don't belong in VCS.
Who are you to say that my blobs don't belong in version control? Where does a versioned asset file for a website or a game go, if not in version control? If the answer is "somewhere else referenced by the git commit", then you're accepting that the data belongs in version control but that git can't handle it.
> But for those who need large blobs there is git-lfs as the author mentioned.
git-lfs isn't git, though. It's a bodge on top of git that breaks many of the assumptions about git, requires special handling and setup. If it _were_ a core part of git I would agree, but it's not.
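That special handling starts before the first commit - a sketch of the minimum setup, with hypothetical file names, assuming the git-lfs extension is already installed:

    git lfs install          # one-time hook setup, per machine
    git lfs track "*.psd"    # writes a filter rule into .gitattributes
    git add .gitattributes textures.psd
    git commit -m "track PSDs via LFS"

Miss any of these steps (or clone without git-lfs installed) and you get pointer files instead of your data.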
> So I prefer git's Unix philosophy of doing one thing but doing it really well
Git is tightly coupled to a _bunch_ of unix tools, and doesn't work without them. Try running git on Windows and see that it installs an entire suite of posix tools (msys) just to let you run `git clone`.
People only think large blobs don't belong in VCS because they don't work well with Git.
As soon as a VCS comes along that actually handles that properly people will say "of course, it was obvious that it should have been like this all along!".
Git LFS is a proof of concept, not a real solution.
Unfortunately none of the new Git alternatives I've seen (Jujutsu, Pijul, etc.) are tackling the real pain points of Git:
* Submodule support is incomplete, buggy and unintuitive
* No way to store large files that actually integrates properly with Git.
* Poor support for very large monorepos where you only want to clone part of it (partial clone and sparse checkout, sketched below, only go part of the way).
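On that last point, the closest git currently gets is partial clone plus sparse checkout - a sketch with a hypothetical monorepo URL and subtree:

    # fetch commits and trees up front, blobs lazily on demand
    git clone --filter=blob:none --no-checkout https://example.com/monorepo.git
    cd monorepo
    git sparse-checkout set services/web   # materialize only this subtree
    git checkout main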
In a way, Git is bad at everything that centralised VCS systems are good at, which isn't surprising given that it's decentralised. The problem is that most people actually use it as a centralised VCS and want those features.