Johnny Decimal: A System to Organize Projects (2015) (johnnydecimal.com)
124 points by trauco on Sept 14, 2023 | 118 comments


I am convinced that absolutely no hierarchical taxonomy actually works in practice because every categorisation is really slightly fuzzy. Even libraries have problems categorising books into genres because some of them overlap considerably. I spent the last 25 years trying to build up useless hierarchies everywhere and none of them ever worked properly.

Tagging/labelling with attributes is the only viable solution that I've found.

This book is fiction/science-fiction/fantasy.

This document is technical-reference/specification/ratified

I believe this was the notion of the now defunct WinFS built on top of a database engine.
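
To make the tagging idea concrete, here is a minimal sketch of a flat tag index, assuming an in-memory store. `TagIndex` and the file names are made up for illustration and do not come from WinFS or any tool in this thread.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag store: an item may carry any number of tags."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Attach every given tag to the item."""
        for t in tags:
            self._items_by_tag[t].add(item)

    def query(self, *tags):
        """Return the items that carry all of the given tags."""
        if not tags:
            return set()
        # .get() avoids creating empty entries for unknown tags.
        sets = [self._items_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("novel.epub", "fiction", "science-fiction", "fantasy")
index.tag("spec.pdf", "technical-reference", "specification", "ratified")
print(index.query("specification", "ratified"))
```

An item needs no single "home": the same book turns up under any combination of its tags.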


If anything, the history of libraries shows you can get really really far by applying a systematic organizing principle, even if it's WRONG(TM). Librarians may fret that some books are hard to categorize, but library patrons successfully find stacks of books on their own every day all around the world. It's not perfect, but it's better than nothing!


Book classifications ultimately need to assign a book to a location.

So long as that goal is achieved, the rationality or consistency of that assignment is a strongly-secondary consideration. If the book has a place, and can be located, where it happens to be relative to other related works, or unrelated ones, really doesn't matter.

Searching for works based on similarities or relationships is a problem of the indexing system. I should be able to find works on a similar topic, which reference or are referenced by another, by author, by institution, perhaps by publisher, by publication date, etc. That information would be carried within a catalogue, but need not determine the location of the work within a physical archive.


And what led me to build [TMSU](https://tmsu.org/).


TMSU is the only such system I found useful without being cumbersome. After years of trying to use Git Annex, it was refreshing that TMSU doesn't alter the files in any way, merely storing all the (meta)data out-of-band in a separate DB.

These days I use TMSU via my own Emacs-based UI almost every single day, so thank you for that!


Storing metadata out-of-band strikes me as key to any usable content management system within a realistically complex space.

Naming schemes and directory hierarchies have some limited application, but ultimately there will be data which simply won't be shoehorned into any such system, and an externally-managed catalogue tying together disparate elements is required.

(Keeping that catalogue up to date and consistent is a whole 'nother issue.)

I do like the idea of a virtual filesystem in which elements are effectively search dimensions, which leads to an interesting notion that search is identity.

That is, a search will produce one of three possible result sets:

- Null, that is, no matches.

- Plural, that is, a list of matches.

- Unity, that is, one matching item.

In the last case, the search providing a single result is an identity of that result. (It may not be a stable identity over time, but it is at least for the present.)
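
A rough sketch of the three outcomes, assuming a catalogue that is just a list of dicts. `search`, `classify`, and the sample records are invented for illustration.

```python
from enum import Enum


class ResultKind(Enum):
    NULL = "null"      # no matches
    UNITY = "unity"    # exactly one match: the query identifies the item
    PLURAL = "plural"  # several matches: the query needs narrowing


def search(catalogue, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        rec for rec in catalogue
        if all(rec.get(key) == value for key, value in criteria.items())
    ]


def classify(matches):
    """Map a result list onto null / unity / plural."""
    if not matches:
        return ResultKind.NULL
    if len(matches) == 1:
        return ResultKind.UNITY
    return ResultKind.PLURAL


catalogue = [
    {"author": "barrie", "title": "peter pan"},
    {"author": "barrie", "title": "the little minister"},
]
print(classify(search(catalogue, author="barrie")))
print(classify(search(catalogue, author="barrie", title="peter pan")))
```

Adding a criterion can only shrink the result set, so one way to reach "unity" is to keep adding dimensions until exactly one item is left.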

Where a list is returned, the size of the list determines how usable it is, and how it is usable. Ten items can be quickly scanned to find the relevant item(s), if they exist. 100 or 1,000 items can often still be managed manually, though they'll typically take some time. Somewhere between 100 and a few thousand items, though, you're in the range where automated assessments or filtering becomes necessary.

Large libraries themselves typically have tens of thousands to millions of items. The largest book collections (Library of Congress, British Library) have roughly 150 million books (or equivalents). Other records may exist in greater numbers: periodicals, financial records, databases. Facebook has reported ~5 billion items posted daily for some years now. (I suspect most of those are trivial, but that still leaves a large number of potentially non-trivial items.) Surveillance and other large-scale data collection systems may be larger still.


> Storing metadata out-of-band strikes me as key to any usable content management system within a realistically complex space.

Yes and no. I was specifically comparing it to Git Annex, which is hard to categorize in these terms. It forces every file to become a symbolic link to the actual file living in `.git/annex/`, and every query then temporarily mutates the hierarchy of directories storing these symbolic links. I found the latter disruptive enough (in particular for the directory mtimes) that I was actively avoiding doing any such queries. See: https://git-annex.branchable.com/tips/metadata_driven_views/

On the other hand my current setup involves TMSU queries which result in virtual Emacs directory editor (dired) views that don't affect anything else. I don't even use the FUSE functionality of TMSU.


The situation is one of compromise.

The core problem is that there are formats and storage modes which don't readily allow for the modification of the item itself. Editing PDFs is already a pain; applying metadata to some entry in a database or wiki, or a third-party website, isn't possible at all.

The remaining options seem to me analogues of practices with books.

It's possible to "rebind" an item and include bibliographic information in the equivalent of the fly-leaves of that work, much as a library may rebind a book and apply a label with inventory number(s), call number(s), and/or bibliographic data to that book. Since physical objects are inherently modifiable and enclosable, this makes sense. The digital analogues vary, but are at least theoretically available, e.g., enclosing a work in an archive format which includes metadata. See the WARC (Web ARChive) file format for example: <https://en.wikipedia.org/wiki/WARC_(file_format)>. EPUB files and software packaging formats such as RPM and DEB (the package format APT installs) are other examples of standardised file structures which encapsulate others.

The other practice of a library is to abstract out the metadata to a catalogue, effectively a metadata index.

In a physical library archive, this involves a cataloguing process, as part of item acquisition workflows. The metadata for many traditionally-published works is already centralised such that it can be obtained from specific organisations such as the US Library of Congress, the British Library, the OCLC (originally the Ohio College Library Center, which has both its own item identifier and manages the Dewey Decimal Classification), the International ISBN Agency or one of its national affiliates (e.g., Bowker, in the US), the International DOI Foundation (for DOI assignments: digital object identifiers, used extensively in academic journals). Circulation of items is managed through a circulation desk, for both external lending and managing, tracking, and reshelving books used within the library itself, but not being borrowed externally.

For digital media, the equivalents would be either some sort of management system, which would require an application-specific interface, or a filesystem which incorporates not only metadata but workflow management. I'm leaning toward the latter concept as more universal, though that also raises the question of how to deal with workflows in which contents leave or enter that filesystem context itself.

But with a filesystem, you have a number of additional possibilities available:

- The pairing of works with metadata is automated.

- Workflows can be integrated into filesystem actions. The act of creating a file would also create the file metadata (to a greater extent than present inode entries do).

- Additionally, introducing the notion of process status means that the filesystem itself could distinguish between works which have no or only default cataloguing data applied, those which have had additional metadata added (say, from an external look-up or automated heuristics based on file contents).

- Renames and deletions are now managed through the filesystem itself rather than third-party tools.

- I'd like to see both versioning (changes to a given file) and relationships (source, derived, referenced, and referencing works) tracked as well.

- Different forms of a work could be tracked together. The markup-language source and generated outputs (PDF, PS, ePub, HTML, plain text, etc.) versions of a text. Translations. Audio formats. Different performances of a work. Optical scans of printed material. Photographs of visual or plastic arts (sculpture), or architecture. See FRBR (Functional Requirements for Bibliographic Records), and the Work, Expression, Manifestation, Item distinctions: <https://en.wikipedia.org/wiki/Functional_Requirements_for_Bi...>.

- Ideally, some sort of highly-invariant fingerprinting such that different versions of a work can be identified and matched despite different formats or slight modifications (intentional differences between editions, translations, errors or damage introduced over time). Traditional whole work hashes fail to offer this, though segmented and normalised hashes or vectors might be able to do so.
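
As one possible reading of "segmented and normalised hashes", here is a sketch that normalises text, hashes overlapping word shingles, and compares the resulting sets by Jaccard similarity. That is a swapped-in, well-known technique, not something proposed in this thread, and the function names and the shingle size `k=3` are arbitrary assumptions.

```python
import hashlib
import re


def normalise(text):
    """Lower-case and drop punctuation so formatting differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def shingle_hashes(text, k=3):
    """Hash every run of k consecutive words (a 'segment') of the text."""
    words = normalise(text)
    if not words:
        return set()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def similarity(a, b):
    """Jaccard similarity of the two shingle-hash sets, between 0.0 and 1.0."""
    ha, hb = shingle_hashes(a), shingle_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)


print(similarity("All children, except one, grow up.",
                 "all children except one grow up"))
```

Unlike a whole-file hash, a small edit only changes the few shingles that overlap it, so near-identical editions still score high. Translations or audio versions would need a very different kind of fingerprint.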

Again, the largest problem with a filesystem approach is in imports and exports from that filesystem-based archive. That would probably best be achieved through wrapper formats and applications or servers for other organisations or the general public.


Thank you for sharing!

Exactly what I needed for my own note-storing system, where I can tag and describe various PDFs like books, lecture notes, and papers, then use `rga` to search inside the PDFs for content!


Thank you!


That's what I took away from reading Everything is Miscellaneous a long while back https://en.wikipedia.org/wiki/Everything_Is_Miscellaneous

Definitely thinking about categorization differently (folders which are similar to this, tags, organic order by linking; bookmarks, notes, all are in the same boat as well).


I saw David Weinberger's talk at Google [1] some 15 years ago, where he presented the ideas from his book (and bought the book afterwards). This is one of those videos I rewatch regularly, because the ideas hold up so well, even today.

His critique of the Dewey Decimal system was the first thing that came to mind when I read Johnny Decimal. And indeed, it is flawed the same way.

[1] https://www.youtube.com/watch?v=x3wOhXsjPYM


> Tagging/labelling with attributes is the only viable solution that I've found.

What's your solution for a new tag being introduced late, which could/should be retroactively applied to already tagged media?

Re-tag the entire inventory by hand? Re-tag automatically with some ML/AI tool after you've tagged enough media to train a discriminator? Or just live with old inventory not having that tag?


It can mean a lot of effort, yes, but in a lot of cases you can drastically reduce the number of items to re-tag by defining some simple relations between tags. For example tag A may require tag B to exist but not C, etc.
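
A small sketch of how such relations could shrink the re-tagging job, assuming items are held as a dict of name to tag set. The function and the sample tags are hypothetical.

```python
def candidates_for_new_tag(items, requires=(), excludes=()):
    """Return the item names worth reviewing for a newly introduced tag.

    items    -- dict mapping item name to its set of existing tags
    requires -- tags an item must already have to be a candidate
    excludes -- tags that rule an item out
    """
    required, excluded = set(requires), set(excludes)
    return sorted(
        name for name, tags in items.items()
        if required <= tags and not (excluded & tags)
    )


items = {
    "a.pdf": {"paper", "physics"},
    "b.pdf": {"paper", "draft"},
    "c.epub": {"novel"},
}
# A new tag that only makes sense for finished papers:
print(candidates_for_new_tag(items, requires=["paper"], excludes=["draft"]))
```

Only the items that pass the filter need a human look; everything else is skipped without being opened.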

In the end maintaining order always requires effort, if you have a hierarchical folder structure and you decide to add another category you have the same problem. And realistically folders are "good enough" for most situations, you can choose for each file how fine-grained the categorization should be and the mental overhead for finding a file is often not that bad.


In the case of TMSU, you have a shell / command-line tool, so that anything which generates a listing of files can be fed in to assign additional tags.

That might be based on find, grep, other pattern-matching tools, tmsu itself, or manually curating a file or files of content you want to modify.

One of the reasons I strongly prefer CLI tools in general is that they make precisely this sort of bulk action so viable. I've modified anywhere from tens to hundreds of thousands of items with a small set of commands, and in most cases many millions or more should be reasonably doable.
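
A sketch of that kind of bulk action, driven from Python. It assumes the `tmsu tag --tags="..." FILE...` form as I remember it from TMSU's documentation (check the tool's own help output before relying on it); the helper names and the batch size are invented, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def find_files(root, pattern):
    """Anything that yields a list of paths will do; here, a recursive glob."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


def build_tag_commands(paths, tags, batch_size=500):
    """Build one command per batch so argument lists stay a manageable size."""
    if any(" " in tag for tag in tags):
        # Assumption: tags with spaces need escaping, which this sketch skips.
        raise ValueError("tags containing spaces are not handled here")
    tag_arg = "--tags=" + " ".join(tags)
    return [
        ["tmsu", "tag", tag_arg, *paths[i:i + batch_size]]
        for i in range(0, len(paths), batch_size)
    ]


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


apply_commands(build_tag_commands(find_files(".", "*.pdf"), ["document", "pdf"]))
```

The same shape works for retroactively applying a brand-new tag: generate the file list however you like, then feed it through in batches.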


Depends very much on the problem domain. It's really easy to do this on macOS.

So for example I have a top level `Documents/docs` directory. In this everything is recorded in format `yyyy-mm-dd Title.ext` format. I have a saved search which says "give me everything untagged". I will select each thing that is untagged and then tag it!


I think they're asking something different, what if you think of a brand new tag after you have already tagged things for years? It is potentially a lot of effort to retroactively apply it.


I ran into a similar situation a while back.

I have my version of a zettelkasten system, where I put a tag/ list of tags at the top of the note file (timestamp being the note name). Combining this with something like ripgrep allows me to search the note file contents by those tags, and keywords.

A tag I used didn't seem like the right one for the content as it matured and grew over time.

This meant creating a tiny script which takes the name of the tag and the name of the note in which it is used, looks at backlinks and forward links, and renames the tag based on usage.
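
Not that script, but a simplified sketch of the renaming part. It assumes tags are written as `#tag` on the first line of each note and that notes are `.md` files in one directory; both are guesses rather than details from the comment, and the backlink analysis is left out.

```python
import re
from pathlib import Path


def rename_tag(notes_dir, old, new, suffix=".md"):
    """Rename #old to #new on the first line of every note; return changed names."""
    # Match the whole tag only, so renaming #draft leaves #drafting alone.
    pattern = re.compile(r"(?<![\w-])#" + re.escape(old) + r"(?![\w-])")
    changed = []
    for note in sorted(Path(notes_dir).glob("*" + suffix)):
        lines = note.read_text(encoding="utf-8").splitlines(keepends=True)
        if not lines:
            continue
        first = pattern.sub(lambda match: "#" + new, lines[0])
        if first != lines[0]:
            lines[0] = first
            note.write_text("".join(lines), encoding="utf-8")
            changed.append(note.name)
    return changed
```

Calling `rename_tag("notes", "draft", "wip")` rewrites the notes in place, so it is worth running against a copy or a version-controlled directory first.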


Yeah. All systems like this have to account for ad-hoc additions. Otherwise you will often jam a few squares into some circles just to accommodate your categorization system.


AI/LLMs can probably make it easy.


Yes, usually, you must recheck your content regularly and check whether old content qualifies for new tags, or whether old tags have shifted in meaning, split into new subtags, etc. But this does not necessarily mean you should check everything, and "regularly" need not mean every week or month. Just review small parts of your library for changes, occasionally, as you have time; the improvements will come over time by themselves.

A never-changing library is a dead library.


With hierarchical tags this is less of an issue. Presumably the old tags are still valid for the extant documents? Can the new tag be a sub-tag of one of the extant tags?


More likely to discover a supertag than a subtag in my experience. I think this is why so many people get value out of zettelkasten: focuses on connections rather than categories.


Ideally both the supertag and subtag would return the document tagged with the subtag.
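
One cheap way to get that behaviour, sketched under the assumption that hierarchical tags are stored as slash-separated paths such as `fiction/science-fiction`. The representation and names are illustrative, not taken from any tool discussed here.

```python
def tag_matches(tag, query):
    """A tag matches a query if it is the query itself or one of its sub-tags."""
    return tag == query or tag.startswith(query + "/")


def find(docs, query):
    """Return the documents carrying the queried tag or any sub-tag of it."""
    return sorted(
        name for name, tags in docs.items()
        if any(tag_matches(tag, query) for tag in tags)
    )


docs = {
    "novel.epub": {"fiction/science-fiction"},
    "memoir.epub": {"non-fiction/biography"},
}
print(find(docs, "fiction"))                  # supertag query
print(find(docs, "fiction/science-fiction"))  # subtag query
```

Discovering a supertag late is then a rename of the tag path rather than a re-tagging pass over every document.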


> I am convinced that absolutely no hierarchical taxonomy actually works in practice because every categorisation is really slightly fuzzy.

Fuzzy is ok, not everything needs to be perfect.

> Tagging/labelling with attributes is the only viable solution that I've found.

Categories and tags have different use cases. Categories are the best solution, when there is only one possible location for an object, like with physical objects, which cannot appear at multiple locations at the same time. Tags make sense, if you can offer multiple entry points to reach the goal, and have the option to manage them all on a reliable level.

And nothing speaks against using them both, with different valuesets. Like with digital files, you can have categories for managing the filetypes (books, videos, etc.), but then use tags to manage the content in each category.

> This book is fiction/science-fiction/fantasy.

Isn't this kinda redundant? Science-fiction is already fiction, it's in the name. Reads more like a hierarchy. Which of course you can also do with tags: any fantasy-tag automatically inherits the fiction-tag too.


> Isn't this kinda redundant? science-fiction is already fiction, it's in the name.

This is nit-picking the specific example rather than engaging with the GP's point. There are definitely genres that span the fiction/non-fiction divide (like romance, horror, politics, judaica, and many more). Interestingly, hierarchical systems used by libraries tend to ignore genre. The Dewey Decimal system doesn't classify literature by genre but instead by language, though some overarching genres (drama vs poetry) are treated as subcategories of language (e.g., German poetry is 831, French poetry is 841), but only for major European languages.


> This is nit-picking the specific example rather than engaging with the GP's point.

It's showing that tags can be fuzzy too. So the initial complaint is not avoided, just moved.


> Tagging/labelling with attributes is the only viable solution that I've found.

The only solution I've found is being utterly ruthless about deleting things. YAGNI, YAGNI.

Alongside this, topic-based folders are fine. Not many things are in both "econ" and "sf&f", and when something is relevant to both, such as Krugman's paper on interstellar trade, it's easy to decide where it goes (econ) and where an empty file called "Krugman, Interstellar Trade: see econ" goes.

Empty file with a textual reminder, not a softlink or hardlink. Those are brittle. So are attributes.

Use broad categories, though. "sf&f", not "SF&F - Fantasy - Urban Noir - West Coast - Mermaids". The definition of "broad" depends on your areas of expertise.


I'm just wondering when we're gonna get tagged filesystems. Maybe something partially hierarchical. Would love a book/ directory that i can query for 'sci-fi' or 'specification'. I guess one could implement it using sym-links :^)


Don't even go there with the symlinks. My mother made a family tree on windows with folders and shortcuts. It broke OneDrive and robocopy completely. I don't even know how to help her fix it.


On one hand this is an edge case, on the other things like this make me wonder how computers can even work if first-party software built specifically for nothing but managing files fails to gracefully handle a folder structure a user created by hand.


That was exactly my take home from this. Windows has some very serious flaws in it when it comes to data integrity. I do not trust it and do not use it myself.


It's 3 in the morning and I couldn't stop laughing out loud when I read this. Poor OneDrive lol.

I hope she managed to keep all her files and didn't lose anything.


It still works and she hasn't lost anything but you can't actually move or delete any files any more because windows explorer craps out too!

I literally have no idea what to do with it :(


My assumption is she ran into the max path length limitation, here's how to fix it https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...


That was one of the problems. Unfortunately after fixing that it's still broken.


You could write a short PowerShell script that recursively deletes shortcuts and symlinks, but I'd certainly take a full backup and have it offline before I kicked it off.
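
For illustration, here is the same idea sketched in Python rather than PowerShell. It assumes Windows shortcuts are files ending in `.lnk`, it does not detect NTFS junctions, and it only prints what it would remove unless `dry_run=False` is passed. The function names are made up.

```python
import os


def find_links(root):
    """List symlinks and .lnk shortcut files under root, without following links."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in list(dirnames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                found.append(full)
                dirnames.remove(name)  # never descend into a linked directory
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or name.lower().endswith(".lnk"):
                found.append(full)
    return sorted(found)


def delete_links(paths, dry_run=True):
    """Remove the links themselves; their targets are left untouched."""
    for path in paths:
        if dry_run:
            print("would remove", path)
        else:
            os.remove(path)


delete_links(find_links("."))
```

Reviewing the dry-run listing before deleting anything is the cheap part; the offline backup is still the real safety net.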



The author of TMSU left a sibling comment to yours: https://news.ycombinator.com/item?id=37507343

https://tmsu.org/

> TMSU is a tool for tagging your files. It provides a simple command-line tool for applying tags and a virtual filesystem so that you can get a tag-based view of your files from within any other program.

> TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up. The only commitment required is your time and there's absolutely no lock-in.


Got me wondering how it tracks file moves. From the FAQ:

> TMSU [does] not automatically detect file moves and renames

I dunno if we can call it a tagged filesystem with this limitation. (Why not store tags, or at least a GUID, in file metadata?)


This seems a serious limitation. Bash is my file manager - ls and cd. I would not give that up for any tool no matter how convenient.

But presumably there is some API or ABI to monitor, or some signal emitted, when files are moved, added, updated, or deleted. However inotifywait works, could that be incorporated (even theoretically) into TMSU?


https://github.com/oniony/TMSU/wiki/FAQ#why-does-tmsu-not-au...

There are a couple very barebones wrappers around mv and rm, though they could be better (pass through arguments, etc.).

https://github.com/oniony/TMSU/wiki/Tricks-and-Tips#filesyst...


It does store file hashes. It has a repair command that will use file hashes to find moved files. I don't know how well it works.
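
I haven't read how TMSU's repair does it, so the following is only a generic sketch of hash-based move detection, assuming a database that is simply a dict of recorded path to content hash. `find_moved` and the use of SHA-256 are my choices, not TMSU's.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_moved(db, root):
    """Map each recorded path that no longer exists to a file with the same hash."""
    missing = {h: path for path, h in db.items() if not os.path.exists(path)}
    moved = {}
    if not missing:
        return moved
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if full in db:
                continue  # already tracked under its current name
            h = file_hash(full)
            if h in missing and missing[h] not in moved:
                moved[missing[h]] = full
    return moved
```

The obvious limits: every untracked file under `root` gets hashed, a file that was moved and also edited is not found, and identical copies are ambiguous.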


There are some attempts using Fuse on Linux.

At least one I tried ages ago was one you could use to browse your music collection, it automatically generated "directories" based on mp3tags.


Apple tried doing this with "tag" in the file system but most people don't use it.


Microsoft allegedly wanted to get rid of the folder hierarchy completely. There were a lot of rumors surrounding "Windows 97" (what later became Windows 98) because of its Internet integration (which eventually just turned into a lot of dead-end MSN widgets and Internet Explorer being shipped with the OS). I don't know how well-founded this speculation was but allegedly Microsoft wanted to fully commit to things like "cloud storage" (before this was a thing, so of course not using this name) and ended up shelving most of this work because the networks just weren't fast enough at the time.

Note: I specifically remember these speculations in the lead-up to the release of Windows 98. They don't seem to be easy to google at this point.


I use it! :)


Don’t Do That. Don’t Give Me Hope.


Same.


The approach I've been favouring for a while has been to have a data store, I call it the "stacks", based off physical library nomenclature, and then a virtual filesystem which effectively uses searches along various attributes to surface individual items.

Those searches might be based on title, author/creator/contributor, publication/creation dates, assigned identifiers, full-text search, relations between documents, subjects or topics, document type / format, or other aspects.

How that is specified precisely ... I haven't settled on entirely, though short mnemonics, two- or three-letter where possible (ti -> title, au -> author, date, isbn, oclc, doi, etc.) might work.

If the filesystem lives under /docfs, then, say, /docfs/au:barrie/ti:peter+pan would turn up J.M. Barrie's Peter Pan. Specific document identifiers such as ISBNs, OCLC numbers, DOIs, or LCCNs could pinpoint specific documents; looser searches such as for Peter Pan might generate a list rather than a unitary response, but those individual documents would be further distinguished by other properties.
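
Since nothing here is built yet, the following is only a sketch of how such a path might be parsed into a query. The alias table, the `+`-for-space convention, and the function name are assumptions layered on the example path above.

```python
from urllib.parse import unquote_plus

# Hypothetical short mnemonics; unlisted keys (isbn, doi, ...) pass through as-is.
ALIASES = {"ti": "title", "au": "author"}


def parse_query_path(path, mount="/docfs"):
    """Turn '/docfs/au:barrie/ti:peter+pan' into a dict of search criteria."""
    if path != mount and not path.startswith(mount + "/"):
        raise ValueError(f"{path!r} is not under {mount!r}")
    query = {}
    for part in path[len(mount):].split("/"):
        if not part:
            continue
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {part!r}")
        query[ALIASES.get(key, key)] = unquote_plus(value)
    return query


print(parse_query_path("/docfs/au:barrie/ti:peter+pan"))
```

Each path segment narrows the search, so listing a directory amounts to running the query built from the path so far.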

The filesystem would be virtual rather than static-on-disk. You're effectively exploring the namespace. Creating this as a filesystem rather than, say, as a database query or application-specific data store means that standard filesystem tools would be available to interact with the contents, though modifications of works might require some additional work.

The system could also provide access to works not directly in the stacks, including remote resources (e.g., accessed via URIs and network calls), or applications (through some API). The underlying data store could be in any of several formats, and again the filesystem-based access could abstract between several independent stores.

Now all I need to do is build it ...


This sounds at least somewhat like TMSU, which has been mentioned in some other comments here.

https://tmsu.org/


I've seen that, yes.

There are some similarities, and I'm thinking that TMSU might serve as a component of this system.


I just wish xattrs were more resilient and explicit.


This is precisely the reason why smaller cross-linked notes and tags work quite well for me, and JD or almost anything else in the past hasn't.

The ideal for me involves hypertext + (local, private) AI-assisted search.

None of this will replace reviewing/pruning your notes regularly (think: Andy Matuschak's fleeting and evergreen notes), but it minimises the friction.


tagging is what happens when you allow an artifact to exist under more than one hierarchy at the same time.

The progression usually goes:

"We want to label our things."

"There is a natural hierarchy that we want to be reflected in our labels."

"Some things are not labeling well. Instead use multiple labels per thing."

"There is a natural hierarchy that we want to be reflected in our labels."

The Unix filesystem is a good example of the latter. It favors the hierarchy but allows each file to have many names if that is what is required. The main weakness of the Unix filesystem is that there is no reverse map: finding a file from its name(s) is simple, but finding the names from the file involves searching the whole filesystem.
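
To show what that missing reverse map costs, here is a sketch that recovers every name of a file by walking a tree and comparing device and inode numbers. `names_of` is an invented helper; the `os` calls are standard.

```python
import os


def names_of(target, root):
    """Return every path under root that is a hard link to the same file as target."""
    wanted = os.stat(target)
    key = (wanted.st_dev, wanted.st_ino)
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.lstat(full)  # lstat: do not count symlinks as names
            except OSError:
                continue  # vanished or unreadable while walking
            if (info.st_dev, info.st_ino) == key:
                names.append(full)
    return sorted(names)
```

Going from a name to the file is a single lookup, while this direction touches every directory entry under `root`.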


Also, if someone by some miracle is able to come up with a univocal hierarchy for a use case or organization, one that may even hold up over time as situations and conditions evolve, then collaborating with another field or organization will produce collisions or discrepancies in taxonomy and a hopeless confusion of category number mappings.


Scientific libraries categorise into *several categories*.

They effectively use hierarchical tags, with the set of tags very carefully curated.


I actually use a hierarchy of tags to organize my notes. The _tags_ are hierarchical, not the storage path.


but your storage path is already a hierarchy of tags.*

*If using a unix derived filesystem.


I could use symlinks to refer to files, yes, but rather I'm using nonstandard #foo-bar-baz subtag syntax to refer to sections of org mode documents. I do not use real org mode tags because those are restricted to headlines.

In other news, if anyone knows how to restrict a search in an org mode buffer to search only the headings, that would be very helpful! Right now I'm using an external grep wrapper, restricting to lines that begin with an asterisk.


ehh... symlinks... perhaps, but I am talking about hardlinks.

Having said that, I will be the first to admit that the reason hardlinks are not often used is that it is a real "here be dragons" part of the map.

10 years ago I spent a couple fun weeks categorizing all my music. my schema was something along the lines of

every file was linked to

  artist/artist_name/song_name
  band/band_name/song_name
  song/song_name/band_name
  date/YYYY-MM-DD/song_name
  genre/genre_name/song_name
  playlist/playlist_name/index
  etc, etc
Then I had fun making up a small set of CLI utilities to find tags, create and manage playlists, etc. Then (A) it turned out I don't really listen to music, and (B) the whole structure wanted to explode every time I tried to move it to a new host. So it probably still sits there on one of my file servers, but now as a million duplicated files.


I've thought a ton about what you just described, and I think of it this way: not every directed acyclic graph happens to be a tree.


this, tags + dynamic views on those tags


A past coworker implemented a system like this. It was awful. He was the gatekeeper because the numbers had to be "just so" to meet his approval, and he was the most senior person on the team.

Limiting yourself to a handful of categories at each level makes sense, but numbering everything is a bad idea. It's busywork, and other people have to learn your idiosyncratic nomenclature. Just give the directories good names. Search really isn't as bad as the article suggests, especially with something like broot [1].

[1]: https://github.com/Canop/broot


To me these intricate classification systems fall in the same category as people developing very advanced note-taking setups, dotfile management tools or fully customisable, reproducible shell/IDE/editor setups. Very nice ideas on paper, but you can get most of the way with far less effort if you mostly stick to the defaults. Trying to perfect it to death ends up looking like busywork and frequently causes conflicts with other team members less obsessed with these secondary issues.


> Search really isn't as bad as the article suggests, especially with something like broot

Johnny here. The problem is "especially with something like x" because x never exists in any enterprise. Go ask your BigCorp SharePoint sysadmin for broot. The answer is no.


Your mileage may vary, but I've certainly been able to bring my own tooling at my employers, broot included.

Regardless, the main point is that numbering directories as a mnemonic is more of an encumbrance than a help. The justification on the website, that it prevents new directories from disrupting the user's mental map, is unconvincing. There are many methods for that that don't require a team to align on a numbering system and fixed directory order - shortcuts, bookmarks, snippet managers, broot, and z, to name a few. Not all methods are available in all contexts, but at least some are.

Prepending numbers, on the other hand, forces users to, at best, waste time learning a system they might not need to use again or, at worst, bikeshed about what number to give a new directory or how to renumber if a change is needed. Users can't even skim directory names because the order is fixed and they've been robbed of the ability to sort alphabetically. The larger the team, the worse all this becomes.


But don't people at your place just put numbers in front of things anyway? I've seen this literally everywhere I've worked. Not universally, but common enough that it no longer strikes me as unusual.

They start at `01` and increment. I still remember back in 2007 at the National Australia Bank, the WinXP deployment project stored its DNS designs in folder `53`.

My theory is that people do this because the alphabet is unreliable. What was the name of that DNS designs folder? I don't remember. But it could have been any one of a bunch of things: DNS, Designs (DNS), Dynamic Name System, so on. That choice is essentially arbitrary because we have this language full of synonyms. And so how do you remember it and the dozens of other folders that you use regularly? You don't!

Numbers lock things in place. `53` will always be there. The person who created that doesn't give a hoot about `52` or `54`. They just know that their stuff is at `53`.

Adding a whole bunch of new folders that start with 'B' would have messed up the spatial placement of our `53` folder, but now that's not an issue. `53` is always exactly where it was, and muscle memory can develop.

I dunno. I haven't done the studies. I'd love to! But this is a pattern that I see over and over. To dismiss it as 'wrong' ignores the fact that people are doing this for a reason. There's something natural about it.


I didn’t like the numbers to begin with, but there are a few advantages:

- After a while you associate the number with the subject so it becomes faster to visually scan a list.

- You can type the number faster. They are shorter to reference.


I'm updating the workbook this morning and I just typed out this little anecdote. The topic is 'whether to put the JD number in the filename or not'.

---

Last year the boss mailed me a five-year-old file. "Is this the latest version of this?", he asked.

The file name started with the number. I even remember it, it was `31.17`. This allowed me to go directly to the now-archived project folder, immediately navigate to this file's home, and see no, the file he mailed me is the latest.

This took me literally one minute.

I could probably have found my way there just by the filename -- the folder structure was well organised enough to allow this -- but it would certainly have taken longer and I would not have been as certain that I was in the right place.


Previously: https://news.ycombinator.com/item?id=25398027 (2020-12-13, where the site author complains about HN ;-), https://news.ycombinator.com/item?id=36308366 (2023-06-13)

I do believe that such a system has value, but only when built by users themselves. Sure, there would be some commonalities, and schemes like Johnny Decimal may be a good guideline: prefixes work well because they sort without any additional work (but don't scale to many subjects), tags are fine when you have software support, you can also use metadata like dates and backlinks, and so on. But building a proper organization takes time and observation, and only long-time users with an urgent need have that knowledge, especially because no organization system can remain unchanged over time.


This feels good, but in practice, it's friction.

Instead, take advantage that ideas and work evolve through time, and store your work and notes chronologically. Combine that with good search.

This allows you to find items by time proximity and keyword / semantic / text embedding search.

The only folder categorization needed is slashes around ISO 8601 date parts (whichever groups a reasonable amount of content together naturally):

- YYYY/MM or YYYY/WW

With files named:

- YYYY-MM-DD - Topic headline keywords.ext

After a while, just stop filing anything, have a script sweep your desktop and downloads paths into this archive path if a piece of content hasn't been opened in x days.

(If you really really want a set of things together, make a folder for them on your desktop, put the bits in it, and ensure the script only sweeps the whole folder once no bits are being used.)

If you're still working on a set of stuff, it'll stay on your desktop. When you're done, it'll go where you can find it.

If you need something, search. If you can't search, remember when you worked on it, and browse. You'll find it, and all the things around it, that were in your head at the time, and can "reload" the whole context.
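A rough Python sketch of such a sweep script, under stated assumptions: "not opened in x days" is approximated by the later of access and modification time (access times are unreliable on many filesystems), only plain files are swept, and the function name `sweep` and its parameters are illustrative. It files into `YYYY/MM` and prefixes the existing name with `YYYY-MM-DD - `.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def sweep(source: Path, archive: Path, idle_days: int,
          now: Optional[float] = None) -> List[Path]:
    """Move files idle for more than idle_days from source into archive/YYYY/MM/."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400
    moved = []
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue  # folders are left alone in this sketch
        stat = item.stat()
        # Assumption: "last used" is the later of access and modification time.
        last_used = max(stat.st_atime, stat.st_mtime)
        if last_used > cutoff:
            continue  # still in use, stays where it is
        stamp = datetime.fromtimestamp(stat.st_mtime)
        target_dir = archive / f"{stamp:%Y}" / f"{stamp:%m}"
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{stamp:%Y-%m-%d} - {item.name}"
        if target.exists():
            continue  # never overwrite something already archived
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

Run from cron or a login hook as something like `sweep(Path.home() / "Desktop", Path.home() / "Archive", 30)`; the paths and the 30-day threshold are placeholders.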


I have started to use this system shortly after having first heard of it. I think the greatest challenge is that as I constantly create new things while only completing a fraction of them, I'd end up with a very “sparse” set of numbers being “active” in the end and unlike what is claimed by the system the relevance of directories for me is rarely linked to the time of their creation. Hence I needed to tweak this system a little to be useful to me and while it is a minor improvement over before I am still dissatisfied with the outcome...

An alternative organization system which is more lightweight and could also work well -- I am not affiliated, but would like to try to add some of these ideas to my own structuring approach: https://plaintext-productivity.net/3-02-file-folder-structur...

Over time the license of the JD system itself seems to have changed to a CC “NonCommercial” license. This really contradicts the claim that this system is intended to also be useful in business contexts. I shall only look at my old copy where it is still under a DFSG-compliant license, but I wonder whether it is even possible to claim ownership of an organization system by means of a license? Wouldn't the only way to protect against it being used or described anywhere be a patent?


I think this doesn't solve the real problem: every person will categorize a document differently and look in a different place. That's the hard part. Creating categories and uniquely identifying them is easy.


We use a similar system at work and yes, it is a problem.

For example we have categories like "customer input", "contractual", "work documents" and "delivery". Some technical specifications could go in every place at once. It is an input from the customer, and it has contractual elements, but sometimes we have to update it and make it part of the final delivery. These often end up all over the place.


I don't think they're trying to solve this issue directly. This is intended to be used as PKM (personal knowledge management) system.

You are correct in any case that this is a hard problem and I would love to get some perspective from the author on implementing such a system org-wide.


Not sure I agree with your PKM point. It feels like it fits well for company/project structure. Reading his forum and a bit into the article it seems to advocate a "librarian" that keeps the system correctly categorized. I think he mentioned somewhere that this system doesn't fully fit PKM. I remember some example of categorizing photos after vacations, but running out of numbers. But I must admit I don't fully remember the arguments.


Oh god every time someone starts numbering sub-directories I want to scream. This feels soooooo like the old days. I hate it because then anything like ordering in alphabetical order doesn't work. I'm looking for the documentation subdirectory, I know it starts with "D", it's easy if I can order them in alphabetical order. But nooooo. Don't force me to remember decimal codes unless your name is Dewey and it's going to be the same thing everywhere, all the time.

Also it's the same people who'll create a slew of empty sub-directories so it's difficult to get a feel for what the current state is. To be fair I think part of the problem is they try to apply that everywhere at any level without logic. Number the directories for all customers... so now we assign the next number to the next customer we get, so to find a customer you need to know in which order they were added and it just makes no sense... but people will usually support it because it "looks more organized".


It’d be so funny (at least for an outsider) to see an office where there’s a lot of corpo speak mixed with this.

- we need to raise the bar for 34.18. where are the deliverables for 56.17?

- the 65.18 objectives changed, so they will be 14.27 on 11.13

- wait, 11.13, meaning the docs or a date?

- I’ll check and ping you

Is this an early sign that AI is taking over?


Sounds more like going back to the analog version of Zettelkasten, i.e. an actual slip-box.

I'm getting chills just thinking how much overthinking/over-analysing following this approach would cost me. Because I'd over-engineer the hell out of this (just forget to take the actual notes). But then I remind myself that our brains can be so beautifully different and some people would find this system more productive.

I'm quite happy with my poorly but regularly managed evergreen notes.


It's fun seeing information science applied to different domains. This is very similar to the system that we've converged on over the last 100 years for legal codes with title.chapter.section (see, e.g., https://law.cityofsanmateo.org/us/ca/cities/san-mateo/code/8...). The ability with the system to unambiguously and _succinctly_ cite to a particular part of the law (or, in this case, the filesystem) is a game changer.


I use Johnny Decimal because it allowed me to offload my decisions about file organization. I had my doubts, but when I tried it as a test, I felt a sense of relief. Before, my file system was a mess, and no file search or tagging could save me from naming hell. Naming things is hard and took a lot of mental energy. I see a lot of cynicism in the comments but having an imperfect and flawed system is better than having no system at all.


IMO this kind of misses the point because now you have a fancy system to handle your documents, but the rest of the company is using a million variations of just about every system you can think of including having folders named "temp" and "data2" and Karen will not use your system no matter what you will say. I mention this, because the author mentioned "corporate system" and not "personal system". For personal management this is fine of course.

Given the "systems" I have seen in the wild I think literally anything that sort of standardizes information generation and retrieval is a massive improvement over random word documents on various NAS'es, local files, powerpoints, emails and of course various sharepoint and custom CMS pages.

I think many businesses can noticeably improve by dumping everything they have into .txt files instead of all the bs formats MS and friends came up with and just have a good search function. It looks like ass, but do you care about information or what.

I'm sort of joking, of course, but not really..


I work in a giant place, 100k+ employees, 150+ years of documents, layers upon layers of processes, presence in all countries, heavy regulation, successive mergers.

There is simply no way to make it work perfectly. We can do migration upon migration but that has never seemed to simplify anything (the joke in my team is as soon as something starts working, we migrate half of it away to a non-working solution and end up with two half systems - this has proven true so many times we don't even take it as a joke anymore).

We embrace the chaos, and focus on concrete client problems, and collect the fees for service well provided. No point in trying to make it make sense anymore or to micro optimize productivity. We have a rough estimate that it takes a year for an experienced new joiner to be truly productive, and we simply pay well so they don't leave before doing their first big contribution and basta.

I work concurrently on two identical systems doing the same things on paper but with important difference in details, because the current political war made the american system win over the asian system, while knowing the asian system simply handles better our 13 market quirks than the american one that was made for a simple single market. We slowly complexify it to reach feature parity, deploy it in markets with low stakes while we improve the asian system continuously for the hundreds of high-paying clients. One day either we'll reach a point where a market that matters can finally be migrated, or the fact we make much profit in Asia will catch up with the global strategists and the asian system will start being proposed to solve (badly) the american problems.

It has never been thought by our strategists to keep two systems working well in their respective core markets at the global scale. So we migrate half of Asia and keep two systems working badly at the local scale.

I pity the intern joining in 1000 years though. We have no idea how the kraken will evolve after centuries...


I really like the typography of this website.

Berkeley Mono for the sidebar, and IBM Plex Serif (IMO, a beautiful serif typeface for the web), very nice.


Some of the elements (general layout, headings, anchor styling and on-hover highlighting) remind me strongly of my codepen: <https://codepen.io/dredmorbius/full/KpMqqB>


For anyone interested in how the obsession for organization can end: https://en.wikipedia.org/wiki/Paul_Otlet



I have given up. I now mostly have generic buckets but get super specific with file names. E.g. one I just saw is “Drawing of rising main chairing beam with solid chairs.pdf” and then I use Voidtools Everything to search for it


This gets posted here a lot [0], and I hate everything about it.

[0] https://hn.algolia.com/?q=johnnydecimal.com


I also hate it. My hatred stems from the fact that it's very much a tech bro ideology. They get it into their head that THEIR system is 'the one'. It's the same way that tech bro weirdos running startups get it in THEIR head that THEIR product is some sort of life changing piece of technology that every human on the planet is going to ultimately adopt once they just "understand it". And god forbid if the product does catch on, because then it's a horrific confirmation bias to them. So instead of luck and market coercion, the product's success is evidence of their brilliance.

I know that seems like a weird conclusion to get to from this, but having read a LOT of these types of articles over the years and seeing people's convictions and deep seated beliefs about their categorization systems, it really underscores a lot of how humans get it in their heads about their POV of the world is the 'right one' that the rest of the world should get on board with.


Johnny here.

This isn't how I feel. I wonder which part of the website gives you the idea that I believe that it is 'the one' and only system?

I mean I could surround every statement on there with "please only do this if you feel that it's for you" but that might get tedious.

I have an idea. I present it. I personally like it. If you don't, that's okay. Go find something else that works for you.

Please don’t call people you know literally nothing about ‘tech bro’. That’s not nice.


I think from my side the problem could be expressed in a more nuanced way: it's usually not the originator of some idea to be the annoying techbro trying to evangelize everyone. In most cases the culprits are early adopters thinking they found the silver bullet solving all of their (and everyone else's!) problems and being very vocal about it


I agree with most of your sentiment. I also came to associate the proponents of these systems with a set of other traits:

- obsession with tools and methods over outcomes

- busywork

- insistence on their approach being the only right one

It seems these traits could be quite adjacent to tech bro behaviour


I too hate seeing other people excited about things they are interested in


I don't mind content marketing with real value, but this is obvious "digital products" spam designed to get readers into a sales funnel for the workbook.


Johnny here.

This website has existed in some form or other for over a decade. I've poured thousands of hours of my spare time in to it, and to helping people on the forum or by email. For a decade.

The workbook went up literally six weeks ago because I quit my full time job in May this year, hoping to make some sort of a living from this hobby of mine. Feedback has been universally positive. It helps people.

My site has no ads, no tracking, no garbage of any kind. Hell, I rebuilt it in Astro so that it doesn't even have JS! And so I spent hundreds of hours writing a thing, which I'm selling. A basic transaction, which you are free to ignore.

Shoot me for trying to pay the rent. I hope you never try to sell anything.


Care to elaborate?


I mean, there's a _lot_ to dislike, but it mostly boils down to this being a physical object system applied to digital objects, and I disagree with every single premise in the list of reasons they give for having created it in the first place. For instance, there's a whole section on how "Search doesn’t help" which ends with "You can search for things, but the results are garbage." - the author hasn't considered that maybe search is garbage for them precisely because they're trying to apply a physical object system to digital objects. I don't know about you, but Spotlight on my Mac finds exactly what I'm looking for every single time, because we use simple naming conventions.

I also hate the absolute waste of human cognition this system causes making people try to memorise which number is at the start of a folder name they're looking for. From their own examples, a folder called `22 Contracts` is going to be sorted into a different position in a list than it would if it were just named `Contracts` - without the number, I both know exactly what the name of the folder I want is, and where it will appear in a directory listing. With someone else's choice of number at the front, I know neither of these things.

Basically, every single thing this claims to solve is either not a problem in the first place, is trivially solved without making people learn a stranger's personal numbering preferences, or is made considerably worse by this system.


I really love the crazy-people energy of this.

There's no hesitation or equivocation, just THE ANSWER.

Does it matter if it's a joke / hoax or not? I don't think so, I think it's beautiful either way.


Tagging + searching.

I tried the organisation thing and it always falls down somewhere, when an item is supposed to be in multiple places at once.

Especially now that Obsidian supports Properties on pages as first-class things, which combined with Omnisearch makes it a lot easier to just search for whatever I'm looking for.

I still have crude categories like "projects", "recipes" and "RPG" with subfolders for each campaign, but that's about it.


IMVHO Johnny Decimal is a good system for paper archives in classic filing cabinets / furniture with many drawers, but not very reasonable for computer-based systems.

Besides that, IME in PKM/PIM terms we need taxonomies just to manage storage, if it's not managed automatically. In that case a time-based taxonomy, like a year-based root with a month-based second level or so, is enough for storage handling and occasional "time traveling" through note contents. I have almost everything in org-mode/org-attached binaries, org-mode tangle-d configs and so on; I've tried various taxonomies and in the end just chose to have one subdir per year under my org-mode root dir, just to manage contents. Similarly this is how I split my Beancount-ed personal finance, one note per month.

org-roam-node-find title-based search&narrow access + eventual ripgrep full-text-search are good enough to find anything. Rarely I've used org-ql but it's simply too long to keep consistent notes even with templates to use an SQL-alike querying.


The approaches for information organization versus info retrieval are often different. For example, we mostly read content based on whatever fixed structure it's in, be it a blog post or a scientific article. Note-taking tends to follow that structure. But we retrieve and consume the information in a non-linear, context-driven search.

Putting everything in a rigid hierarchical order has some benefits, namely familiarity to the author, but incurs the cost of organizing the material and the mental context switch from the task at hand.

I've been researching ways to make information tagging and visual search easy and effective - at the source - in the narrow case of reading scholarly documents [1]. The goal is to avoid a prescribed organization format in favor of contextual tagging, visualization for personal analytics, and linking of concepts afterwards, in a way that reduces distraction from reading and understanding.

[1] https://www.knowledgegarden.io/


I've implemented this system for a while, but ultimately will switch to a less hierarchical one at some point in the future. Search, especially fuzzy search, is actually great! Using fzf in my terminal I can easily cd into any directory I desire as long as it or one of its parents has a sensible name.


For the best part of my life I use a controlled set of tags[1] rather than hierarchical categories. This is mostly due to the fact that stuff can be a lot of things at the same time.

That said, one of the best use cases for Johnny's system I've found is when you have to share an online drive with hundreds of people, where you can't use tags, and even if you could, there would be no consensus. Strangely, nowadays I can find my way around a huge project's online files quite easily just by the prefix numbers of each categorical level.

[1] https://karl-voit.at/2022/01/29/How-to-Use-Tags/


My email folder system was inspired by this and loosely follows it. Works well enough for email (personal and professional). I have not tended to find it as useful for actual file folders.

https://imgur.com/a/K2mN8cY

If nothing else, it's because I'm often working with yet another new set of folders and files, and they are very different from the last, and I haven't developed some sort of intuitive hierarchy that I could apply. With email, it's pretty consistent. The only changes are new project/client folders popping up, and that's typically pretty infrequent, even with agency work.


I think everyone has a different thought process and style of thinking and work and it is painful to force others to use your way of thinking or working or organisation.

I have written about video game interfaces, for work, person stacks, person based APIs. Conway's law is very real.

I like the idea of interlocking lists that act like gears of work between individuals.

I can outline my interface with you and you can react to tasks I give you, based on your interface.

Like a unit test, I can import you and tell you what I need, you ask clarifying questions and I reply to them and we both track the mutual fallout. Synchronization and back and forth with email is painful and antiquated.


Honestly, if you want to do this, just study and use the Library of Congress cataloging system: https://catalog.loc.gov/vwebv/ui/en_US/htdocs/help/numbers.h...

The energy in creating your own catalog system to this extent while one already exists seems pretty ridiculous.

Why not Dewey Decimal? Well... I mean... I guess you could. But most libraries have converted to LOC since it is fully supported and still receives attention and updates to categorizations.


The article you linked to doesn't explain plainly how the cataloging system works. Going to Wikipedia[1] makes it a little more intelligible, but honestly it still sounds pretty complicated.

If a cataloging system is difficult to explain in plain and simple terms, it will be tough to advocate for it with other people. I might put in the effort but how can I quickly convince others to do so?

Johnny Decimal is pretty simple in comparison, it's 4 bullet points:

- A JD number is two numbers with a dot in between: XX.YY

- First digit of first number: a broad category

- Second digit of first number: a sub-category

- The second number is incremented each time you create a new folder (which you want to be careful about!)

Once you've explained that and you show a basic folder structure, people will understand how it's used, at least. That's where it performs better than your system.

It's probably fine if it's tough to define, but it has to be simple to teach.
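For what it's worth, those four bullet points are small enough to write down as code. The following Python sketch is only an illustration of the rules as listed above; the class name `JDNumber`, the helper `next_number`, and the choice to start counting at `01` are assumptions of mine, not part of the system's definition.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JDNumber:
    broad: int  # first digit of the first number: the broad category
    sub: int    # second digit of the first number: the sub-category
    item: int   # the second number, incremented per new folder

    @classmethod
    def parse(cls, text: str) -> "JDNumber":
        match = re.fullmatch(r"(\d)(\d)\.(\d{2})", text)
        if match is None:
            raise ValueError(f"not an XX.YY number: {text!r}")
        broad, sub, item = (int(group) for group in match.groups())
        return cls(broad, sub, item)

    def __str__(self) -> str:
        return f"{self.broad}{self.sub}.{self.item:02d}"


def next_number(existing: Iterable[str], category: str) -> str:
    """Return the next free XX.YY number inside a two-digit category such as '31'."""
    used = [JDNumber.parse(n).item for n in existing if n.startswith(category + ".")]
    candidate = max(used, default=0) + 1  # assumption: numbering starts at 01
    if candidate > 99:
        raise ValueError(f"category {category} has no numbers left")
    return f"{category}.{candidate:02d}"
```

The `candidate > 99` check is where the "be careful about creating new folders" warning shows up in practice: each category has a hard ceiling.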

[1] https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...


I've been trying it since last time it came up here, a few weeks ago.

I have trouble remembering the nn.mm codes, but found it useful for organising backups on a remote machine.

For some files, it's obviously difficult to choose one category but it's not that big of an issue when compared to organising, say, git repos, or folders with a lot of files. Could go higher into the hierarchy but the folder is too specific for that.

If anyone has faced or solved these issues, I'd love to hear about it


Don't remember the AC.ID codes. Have a script generate a textfile index based on the file structure. Then you just look at the index. You look into a place enough, and you'll remember it eventually then.
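A minimal Python sketch of such an index script, assuming folders are named like `11.01 Returns` (number, space, title). The function names `build_index` and `write_index` and the output layout are just one way to do it.

```python
import re
from pathlib import Path
from typing import List

# Assumed folder naming: "AC.ID Title", e.g. "11.01 Returns".
JD_FOLDER = re.compile(r"^(\d{2}\.\d{2})\s+(.+)$")


def build_index(root: Path) -> List[str]:
    """Return one 'number  title  (relative path)' line per numbered folder."""
    lines = []
    for folder in root.rglob("*"):
        if not folder.is_dir():
            continue
        match = JD_FOLDER.match(folder.name)
        if match:
            number, title = match.groups()
            relative = folder.relative_to(root).as_posix()
            lines.append(f"{number}  {title}  ({relative})")
    return sorted(lines)


def write_index(root: Path, index_file: Path) -> None:
    """Write the index as a plain text file, one folder per line."""
    index_file.write_text("\n".join(build_index(root)) + "\n", encoding="utf-8")
```

Because the lines start with the number, sorting them as plain strings also sorts them in AC.ID order.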


I tried this for a few months, just switched out of it yesterday. In contrast to what many people are saying here, my problem is it didn't have enough hierarchy. I wanted more levels of folders and that wasn't permitted by the system.

Now I click more to find what I want, but it's cleaner and easier to find.


Not sure why this is tagged (2015), I redeveloped this site earlier this year. The last-updated date is right there in my sidebar. @dang you might want to remove.

No biggie like. Hey all, Johnny here. This pops up here every few months, always happy to answer questions.


I have found Sublime Text and Sublime Merge to serve me well with a sort of `GTD`-like approach (not an orthodox one though). Finding in files + `regexp` works perfectly fine for most of my notes related to projects. Sublime Merge serves as a history keeper.


Is this the one that was speculated on as being a hoax / joke?


I've been using an adapted version for about 6 months now and it actually works great. I really like that every file now has a specific place. I even transferred the concept to my offline documents and folders.


LLMs will soon make file organization something we never have to think about.


Organisation, classification, taxonomies, hierarchies, and catalogues are complex concepts.

I've run across the Johnny Decimal concept a few times, and though it's interesting, and probably better than no organisation, I'm not sold on it. That's strongly informed by a long deep dive into other classification and bibliographic schemes.

One key distinction is between a location-based classification system and an indexing or tagging system.

A large bibliographic institution with an extensive history which maintains its own versions of both concepts is the US Library of Congress.

The Library of Congress Classification is based on a scheme originally inherited from Thomas Jefferson for organising his own personal library whose donation to the US Congress inaugurated the Library of Congress. It is based on 21 top-level divisions, based on letters of the Roman alphabet from A-Z (excluding the letters I, O, W, X, & Y). These are further broken into a vast number of further divisions, with the ultimate aim of identifying an individual work AND locating that work within the library's physical archive. Books, as physical objects, occupy a distinct space, and one space only, which cannot be shared with another book.

The Library of Congress (LoC) Subject Headings are a set of descriptions of works. Unlike the classification, a book may be assigned multiple subject headings, corresponding to its major themes. These headings are not used to define a book's location but rather aid in its discovery when searching by subject.

Both of these classifications have evolved over well over a century, are the subject of ongoing efforts and development, and have had to address the fact of change in both the underlying subject matter (e.g., the Austro-Hungarian Empire, Soviet Union, and Colonial Africa are no longer extant) and how subject matters are addressed and referenced (psychology, sexuality, race, economics, scientific discoveries, and technological advances have all made old terms obsolete or deprecated, and introduced new ones).

These are not the only classification and cataloguing schemes used by the US government, let alone other bibliography-management institutions elsewhere. The Superintendent of Documents (SuDocs) system is used by the Government Printing Office to manage its publications and archives, and is organised principally around government offices (Cabinet divisions amongst the Executive, others in the Legislative and Judiciary), as well as date.

Some libraries don't organise works by subject but instead assign a location code to works independent of subject by which the works may be requested by patrons. (These are typically "closed-stack" systems.)

There are other classifications: Dewey Decimal (superficially resembling the LoC classification), Universal Decimal Classification, and Colon Classification are amongst the most widely used.

Digital storage has different requirements than physical. Documents needn't have a single storage location or address. They may be part of numerous distinct systems, often with idiosyncratic or incompatible management or organisation themselves. Documents are far more likely to be live, with ongoing development and adaptation as opposed to the static state of a print archive. "Documents" may in fact be data held in databases, wikis, or other systems, and be associated with multiple projects.

One generally useful set of concepts for organising documents is the Dublin Core Metadata Element Set (DCMES) with a set of 15 defined elements: Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, and Type. A document might have zero or more of each element associated with it (e.g., multiple authors, translators, illustrators, and/or editors as Contributors). And it's been observed that some elements are themselves ambiguous (e.g., what is the difference between a "contributor" and a "creator"). But in general this at least provides a useful framework for considering works.
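As a sketch of how the fifteen elements might be carried around in practice, here is a small Python dataclass in which every element holds zero or more values. The class name `DublinCoreRecord` and the `filled` helper are illustrative inventions, not part of the Dublin Core specification, and the sample values in the usage note are made up.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DublinCoreRecord:
    """One document's metadata; each of the 15 elements may repeat or be empty."""
    contributor: List[str] = field(default_factory=list)
    coverage: List[str] = field(default_factory=list)
    creator: List[str] = field(default_factory=list)
    date: List[str] = field(default_factory=list)
    description: List[str] = field(default_factory=list)
    format: List[str] = field(default_factory=list)
    identifier: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)
    publisher: List[str] = field(default_factory=list)
    relation: List[str] = field(default_factory=list)
    rights: List[str] = field(default_factory=list)
    source: List[str] = field(default_factory=list)
    subject: List[str] = field(default_factory=list)
    title: List[str] = field(default_factory=list)
    type: List[str] = field(default_factory=list)

    def filled(self) -> List[str]:
        """Return the names of the elements that have at least one value."""
        return [f.name for f in fields(self) if getattr(self, f.name)]
```

For example, `DublinCoreRecord(title=["Q3 budget review"], contributor=["A. Editor", "B. Translator"])` records one title and two contributors and leaves the other thirteen elements empty.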

<https://en.wikipedia.org/wiki/Dublin_Core>

In looking at document management I've found a few other elements are useful to consider:

- Project: the task being worked on.

- Workflows: what is being done with documents.

- Document lifecycle: Often consisting of a set of stages: creation, edits/updates, acquisition, cataloguing, format conversions or transformations, transmission or publishing, and deacquisition or destruction.

- Security and/or distribution scope. Keep in mind that metadata itself has significant value and should be considered within your data management / privacy / security policies.

- Teams and participants. Both within and external to your organisation.

- Organisational evolution. Personnel, departments, divisions, enterprises, and governments come, go, and change. The terms used now might not apply in a year, or 10, or 100. (Your document lifecycle might not extend to 10 or 100 years. Then again, it might.)

Most usually, I'll find I use the following in organisation:

- Date. Generally the start date of a project. Organising documents by time makes a great deal of sense, and creates natural search spans: day, week, month, quarter, year, decade, etc.

- Author/owner. The person, department, or institution which created or "owns" the document. Within a sufficiently large organisation, these should live within a departmental / divisional structure.

- Title. How the document is generally referred to. This can be startlingly ambiguous, especially internally. Third-party documents tend to have far more fixed and established titles, though even these can be tricky. E.g., "The White Album" is not actually titled "The White Album".

- Other relevant distinguishing information.

Where possible, I'll try to store information under a single filesystem directory hierarchy, though in practice things tend to live in multiple places: a shared code repository (public or private), a wiki or other platform, databases and application data. If possible these are tied together in some useful fashion, though any such organisation tends to be approximate and lagging reality at best.

With a sufficiently large organisation, a librarian role who manages overall storage and resolves disputes is helpful. Though you'll be exceedingly lucky to have someone actually assigned to such a role.

Johnny Decimal is interesting, and may well work for some cases. I doubt it meets all needs, and certainly isn't the One True Path for all individuals and organisations.



