Can someone explain why "Items moved from /12345 to /12/34/12345. HN now starts ...

js2 · on June 10, 2014

Large directories can be very slow to scan (readdir). By fanning out the items into more directories of fewer items, performance is improved. Git does something similar with its objects directory. This is an old Unix trick.

Note that Linux ext3/ext4 have a dir_index option which improves readdir performance.

Edit: it's not readdir performance so much as just name lookups which are improved by this technique (and dir_index): http://ext2.sourceforge.net/2005-ols/paper-html/node3.html

adrianmsmith · on June 10, 2014

My experience was the opposite. In code I wrote

- I never needed to scan the directory to find "all users" (after all, if you have millions of users, this is going to take a while whatever directory structure you use)

- Modern filesystems either use a tree or hash structure to identify files, meaning that lookup by name, and creating/deleting files, is quick, even if you have millions of files.

- Given no performance benefits are to be had by directory nesting, I always went with the option of simplicity, i.e. having everything in one directory.

(I blogged about this here: http://www.databasesandlife.com/flat-directories/)

But no doubt the HN developers had a reason for doing this change, I'd love to know what it is (e.g. if they need to do something I never needed to do, or if they need to do the same things but I was wrong.)

js2 · on June 10, 2014

I read your blog entry. Your experience was with tru64 and you also mention zfs. These and other file systems may indeed use data structures to make filename lookup performant.

But traditionally, ufs and ext2/3/4 (without dir_index) have to perform a linear scan through a linked list for lookup, and so they do indeed grow slower with number of files. This is likely where the fanout strategy originated from.

So as usual, YMMV and you should test on your file system of choice.

Personally, I don't really consider that fanout adds much complexity and I'd be surprised if it hurt performance.

edit: HN runs on FreeBSD. Not sure if they are using zfs or ufs, but I'm going to guess ufs. UFS apparently has a dirhash which improves directory lookups, but it's an in-memory structure so it won't help in the cold-cache case after reboot and it can be purged in low memory situations too.

https://wiki.freebsd.org/DirhashDynamicMemory

edit2: I wonder whether the HN admins ever tried tuning the dirhash settings? http://lists.freebsd.org/pipermail/freebsd-stable/2013-Augus...

kogir · on June 10, 2014

We used to run UFS and we tuned both the kernel itself and the dirhash settings. Now we run ZFS.

unreal37 · on June 10, 2014

The site loads 5 times faster. So clearly there are performance benefits to directory nesting.

thaumasiotes · on June 10, 2014

Sure, but that's the least helpful possible response you could have made. We've got an observation:

> I never needed to scan the directory to find "all users"

> lookup by name, and creating/deleting files, is quick, even if you have millions of files.

And a question: given these observations, where do the benefits of filesystem fanout come from? Is it not true that looking up a file by name is fast no matter how many other files sit in the same directory? Is HN doing something weird?

You can't answer the question "where do the performance benefits come from?" by saying "look, the performance benefits exist".

shawnz · on June 10, 2014

> You can't answer the question "where do the performance benefits come from?" by saying "look, the performance benefits exist".

I think he is trying to say is that the parent poster's observations must be wrong. After all, we are talking about an unsubstantiated claim ("there's no benefit to fanning out files") that directly contradicts another claim which we have data for ("HN is 5x faster after fanning out files").

thaumasiotes · on June 10, 2014

Again, when someone asks why they're wrong, it's not useful to tell them "but you're wrong". Parent poster already acknowledged that the combination of his ideas and the facts on the ground didn't make sense. What good does it do anyone to repeat it back to him?

unreal37 · on June 12, 2014

I guess when you're absolutely sure that you're right, but the observation proves you wrong, you have to be prepared to consider the possibility that you're wrong.

The comment I was replying to was saying that the file system takes care of it automatically, so there's no purpose to arranging millions of files into directories. I'm not going to speculate how it all works under the hood.

adrianmsmith · on June 10, 2014

But this wasn't all they changed, e.g. password storage is now different.

bhouston · on June 10, 2014

Large number of files in windows-based directories was killer. I can not remember the full details as it was years ago but once you went above 10,000 files or 20,000 files performance just died in Windows. It was because a bunch of the main API calls for accessing files in directories were inefficient I believe.

adrianmsmith · on June 10, 2014

I agree on Windows (esp when using a GUI), but HN does not host on Windows.

eCa · on June 10, 2014

An aside: When storing large quantities of things like this my personal preference is to split the id from the end.

I was once i charge of a large number of images of book jackets named by the books' isbn. At least in that population (which of course is an extreme example considering how an isbn is created) the distribution is much more even (that is, the directory sizes are relatively equal) when using the end than the beginning, but I would not be surprised if that is a normal outcome.

Maybe it's an application of Benford's law [1].

[1] http://en.wikipedia.org/wiki/Benford%27s_law

jacquesm · on June 10, 2014

Directory scans suck. So if you break up the space into sets of prefixes you limit the number of files in each dir and traversal gets much faster. Ditto adding/removing items.

Some filesystems are worse at this than others (xfs... let's not go there).

Pxtl · on June 10, 2014

Wasn't the big miracle of Reiser4 supposed to usher in an era where this was no longer a problem?

lmm · on June 10, 2014

Reiser3 is enough to avoid this issue; I still use it when I have to run linux.

Xorlev · on June 10, 2014

I think Hans killed that era.

yellowapple · on June 10, 2014

rimshot

adrianmsmith · on June 10, 2014

Would the HN software need to scan the directory e.g. to read in all users, at any point? I don't know the source code of HN but I can't see why that would be necessary.

(And if it did need to, presumably it's now need to recursively scan all sub-directories, which would also take a while?)

jacquesm · on June 10, 2014

No, you need to scan the directory every time you read a file. So most filesystems do a lot of work to optimize this but it is still a significant factor.

adrianmsmith · on June 10, 2014

Not sure I agree with that. I ran the software for community with 6M users, and on 2003 hardware we had millions of files in one directory. That was with advfs on tru64 so things might be different with other file systems. But e.g. zfs can do this no problem as well. I just sort of assumed other FSs must have caught up in the intervening 10+ years but I haven't looked into their source code so an prepared to admit I might be wrong about them.

The same way a database fetch doesn't load the whole table, filesystems can and do use trees and hashes to organize directories so that file lookup, creation and deletion by name can be fast and can be concurrent.

I posted this in another comment, but this was my understanding of the situation in 2010. http://www.databasesandlife.com/flat-directories/

There must be a reason why they did this change, either I am wrong about performance (perhaps my results really were particular to those filesystems) or perhaps I am right and they made the change for another reason. I'd like to learn the answer.

jacquesm · on June 10, 2014

I don't think that fact that advfs on tru64 did well is any evidence that other filesystems are not dealing with this poorly. I'm running an XFS filesystem right now that still totally sucks at this particular aspect but I'm loathe to move all the data off the machine, rebuild it all and then to move it back.

For one it would need a vast amount of temp space, the site would be down while doing it and the end result would be much the same as what it is today (I rarely modify the filesystem).

ddebernardy · on June 10, 2014

You'd want to do so to check for dup usernames, for instance.

oneeyedpigeon · on June 10, 2014

Someone will explain this better but, as I understand it, having a huge number of files in a single directory becomes inefficient at a certain number; organising those files hierarchically brings an improvement. If HN really stores each item as a file in one big directory, that directory used to contain almost 8 million files ...

leephillips · on June 10, 2014

Not only inefficient; some linux commands fail when they're used on more than a few million files at a time - there is a maximum number of arguments that they can handle.

EDIT: bzbarsky's explanation below is more accurate.

bzbarsky · on June 10, 2014

Typically the issue is not the commands themselves but the shell. Trying to do "something *" on the shell command line will expand out the glob, and if the resulting string is too long (e.g. you fail to malloc() it!) the shell will do something ranging from crashing to not running the command and giving a useful error message.

larrys · on June 10, 2014

Not claiming to be an expert here but I typically do this when confronted with a large number of files.

Instead of: command *

for i in [someregex]*

do

command $i

done

I know I could also do command [someregex]* but like the comfort of having each item echo back to the terminal so I know the progress.

andreasvc · on June 11, 2014

That still relies on the shell expanding a glob of millions of files. Another method is to use 'find' and 'xargs' to avoid specifying the files as arguments explicitly.

fennecfoxen · on June 10, 2014

This is a common technique to limit the number of files in any single directory when you store your date in many many files on disk. You will also see this in things like web cache software and the like.

VLM · on June 10, 2014

Thats what it is. The specific "why" is there aren't many O(1) directory traverses.

So a million files in a dir is going to take longer to access any individual file than if there's only 3 files to pick from.

And if it scales worse than linear, a tree structure, although hitting the FS multiple times once at each level, in total can take less time.

Finally if you can avoid a smooth distribution hash and intentionally order by something important (time?) then you only need a cache the most recent directories in memory and the deep historical archive can fend for itself rarely accessed without getting in the way of the busy files. If you rarely if ever leave /stuff/thisYear/today/ then whatever is in /stuff/2011/dec25 will never slow today down or get in the way.

zimpenfish · on June 10, 2014

If you look up "directory hashing" (possibly also "directory sharding"), that'll explain things. Particularly relevant to things like email servers and shared web hosts.