Hacker News

This is, of course, before saying anything about the transactional safety of writing directly to the filesystem.

You do realize that rename(2), open(2) with O_CREAT | O_EXCL, mkdir(2) and still other POSIX filesystem operations are fully atomic, right?

http://rcrowley.org/2010/01/06/things-unix-can-do-atomically...
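As a rough illustration of the O_CREAT | O_EXCL case, here is a minimal sketch of a lock file. The helper name try_lock and its return convention are made up for this example; only open(2), close(2) and errno are real. It shows atomicity as seen by competing processes and says nothing about what survives a crash.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): try to take a lock by
 * creating `path` exclusively. O_CREAT | O_EXCL makes open(2) fail with
 * EEXIST if the file already exists, so only one caller can win.
 * Returns 1 if the lock file was created, 0 if it already existed,
 * and -1 on any other error.
 */
int try_lock(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EEXIST ? 0 : -1;
}
```

Releasing the lock would just be unlink(path), after which the next try_lock() call can succeed again.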



See followup comment below. You're confusing logical atomicity with physical consistency and durability. Yes, these operations may have certain atomicity guarantees from the perspective of the application, but they are entirely asynchronous from the perspective of the storage medium unless you explicitly fsync(). On Linux, for example, even then the default behaviour of ext4 is to allow metadata updates to complete prior to data updates (no "write barrier").

In other words:

1. fd = open("super-safe-file.tmp", O_CREAT|O_RDWR, 0644);

2. write(fd, "super-safe-data", 15);

3. close(fd);

4. rename("super-safe-file.tmp", "super-safe-file");

5. (kernel flushes file and directory metadata to disk)

6. CRASH

7. Machine reboots; "super-safe-file" exists but contains no data, since the file data itself was never flushed.

8. Tears are shed, programmers are fired, backups are restored
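For comparison, the usual way to close that window is to fsync() the temporary file before the rename and then fsync() the containing directory afterwards. What follows is only a sketch under assumptions: the helper names write_all and durable_replace are invented here, the caller is assumed to pass the directory that holds both paths, and fsync() on a directory descriptor is assumed to be supported (it is on Linux, but not necessarily on every platform or filesystem).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short
 * writes and on EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `dst` with `data` via `tmp`.
 * `dir` is assumed to be the directory containing both paths.
 * Returns 0 on success, -1 on error.
 */
int durable_replace(const char *dir, const char *tmp, const char *dst,
                    const char *data, size_t len)
{
    int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Flush the file data BEFORE the rename, so the new name can
     * never point at an empty file after a crash. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Atomically swap the new file into place. */
    if (rename(tmp, dst) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Flush the directory so the rename itself reaches the disk.
     * Assumption: the platform allows fsync() on a directory fd. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

With this, steps 1 to 4 above would become a single call such as durable_replace(".", "super-safe-file.tmp", "super-safe-file", "super-safe-data", 15). Whether the data really reaches stable storage still depends on the filesystem and the drive honouring the flush.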


I understand the problem scenario with ext3/ext4 journalling you're referring to here and below.

However, HN runs on FreeBSD, and my understanding is that the combination of soft updates and journalling there actually does provide atomic rename, even in the case of catastrophic failure. McKusick talks about it here: http://www.mckusick.com/softdep/suj.pdf

Also, just to anchor the discussion a bit, the HN code does use the "write foo.tmp; mv foo.tmp foo" trick all over the place. (Or at least, the most recent version of news.arc I've seen does.)

https://github.com/wting/hackernews/blob/master/arc.arc#L841


You said POSIX, which makes no such guarantee. Soft updates are cool, though as far as I know they still don't provide durability. Still, that's far better than the default Linux behaviour.


And this is one of many reasons why you should use ZFS for your data. ZFS guarantees the atomicity of renames and would not have this problem, at least on Solaris and FreeBSD. I don't know about ZFS on Linux.



