filesystem corruption: closed

Today our upstream, FreeBSD, accepted our patch to fix the corruption/truncation issue we identified. Some additional details are here and here. In particular, the second link shows how we went about recreating the issue, and then how we tested the fix to make sure the bug really is fixed.

It took a few months first to reproduce and then to fix the issue. After we had identified the cause, I wrote to Kirk McKusick, who knows UFS better than anyone. Kirk explained the situation thus:

What is happening is that the files in question are being truncated then rewritten with new contents. SU ensures that after the truncation they will either show the correct new result or be zero length. Absent SU they can show up claiming the unwritten blocks which is why you see random data. Marking the filesystems sync should fix the problem as you will not have the (up to) two minute gap between the write and the data being flushed to disk.

Indeed, mounting the filesystem “sync” does fix the issue, which is why we made that change in pfSense 2.2.3. We knew we needed more time to test before we could pronounce the issue fixed without the performance cost of sync mounts. It’s better to have a safe, stable system than one that can corrupt itself on the next reboot.

Many applications write or rewrite configuration files, or other files that are critical to the operation of the system after a reboot (whether caused by a power failure or an ordinary restart). Properly written applications take these steps to ensure the stability of the system:

  1. write the new file to a temporary name.
  2. fsync the newly written file (or mark the descriptor for direct I/O before writing).
  3. close the file.
  4. rename the temporary file to the name of the file being updated.
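
The four steps above can be sketched in C roughly as follows. This is a minimal illustration assuming a POSIX system, not the code from the FreeBSD patch; the function name `safe_replace` and its `mode` parameter are hypothetical. It uses mkstemp(3) to create the temporary file in the same directory as the target, so the final rename(2) stays on one filesystem and replaces the old file atomically.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the file at `path` with `len` bytes from
 * `buf`. Returns 0 on success, -1 on error with errno set. `mode` is
 * applied explicitly because mkstemp(3) creates files with mode 0600.
 */
int
safe_replace(const char *path, const void *buf, size_t len, mode_t mode)
{
	char tmp[1024];
	const char *p = buf;
	int fd, rc, saved;

	/* 1. Write the new contents under a temporary name next to `path`. */
	rc = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
	if (rc < 0 || (size_t)rc >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	if ((fd = mkstemp(tmp)) == -1)
		return -1;
	if (fchmod(fd, mode) == -1)
		goto fail;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}
	/* 2. Force the file's data to stable storage before switching names. */
	if (fsync(fd) == -1)
		goto fail;
	/* 3. Close the descriptor; close(2) can report errors too. */
	rc = close(fd);
	fd = -1;
	if (rc == -1)
		goto fail;
	/* 4. Atomically replace the old file with the new one. */
	if (rename(tmp, path) == -1)
		goto fail;
	return 0;
fail:
	saved = errno;
	if (fd != -1)
		(void)close(fd);
	(void)unlink(tmp);
	errno = saved;
	return -1;
}
```

At every point in this sequence the name `path` refers either to the complete old file or to the complete new one; there is never a moment where it names a truncated, partially written file.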

To narrow the window significantly further, you can also fsync the enclosing directory, so that the rename itself reaches the disk. For master.passwd(5), group(5), pwd.db and spwd.db, the enclosing directory is /etc. This is exactly what the patch does, both in the libc routines that access the group and master.passwd files and in the pwd_mkdb(8) command, which generates pwd.db and spwd.db.

The rest of the patch ensures that master.passwd(5) is always opened so that writes to it go synchronously to the underlying media, and fixes a bug we noticed in the pw_util(3) man page. Similar patches are being developed for cap_mkdb(1) and services_mkdb(8).
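
Opening a file so that its writes go straight to the underlying media corresponds to the O_SYNC flag of open(2); FreeBSD also accepts the older spelling O_FSYNC. The sketch below is a hypothetical wrapper (`open_sync` is not a real libc function), not the actual pw_util(3) change, and the flags the patch uses may differ. O_SYNC makes each write(2) synchronous, but it does not by itself remove the truncate-then-rewrite window, so it complements the rename and directory fsync steps rather than replacing them.

```c
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: open `path` with the caller's `flags`, adding
 * O_SYNC so that each write(2) on the returned descriptor completes only
 * after the data, and the metadata needed to read it back, have been
 * written to the device. Returns the descriptor, or -1 with errno set.
 */
int
open_sync(const char *path, int flags, mode_t mode)
{
	return open(path, flags | O_SYNC, mode);
}
```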

We have tested these patches on filesystems with and without soft updates (SU), as well as with soft updates plus journaling (SU+J), and the fix held in every configuration.

I’d like to offer thanks to the team here (Renato, Jim Pingle, Chris, Matt, SteveW) as well as Luiz Souza (loos@), George Neville-Neil (gnn@), Baptiste Daroussin (bapt@), and Kirk McKusick, all of whom provided assistance.

Someone on a project that forked pfSense claimed only yesterday:

We’ve discussed this a couple more times internally and have come to the conclusion that this issue is not fixable, or at least not in the way it has been presented and discussed. While it’s true that “sync” completely circumvents the issue, it seems that UFS has gotten a lot more error prone in FreeBSD 10 because of a yet to be discovered regression. We do not intend to switch our installs to “sync” or use journaling on top of soft updates.

Some free advice: If you don’t understand the system, don’t attempt to disguise your lack of knowledge with infantile rambling, and anyone who thinks ext2 is an appropriate primary filesystem for FreeBSD has questionable motives and poor taste.