George Mitchell posted on Sun, 06 Apr 2014 22:25:03 -0700 as excerpted:

> I seem to be having an issue with a specific application. I just
> installed "Recoll", a really nice desktop search tool. The following
> day, whenever my backup program attempted to run, my computer simply
> stopped dead in its tracks and I was forced to do a hard reboot to get
> it back. So tonight I have been trying to track down the problem, and
> it goes like this: whenever I try to defrag the Recoll data files, I
> get a string of weird messages pouring out from the btrfs defrag
> program itself, and flashing messages on the screen about some sort of
> CPU failure on both CPUs. As soon as I removed the ".recoll" data
> directory from the path, everything was OK again.
>
> Does anyone know what might be going on here, or should I run the
> thing, try to trap the output and post it, and/or send a copy of the
> data files in question?

Just a btrfs user and list regular here, not a dev, but...

You'll probably need to post the output for a bug fix... unless it's simply the "blocked for more than NNN seconds" warnings (usually 30/60/90/120/etc.), in which case the general problem is already known, but then you'll want to...

echo w > /proc/sysrq-trigger

... and post the output from that, since that's the usually requested info for that case. If that is indeed the case, the apparent lockup should clear on its own after some time, though it might take a few minutes if the files are very heavily fragmented, as is likely.

Meanwhile, database files fall into a general category of frequently internally rewritten (as opposed to append-only) files that all copy-on-write filesystems, btrfs included, have problems with: they fragment very fast and very hard under COW, because every rewrite goes to a new location.

How large are the files in question? Are you using the btrfs autodefrag mount option? Do you use snapper, or otherwise take lots of (likely scripted) snapshots of that subvolume or filesystem?

Generally speaking, if the files aren't too large (perhaps a couple hundred MiB or smaller), btrfs' autodefrag mount option can usually deal with the fragmentation as it occurs; it works quite well for firefox's sqlite databases, for instance. Once the files get over perhaps half a gigabyte, however, it doesn't work so well, particularly if the file is being updated at a reasonable pace in real time, because autodefrag queues the entire file for rewrite in order to defrag it, and at some point the rewriting can't keep up with the incoming updates.

For large internal-rewrite-pattern files there's the NOCOW file attribute, which tells btrfs to rewrite the files in place and disables the usual checksumming and such. Those features can take time and complicate things for database files anyway, since the database generally already has some file integrity management of its own that can "fight" with the management btrfs does. But to be effective, setting nocow (chattr +C /path/to/file/or/dir) has to be done while the file is still zero size, before it has any content. The easiest way to do that is to set it on the directory before the files in it are created, so they inherit the nocow attribute from the directory they're created in.

The easiest solution at this point might therefore be to delete the current fragmented files instead of trying to defrag them, set nocow on the directory that will contain them, and then trigger a reindexing, roughly as sketched below.
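If it comes to that, the sequence would look something like the following. This is only a sketch: I'm assuming the index lives in the default ~/.recoll and that recollindex is what rebuilds it, so adjust paths and commands to your actual setup, and note that anything else under ~/.recoll (a customized recoll.conf, say) would need saving first.

# save any config you care about from ~/.recoll first!
rm -r ~/.recoll        # throw away the heavily fragmented index files
mkdir ~/.recoll        # recreate the directory while it's still empty
                       # (or: btrfs subvolume create ~/.recoll -- see the
                       # snapshot caveat below)
chattr +C ~/.recoll    # set nocow so files created inside inherit it
lsattr -d ~/.recoll    # verify: the 'C' attribute should be listed
recollindex            # let recoll rebuild its index from scratch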
However, there's one additional caveat involving snapshots. By definition, the first change to a file block after a snapshot will be copy-on-write despite the nocow attribute: the snapshot froze the existing file data in place as it was, so a change to it must be written to a new location even if the file is set nocow. This shouldn't be too big a problem if you're just taking a snapshot manually every week or so, but if you're using snapper or a similar automated script to take hourly or even per-minute snapshots, the effect is likely to be nearly as bad as if the file hadn't been set nocow in the first place!

If that's your situation, creating a dedicated subvolume for the directory containing these files is the best idea. Snapshots stop at subvolume boundaries, so as long as you're not snapshotting that subvolume itself, you can set nocow on directories and files within it and not have to worry about snapshot-triggered COW undermining your efforts.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
