Re: Possible application issue ...

George Mitchell posted on Sun, 06 Apr 2014 22:25:03 -0700 as excerpted:

> I seem to be having an issue with a specific application.  I just
> installed "Recoll", a really nice desktop search tool.  And the
> following day, whenever my backup program would attempt to run, my
> computer simply stopped dead in its tracks and I was forced to do a
> hard reboot to get it back.  So tonight I have been trying to shake
> out the problem.  And the problem goes like this.  Whenever I try to
> defrag the Recoll data files, I get a string of weird messages pouring
> out from the btrfs defrag program itself and flashing messages on the
> screen regarding some sort of CPU failure problem for both cpus.  As
> soon as I removed the ".recoll" data directory from the path,
> everything was OK.

> Does anyone know what might be going on here or should I run the thing 
> and try to trap the output and post it and/or send a copy of the data 
> files in question?

Just a btrfs user and list regular here, not a dev, but...

You'll probably need to post the output for a bug fix... unless it's 
simply the stalled-for-NNN-seconds warnings (usually 30/60/90/120/etc.), 
in which case the general problem is already known, but then you'll want 
to...

echo w > /proc/sysrq-trigger

...  and post the output from that.  That's the info usually requested 
in that case, anyway.  And if this is the case, the apparent lockup 
should go away on its own after some time, though it might take a few 
minutes if the files are very heavily fragmented, as is likely.
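
FWIW, a minimal capture sequence for that, run as root (assuming the 
sysrq interface is enabled via kernel.sysrq; the output file name is 
just an example):

# dump stack traces of all blocked tasks to the kernel log
echo w > /proc/sysrq-trigger
# then save the kernel log so it can be attached to a reply or bug report
dmesg > blocked-tasks.txt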


Meanwhile, database files are part of a general category of frequently 
internally-rewritten (as opposed to append-only) files that all 
copy-on-write filesystems, including btrfs, have problems with: because 
every rewrite goes to a new location, such files tend to fragment very 
fast and very hard on COW.
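
If you want to see how badly fragmented they actually are, filefrag 
(from e2fsprogs) works on btrfs too.  The path below is just a 
placeholder for wherever recoll keeps its index, and note that 
btrfs-compressed files can report more extents than they physically 
occupy:

filefrag /path/to/.recoll/*            # extent count per file
filefrag -v /path/to/.recoll/somefile  # per-extent detail for one file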

How large are the files in question?  Are you using the btrfs autodefrag 
mount option?  Do you use snapper or otherwise do lots of (likely 
scripted) snapshots on that subvolume or filesystem?

Generally speaking, if the files aren't too large (perhaps a couple 
hundred MiB or smaller), btrfs' autodefrag option can usually deal with 
the fragmentation as it occurs.  This works quite well for firefox sqlite 
databases, for instance.
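
If you're not running with autodefrag yet, it's just a mount option; a 
sketch, assuming the filesystem in question is mounted at / (adjust the 
mountpoint and your fstab entry to match your setup):

mount -o remount,autodefrag /
# to make it permanent, add autodefrag to the options in /etc/fstab, e.g.
# UUID=xxxx  /  btrfs  defaults,autodefrag  0 0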

Once the files in question get over perhaps half a gigabyte in size, 
however, that doesn't work so well, particularly if the file is being 
updated at any real speed in real-time, since autodefrag queues the 
entire file for rewrite in order to defrag it, and at some point the 
rewriting can't keep up with the incoming updates.

For large internal-rewrite-pattern files there's the NOCOW extended file 
attribute, which tells btrfs to rewrite the files in place.  It also 
disables the usual checksumming and related features, which can take 
time and complicate things on database files anyway, since the database 
generally already has some file integrity management of its own that can 
"fight" with the management btrfs does.

But to be effective, setting nocow (chattr +C /path/to/file/or/dir) needs 
to be done while the file is still zero size, before it has any content.  
The easiest way to do that is to set it on the directory, before the 
files in the directory are created, so they inherit the nocow attribute 
from the directory they're created in.

The easiest solution at this point might be to delete the current 
fragmented files instead of trying to defrag them, set nocow on the 
directory that will contain them, and then trigger a reindexing.
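
Something like the following, with ~/.recoll/xapiandb standing in for 
the real index location and recollindex for the rebuild step (both are 
assumptions about your setup; verify the path and stop recoll before 
deleting anything):

rm -rf ~/.recoll/xapiandb      # assumed index location, check first!
mkdir -p ~/.recoll/xapiandb
chattr +C ~/.recoll/xapiandb   # nocow must be set while the dir is empty
lsattr -d ~/.recoll/xapiandb   # should now show the 'C' flag
recollindex                    # rebuild the index into nocow files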


However, there's one additional caveat involving snapshots.  By 
definition, the first change to a file block after a snapshot will be 
copy-on-write despite the nocow attribute.  This is because the snapshot 
froze the existing file data in place as it was, so a change to it must 
be written to a new location even if the file is set nocow.  This 
shouldn't be too big a problem if you're just taking a snapshot manually 
every week or so, but if you're using snapper or a similar automated 
script to take hourly or even per-minute snapshots, the effect is likely 
to be nearly as bad as if the file wasn't set nocow in the first place!

If this is the case, creating a dedicated subvolume for the directory 
containing these files is the best idea, since snapshots stop at 
subvolume boundaries.  As long as you're not snapshotting that 
subvolume, you can set nocow on directories and files within it without 
having to worry about snapshot-based cow undermining your efforts.
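
A sketch of that, again with ~/.recoll/xapiandb as an assumed stand-in 
for the real index location, and assuming that directory is on btrfs and 
can be rebuilt from scratch:

rm -rf ~/.recoll/xapiandb                # only if it can be rebuilt!
btrfs subvolume create ~/.recoll/xapiandb
chattr +C ~/.recoll/xapiandb             # set nocow while it's still empty
recollindex                              # reindex into the new subvolume

Snapshots of the parent subvolume will then simply not include the 
index, so the nocow attribute keeps working as intended.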

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




