On Tue, Sep 02, 2014 at 05:20:29AM +0000, Duncan wrote:
> suspect your firmware is SERIOUSLY out of space and shuffling, as that'll
> slow the balance down too, and again after), try running fstrim on the
> device. It may or may not work on that device, but if it does and the
> firmware /was/ out of space and having to shuffle hard, it could improve
> performance *DRAMATICALLY*. The reason being that on devices where it
> works, fstrim will tell the firmware what blocks are free, allowing it
> more flexibility in erase-block shuffling.
>
> If that makes a big difference, you can /try/ the discard mount option.
> Tho doing the trim/discard as part of normal operations can slow them
> down some too. The alternative would be to simply run fstrim
> periodically, perhaps every Nth rsync or some such. Note that as the
> fstrim manpage says, the output of fstrim run repeatedly will be the
> same, since it only knows what areas are candidates to trim, not which
> ones are already trimmed, but it shouldn't hurt the device any to
> repeatedly fstrim it, and if you do it every N rsyncs, it should keep
> things from getting too bad again.

Note that dm-crypt does not pass discards to the underlying block device by default, for security reasons (John didn't mention the dm-crypt options he was using). To enable this, cryptsetup has the --allow-discards option and /etc/crypttab has the discard option.

I've seen hung task timeouts on several filesystems under 3.14.17 and 3.15.8-9 (mostly on spinning disks with dm-crypt and lvm2 underneath, but sometimes without either). I adjusted kernel.hung_task_timeout_secs from 120 to 960 and started running balances regularly, which helps mitigate the problem but does not eliminate it (ironically, when a balance is resumed at boot, it's usually one of the hung tasks in the kernel log).
A fairly good way to see this is to run 'btrfs fi defrag' on large files, 'btrfs balance' with large extents on the filesystem, or write a big file quickly (1GB+ in <30 sec). If a filesystem is more than 90% full and free space is heavily fragmented (especially by rolling snapshots), allocating large contiguous areas seems to take a long time, and it seems to block some or all other allocations at the same time (I haven't rigorously identified these, but it seems to include everything that calls fsync() or performs certain metadata operations). The writes usually do finish in a few minutes, but write latency (measured by timing a 'mkdir' call at regular intervals) can spike as high as 9+ hours. Most people (and watchdog robots) are reaching for the RESET button in less than five minutes. :-/
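For anyone who wants to reproduce the latency measurement, something along these lines should do. It is only a rough sketch of the "time a mkdir at regular intervals" idea, not the script I actually ran: the function names, the probe directory prefix, and the default interval and count are all made up for illustration. Point it at a directory on the filesystem under test.

```python
import os
import sys
import time
import uuid


def time_mkdir(parent):
    """Return how many seconds a single mkdir() takes inside `parent`."""
    # Hypothetical naming scheme; a random suffix avoids name collisions.
    path = os.path.join(parent, "latency-probe-" + uuid.uuid4().hex)
    start = time.monotonic()
    os.mkdir(path)
    elapsed = time.monotonic() - start
    os.rmdir(path)  # cleanup only; deliberately not part of the measurement
    return elapsed


def probe(parent, interval=10.0, count=6, sleep=time.sleep):
    """Yield `count` mkdir timings, pausing `interval` seconds between them."""
    for i in range(count):
        yield time_mkdir(parent)
        if i + 1 < count:
            sleep(interval)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for latency in probe(target):
        print(f"{time.strftime('%H:%M:%S')} mkdir took {latency:.3f}s")
```

Run it in one terminal while the defrag, balance, or big write runs in another. Bear in mind that on a stalled filesystem the rmdir cleanup can block as well, so the samples may end up spaced further apart than the requested interval.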
