On 19/05/16 02:33, Qu Wenruo wrote:
>
>
> Graham Cobb wrote on 2016/05/18 14:29 +0100:
>> A while ago I had a "no space" problem (despite fi df, fi show and fi
>> usage all agreeing I had over 1TB free).  But this email isn't about
>> that.
>>
>> As part of fixing that problem, I tried to do a "balance -dusage=20" on
>> the disk.  I was expecting it to have system impact, but it was a major
>> disaster.  The balance didn't just run for a long time, it locked out
>> all activity on the disk for hours.  A simple "touch" command to create
>> one file took over an hour.
>
> It seems that balance blocked a transaction for a long time, which makes
> your touch operation to wait for that transaction to end.

I have been reading volumes.c.  But I don't have a feel for which
transactions are likely to be the things blocking for a really long time
(hours).

If this can occur, I think the warnings to users about balance need to
be extended to include this issue.  Currently the user mode code warns
users that unfiltered balances may take a long time, but it doesn't warn
that the disk may be unusable during that time.

>> 3) My btrfs-balance-slowly script would work better if there was a
>> time-based limit filter for balance, not just the current count-based
>> filter.  I would like to be able to say, for example, run balance for no
>> more than 10 minutes (completing the operation in progress, of course)
>> then return.
>
> As btrfs balance is done in block group unit, I'm afraid such thing
> would be a little tricky to implement.

It would be really easy to add a jiffies-based limit into the checks in
should_balance_chunk.  Of course, this would only test the limit in
between block groups but that is what I was looking for -- a time-based
version of the current limit filter.

On the other hand, the time limit could just be added into the user mode
code: after the timer expires it could issue a "balance pause".  Would
the effect be identical in terms of timing, resources required, etc?
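
To make the user-mode variant concrete, here is a rough sketch of what I
have in mind.  It is only an illustration: the function name
balance_with_time_limit and its "run" parameter are made up for this
example, and it assumes that "btrfs balance start" stays in the
foreground until the balance finishes or is paused, and that the pause
only takes effect once the block group in progress has completed.

```python
import subprocess
import threading


def balance_with_time_limit(mount, limit_seconds,
                            filters=("-dusage=20",), run=subprocess.run):
    """Start a balance and request a pause after limit_seconds.

    Hypothetical helper: returns True if the timer fired and a pause
    was requested, False if the balance returned before the limit.
    """
    fired = threading.Event()

    def request_pause():
        fired.set()
        # Assumption: the pause is honoured between block groups, so the
        # balance may keep running for a while after this call.
        run(["btrfs", "balance", "pause", mount], check=False)

    timer = threading.Timer(limit_seconds, request_pause)
    timer.start()
    try:
        # Assumption: this call blocks until the balance ends or pauses.
        run(["btrfs", "balance", "start", *filters, mount], check=False)
    finally:
        timer.cancel()  # no effect if the timer has already fired
        timer.join()    # wait for a pause request that is in flight
    return fired.is_set()
```

A wrapper script could call balance_with_time_limit("/mnt", 600), sleep
for 20 minutes, and then issue "btrfs balance resume /mnt" under the same
kind of timer.  check=False is deliberate: if the balance finishes just
as the timer fires, the pause request fails and that is harmless.
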

Would it be better to do a "balance pause" or a "balance cancel"?  The
goal would be to suspend balance processing and allow the system to do
something else for a while (say 20 minutes) and then go back to doing
more balance later.

What is the difference between resuming a paused balance compared to
starting a new balance?  Bearing in mind that this is a heavily used disk
so we can expect lots of transactions to have happened in the meantime
(otherwise we wouldn't need this capability)?

Graham
