On Tue, Dec 17, 2019 at 03:55:41PM +0100, David Sterba wrote:
> On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > Hello,
> >
> > Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> > it, but it obviously is right. I believe I fixed the issue by moving the
> > fully trimmed check outside of the block_group lock. I mistakenly
> > thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> > tree_lock. This clearly isn't the case.
> >
> > Changes in v6:
> >  - Move the fully trimmed check outside of the block_group lock.
>
> v6 passed fstests, with some weird test failures that don't seem to be
> related to the patchset. Yay!
>
> Meanwhile I did a manual test of how the discard behaves. The workload
> was a series of linux git checkouts of various release tags (ie. this
> should provide some freed extents and coalesce them eventually to get
> larger chunks to discard), then a simple large file copy, sync, remove,
> sync.
>
> The discards going down to the device were following the maximum
> default size (64M), but I observed that only one range was discarded
> per 10 seconds, while the stats say there are many more extents to
> discard.
>
> For the large file it took like 5-10 cycles to send all the trimmed
> ranges; the discardable_extents decreased by one each time until it
> reached ... -1. At this point the discardable bytes were -16384, so
> there's some accounting problem.
>
> This happened also when I deleted everything from the filesystem and
> ran a full balance.

Oh no :(. I've been trying to repro with some limited checking out and
syncing, then subsequently removing everything and running balance. It
comes out to 0 for me. I'll try harder to repro this and fix it.

> Regarding the slow io submission, I tried to increase the iops value,
> default was 10, but 100 and 1000 made no change. Increasing the maximum
> discard request size to 128M worked (when there was such a long extent
> ready). I was expecting a burst of like 4 consecutive IOs after a 600MB
> file is deleted. I did not try to tweak bps_limit because there was
> nothing to limit.

Ah, so there's actually a max time between discards, set to 10 seconds,
as the timeout is calculated over a 6 hour window. So if we only have 6
extents, we'd discard 1 per hour(ish, given it decays), but this is
clamped to 10 seconds (see the sketch at the end of this mail). At least
on our servers, we seem to discard at a reasonable rate to prevent
performance penalties during a large number of reads and writes while
maintaining reasonable write amplification. Also, metadata blocks aren't
tracked, so on freeing of a whole metadata block group (minus
relocation), we'll trickle discards slightly slower than expected.

> So this is something to fix but otherwise the patchset seems to be ok
> for adding to misc-next. Due to the timing of the end of the year and
> that we're already at rc2, this will be the main feature in 5.6.

I'll report back if I continue having trouble reproing it. Thanks!

v5.6 sounds good to me!

Dennis
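P.S. In case it helps, the delay calculation comes out to roughly the
following. This is a simplified sketch of the idea rather than the exact
code in the series; the names and constants here are illustrative only.

/* Spread discarding of all outstanding extents over a 6 hour target
 * window, but bound how long we wait between any two discards. */
#define DISCARD_TARGET_MSEC     (6ULL * 60 * 60 * 1000) /* 6 hours */
#define DISCARD_MAX_DELAY_MSEC  (10ULL * 1000)          /* 10 second cap */
#define DISCARD_MIN_DELAY_MSEC  1ULL

static unsigned long long discard_delay_msec(unsigned long long nr_extents)
{
        unsigned long long delay;

        /* Nothing trimmable: check back after the maximum delay. */
        if (nr_extents == 0)
                return DISCARD_MAX_DELAY_MSEC;

        /* 6 extents -> a raw delay of 1 hour between discards ... */
        delay = DISCARD_TARGET_MSEC / nr_extents;

        /* ... which the cap brings down to one discard every 10s. */
        if (delay > DISCARD_MAX_DELAY_MSEC)
                delay = DISCARD_MAX_DELAY_MSEC;
        if (delay < DISCARD_MIN_DELAY_MSEC)
                delay = DISCARD_MIN_DELAY_MSEC;

        return delay;
}

If the iops limit only feeds into the minimum delay here, that would
also be why bumping it from 10 to 1000 made no difference in your test:
with only a handful of extents the delay sits pinned at the 10 second
cap regardless.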
