On 23.10.19 16:57, Qu Wenruo wrote:
> [BUG]
> When deleting large files (which cross a block group boundary) with the
> discard mount option, we find that some btrfs_discard_extent() calls only
> trim part of their space, not the whole range:
>
>   btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50%
>
> type:    bbio->map_type, in the above case, it's SINGLE DATA.
> start:   Logical address of this trim
> len:     Logical length of this trim
> trimmed: Physically trimmed bytes
> ratio:   trimmed / len
>
> This leaves some unused space undiscarded.
>
> [CAUSE]
> When the discard mount option is specified, after a transaction is fully
> committed (super block written to disk), we begin to clean up pinned
> extents in the following call chain:
>
>   btrfs_commit_transaction()
>   |- write_all_supers()
>   |- btrfs_finish_extent_commit()
>      |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY);
>      |- btrfs_discard_extent()
>
> However, pinned extents are recorded in an extent_io_tree, which can
> merge adjacent extent states.
>
> When a large file gets deleted and it has adjacent file extents across
> a block group boundary, we will get a large merged range.
>
> Then when we pass the large range into btrfs_discard_extent(),
> btrfs_discard_extent() will just trim the first part, without trimming
> the remaining part.
>
> Furthermore, this bug is not reliably observed: if the whole block group
> is empty, there will be another trim for that block group.
>
> So the most obvious way to hit this missing trim is to delete large
> extents at a block group boundary without emptying the involved block
> groups.
>
> [FIX]
> - Allow __btrfs_map_block_for_discard() to modify the @length parameter
>   btrfs_map_block() uses its @length parameter to notify the caller how
>   many bytes are mapped in the current call.
>   With __btrfs_map_block_for_discard() also modifying @length,
>   btrfs_discard_extent() now understands when to do an extra trim.
>
> - Call btrfs_map_block() in a loop until we hit the range end
>   Since we now know how many bytes are mapped each time, we can iterate
>   through each block group boundary and issue the correct trim for each
>   range.
>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>
Tested-by: Nikolay Borisov <nborisov@xxxxxxxx>
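
For anyone following along, the [CAUSE] and [FIX] sections above can be illustrated with a small userspace sketch. This is not the kernel code. The toy_* functions and BG_SIZE are hypothetical names made up for illustration, and the sketch assumes fixed 1 GiB block groups aligned to multiples of 1 GiB, which real btrfs does not guarantee. It only models the idea that the mapping call shrinks the length to what fits in the current block group, and that the caller has to loop to cover the rest.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: every block group is 1 GiB and aligned. */
#define BG_SIZE (1024ULL * 1024 * 1024)

/*
 * Hypothetical stand-in for the discard mapping step: map at most up to the
 * end of the block group containing @start and shrink *@length to the number
 * of bytes actually mapped, so the caller can tell a short mapping happened.
 */
static void toy_map_for_discard(uint64_t start, uint64_t *length)
{
	uint64_t bg_end = (start / BG_SIZE + 1) * BG_SIZE;

	assert(*length > 0);
	if (start + *length > bg_end)
		*length = bg_end - start;
}

/* Models the buggy pattern: one mapping call, the remainder is never trimmed. */
static uint64_t toy_discard_once(uint64_t start, uint64_t len)
{
	uint64_t mapped = len;

	toy_map_for_discard(start, &mapped);
	return mapped;	/* bytes "trimmed" */
}

/* Models the fixed pattern: keep mapping until the end of the range. */
static uint64_t toy_discard_loop(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	uint64_t cur = start;
	uint64_t trimmed = 0;

	while (cur < end) {
		uint64_t mapped = end - cur;

		toy_map_for_discard(cur, &mapped);
		trimmed += mapped;	/* one trim per block group piece */
		cur += mapped;
	}
	return trimmed;
}
```

With a range that starts in the middle of one block group and spans two more boundaries, toy_discard_once() covers only the bytes up to the first boundary, while toy_discard_loop() covers the full length.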
