-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 6/11/15 2:44 PM, Filipe David Manana wrote: > On Thu, Jun 11, 2015 at 7:17 PM, Jeff Mahoney <jeffm@xxxxxxxx> > wrote: On 6/11/15 12:47 PM, Filipe David Manana wrote: >>>> On Thu, Jun 11, 2015 at 4:20 PM, <jeffm@xxxxxxxx> wrote: >>>>> From: Jeff Mahoney <jeffm@xxxxxxxx> >>>>> >>>>> Btrfs doesn't track superblocks with extent records so >>>>> there is nothing persistent on-disk to indicate that those >>>>> blocks are in use. We track the superblocks in memory to >>>>> ensure they don't get used by removing them from the free >>>>> space cache when we load a block group from disk. Prior to >>>>> 47ab2a6c6a (Btrfs: remove empty block groups >>>>> automatically), that was fine since the block group would >>>>> never be reclaimed so the superblock was always safe. >>>>> Once we started removing the empty block groups, we were >>>>> protected by the fact that discards weren't being properly >>>>> issued for unused space either via FITRIM or -odiscard. >>>>> The block groups were still being released, but the blocks >>>>> remained on disk. >>>>> >>>>> In order to properly discard unused block groups, we need >>>>> to filter out the superblocks from the discard range. >>>>> Superblocks are located at fixed locations on each device, >>>>> so it makes sense to filter them out in >>>>> btrfs_issue_discard, which is used by both -odiscard and >>>>> FITRIM. >>>>> >>>>> Signed-off-by: Jeff Mahoney <jeffm@xxxxxxxx> --- >>>>> fs/btrfs/extent-tree.c | 50 >>>>> ++++++++++++++++++++++++++++++++++++++++++++------ 1 file >>>>> changed, 44 insertions(+), 6 deletions(-) >>>>> >>>>> diff --git a/fs/btrfs/extent-tree.c >>>>> b/fs/btrfs/extent-tree.c index 0ec3acd..75d0226 100644 --- >>>>> a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ >>>>> -1884,10 +1884,47 @@ static int >>>>> remove_extent_backref(struct btrfs_trans_handle *trans, >>>>> return ret; } >>>>> >>>>> -static int btrfs_issue_discard(struct block_device *bdev, >>>>> - u64 start, u64 len) +#define in_range(b, first, len) >>>>> ((b) >>>>>> = (first) && (b) < (first) + (len)) >>>> >>>> Hi Jeff, >>>> >>>> So this will work if every caller behaves well and passes a >>>> region whose start and end offsets are a multiple of the >>>> sector size (4096) which currently matches the superblock >>>> size. >>>> >>>> However, I think it would be safer to check for the case >>>> where the start offset of a superblock mirror is < (first) >>>> and (sb_offset + sb_len) > (first). Just to deal with cases >>>> where for example the 2nd half of the sb starts at offset >>>> (first). >>>> >>>> I guess this sectorsize becoming less than 4096 will happen >>>> sooner or later with the subpage sectorsize patch set, so it >>>> wouldn't hurt to make it more bullet proof already. > > Is that something anyone intends to support? While I suppose the > subpage sector patch /could/ be used to allow file systems with a > node size under 4k, the intention is the other way around -- > systems that have higher order page sizes currently don't work with > btrfs file system created on systems with smaller order page sizes > like x86. Btrfs already has high enough metadata overhead. Pretty > much all new hardware has, at least, a native 4k sector size even > if it's abstracted behind a RMW layer. The sectors are only going > to get larger. With the metadata overhead that btrfs already > incurs, I can't imagine any production use case with smaller sector > sizes. > > Are we looking to support <4k nodes to test the subpage sector code > on x86? If so, then I'll change this to handle the possibility of > superblocks crossing sector boundaries. Otherwise, it's > protecting against a use case that just shouldn't happen. > >> I understand your point. I'm probably being too paranoid. But >> it's exactly because it's not supposed to happen that at least an >> assertion or something should be added imho. A lot of "not >> supposed not happen things" happen often, and that's often how >> people lose data, and get into other bad issues. > >> And I think I've heard once of supporting <4k nodes (sectorsizes) >> for testing at least on x86 for e.g, but I might have not >> understood it correctly. Having such a check would help detect >> bugs during development where some caller passes a wrong range to >> discard - better to find it during development/RCs rather than in >> production. Yeah, you're right. I'll make it more bulletproof. I'm just looking to be done with this particular mess. :) - -Jeff - -- Jeff Mahoney SUSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.19 (Darwin) iQIcBAEBAgAGBQJVed5SAAoJEB57S2MheeWypa4QAJIrhdZNg3d1I972dkkBBFjk 723lHLXWdHkJifQKYa7b91hwtWplKxZlTuEoc7/AwacoemR7m6V3JMApykHDgHrV mmHrUvcJVUFjhbn1vXkeMhbz8COLbOUIUF6/hrwzkg00SSDwtjmv5IJQEIXikKEo j/iI2IyyNsA2JgCa5R/2JvWRssWUF8VhupLw/T6WOfjTwhEEDT/DqZIuD/avGrwk cxo8NqUkUNrFA7JxhO9TS/Mz9IIxiLu75XQiFzHF3IkLGUGAT4R1JyRMSZhFNjgc MPS9d9V1ZUZ5swKSRWR07LDtBntiRDYS7atChLsWrOuJs3lsyS3nC4LFkbcqkFdX pLFOsftfe/FX3OGz5Jry9LyKjr0m7hjveZQ2hnEHB1D/wtpEhRw8WQc8cpL6/Fyr 11g67vsdMbb+KeWsOSLyNk47oRuqULSoqRWRBqeyjfXVRPUBAHME31mZ3QN8+wJz 7Ua6xYoKV5RT8A3w2ceVEhpwqY/DuSGKJ/frteFyLiy12myZz4ZutKLIxwvww4uT sP3w3NnXFtf3UdA1D0hgm5PQuDk5vBtrFJ1f78nkMXHoKgbIfoSF/LE3Zxb7SFw7 1GscISYtTHumXQdiv6bPw0rvNt1NDj0vKwJG6aRf9IDukT+05a37e33SLBGHGRbv tXxOaVLT67hJK7jQIWAM =yNKP -----END PGP SIGNATURE----- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
