-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/11/15 3:35 PM, Chris Mason wrote:
> On 06/11/2015 03:27 PM, Jeff Mahoney wrote:
>> On 6/11/15 3:24 PM, Chris Mason wrote:
>>> On 06/11/2015 03:15 PM, Jeff Mahoney wrote:
>>>> On 6/11/15 2:44 PM, Filipe David Manana wrote:
>>>>> On Thu, Jun 11, 2015 at 7:17 PM, Jeff Mahoney <jeffm@xxxxxxxx> wrote:
>>>>> On 6/11/15 12:47 PM, Filipe David Manana wrote:
>>>>>>>> On Thu, Jun 11, 2015 at 4:20 PM, <jeffm@xxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>> From: Jeff Mahoney <jeffm@xxxxxxxx>
>>>>>>>>>
>>>>>>>>> Btrfs doesn't track superblocks with extent
>>>>>>>>> records so there is nothing persistent on-disk to
>>>>>>>>> indicate that those blocks are in use. We track
>>>>>>>>> the superblocks in memory to ensure they don't get
>>>>>>>>> used by removing them from the free space cache
>>>>>>>>> when we load a block group from disk. Prior to
>>>>>>>>> 47ab2a6c6a (Btrfs: remove empty block groups
>>>>>>>>> automatically), that was fine since the block group
>>>>>>>>> would never be reclaimed so the superblock was
>>>>>>>>> always safe. Once we started removing the empty
>>>>>>>>> block groups, we were protected by the fact that
>>>>>>>>> discards weren't being properly issued for unused
>>>>>>>>> space either via FITRIM or -odiscard. The block
>>>>>>>>> groups were still being released, but the blocks
>>>>>>>>> remained on disk.
>>>>>>>>>
>>>>>>>>> In order to properly discard unused block groups,
>>>>>>>>> we need to filter out the superblocks from the
>>>>>>>>> discard range. Superblocks are located at fixed
>>>>>>>>> locations on each device, so it makes sense to
>>>>>>>>> filter them out in btrfs_issue_discard, which is
>>>>>>>>> used by both -odiscard and FITRIM.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jeff Mahoney <jeffm@xxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  fs/btrfs/extent-tree.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
>>>>>>>>>  1 file changed, 44 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>>>>>> index 0ec3acd..75d0226 100644
>>>>>>>>> --- a/fs/btrfs/extent-tree.c
>>>>>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>>>>>> @@ -1884,10 +1884,47 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
>>>>>>>>>  	return ret;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> -static int btrfs_issue_discard(struct block_device *bdev,
>>>>>>>>> -				u64 start, u64 len)
>>>>>>>>> +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
>>>>>>>>
>>>>>>>> Hi Jeff,
>>>>>>>>
>>>>>>>> So this will work if every caller behaves well and
>>>>>>>> passes a region whose start and end offsets are a
>>>>>>>> multiple of the sector size (4096) which currently
>>>>>>>> matches the superblock size.
>>>>>>>>
>>>>>>>> However, I think it would be safer to check for the
>>>>>>>> case where the start offset of a superblock mirror
>>>>>>>> is < (first) and (sb_offset + sb_len) > (first).
>>>>>>>> Just to deal with cases where for example the 2nd
>>>>>>>> half of the sb starts at offset (first).
>>>>>>>>
>>>>>>>> I guess this sectorsize becoming less than 4096 will
>>>>>>>> happen sooner or later with the subpage sectorsize
>>>>>>>> patch set, so it wouldn't hurt to make it more
>>>>>>>> bullet proof already.
>>>>
>>>>> Is that something anyone intends to support? While I
>>>>> suppose the subpage sector patch /could/ be used to allow
>>>>> file systems with a node size under 4k, the intention is
>>>>> the other way around -- systems that have higher order
>>>>> page sizes currently don't work with btrfs file system
>>>>> created on systems with smaller order page sizes like x86.
>>
>>> The best use of smaller node sizes is just to test the
>>> subpagesize patches on more common hardware. I wouldn't
>>> expect anyone to use a 1K node size in production.
>>
>> Any chance we can enforce that? Like with a compile-time
>> option? :)
>
> We can make mkfs.btrfs advise strongly against it ;)
>
> But, since I wasn't horribly clear, I'd love one extra if
> statement in the discard function. Silently eating bytes is
> horribly hard to track down.

Heh, yeah. I'm making it bulletproof now. If the goal is to also
catch potential misbehavior, I'm catching some other cases as well. A
few extra conditionals will still take a small percentage of the time
a discard takes.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJVeeWDAAoJEB57S2MheeWygMEQAMPCNf8ZIMfYRDkzbpW0mezB
6Vbu7PM5WNAqOU2XdJXq47Z+jvLzsbBG0Z1hDLdavkQiOfjOQeBDvwVQQwFPizJ9
lRA4HB6P0nMKVl4x4PcXzgRinrIIy46nFY7VFZBe/cO0aEq7bsB3/vjlRj4LKvsp
eeMg212Sc4V6yuVbSfLSgYTtMGcAsmE9rUWl+2+kV6aTGqZr72YG1033YVu9Y+0F
vnelEIKFSmYF1y7FqO8Ejpk7G6fOoKYXGIxjcyC5v6kAKygZkxuUFYt9wPgpxl4X
eTYnPwjRwE3qRHlZtCGmb0SKvIkFMeKaI5Dy8KXUSHu6Q4NZ8q+kftgzNTGHcEzD
EgGrsbMa3N6necDYsmKYrIWVq21Nj2vSZc7YmLDKYtVQJRH2ScPOvHQlosEX8JsA
h4DfSp8fLVWu8hAORrUvByrGfw7DkFOlv1bF4B76MokP7sb4ITnpBUJtW+0Uiw3x
n1OJ94RiFOXpxWvEYquZUnK/9k1cg/eCwDpaFTCSDrTOVfW78lnoso1VKhQ1CJLg
Ed3I77RA0jPE004hpwtLdGE2AMiOZfAMKTAPkErnnWMfcrBh9O8DUBWVXds3IBSg
mv6lKPz24P28ymOINkqFC22D1vyXBH4Xiel0ZuPHHjnrxPUwovrF//XRbwcc7lCf
jzsGyTnEnAf00/R8s7sP
=v4r5
-----END PGP SIGNATURE-----
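
For illustration only, here is a userspace sketch of the point Filipe
raises in the quoted review: a point check like in_range() only notices
a superblock whose first byte falls inside the discard range, while an
interval overlap check also catches a range that begins partway through
a superblock. This is not the code from the patch. The helper names
(sb_offset, point_in_range, ranges_overlap, discard_ranges) are made up
for the example, and the mirror offsets (64KiB, 64MiB, 256GiB) and the
4096-byte superblock size are assumptions based on the usual btrfs
layout rather than anything stated in this thread. Overflow of
start + len is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout constants; not taken from the patch under review. */
#define SB_SIZE        4096ULL
#define SB_MIRROR_MAX  3

struct range {
	uint64_t start;
	uint64_t len;
};

/* Assumed offset of superblock copy `mirror`: 64KiB, 64MiB, 256GiB. */
static uint64_t sb_offset(int mirror)
{
	return mirror ? (16384ULL << (12 * mirror)) : 65536ULL;
}

/* Point check, same shape as in_range(): is byte b in [first, first+len)? */
static bool point_in_range(uint64_t b, uint64_t first, uint64_t len)
{
	return b >= first && b < first + len;
}

/* Interval check: do [a, a+alen) and [b, b+blen) share any byte? */
static bool ranges_overlap(uint64_t a, uint64_t alen,
			   uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/*
 * Split [start, start+len) into pieces that avoid every superblock copy.
 * Writes at most SB_MIRROR_MAX + 1 entries to out and returns the count.
 * Relies on sb_offset() returning offsets in ascending order.
 */
static size_t discard_ranges(uint64_t start, uint64_t len, struct range *out)
{
	uint64_t end = start + len;
	size_t n = 0;

	assert(out != NULL);

	for (int i = 0; i < SB_MIRROR_MAX && start < end; i++) {
		uint64_t sb = sb_offset(i);
		uint64_t sb_end = sb + SB_SIZE;

		if (!ranges_overlap(start, end - start, sb, SB_SIZE))
			continue;

		/* Keep whatever lies in front of this superblock. */
		if (sb > start)
			out[n++] = (struct range){ start, sb - start };

		/* Resume after the superblock, clamped to the range end. */
		start = sb_end < end ? sb_end : end;
	}

	/* Tail after the last overlapping superblock, if any. */
	if (start < end)
		out[n++] = (struct range){ start, end - start };

	return n;
}
```

With a range that starts 2048 bytes into the first superblock,
point_in_range(sb_offset(0), start, len) is false even though
ranges_overlap() reports a hit, and discard_ranges() trims the first
2048 bytes so that only the space after the superblock is returned.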
