On Thu, Dec 19, 2019 at 09:06:07PM +0100, David Sterba wrote:
> On Wed, Dec 18, 2019 at 08:03:37PM -0600, Dennis Zhou wrote:
> > > > This happened also when I deleted everything from the
> > > > filesystem and ran full balance.
> >
> > Also, were these both on fresh file systems, so it seems
> > reproducible for you?
>
> Yes, the filesystem was freshly created before the test.
>
> No luck reproducing it; I tried to repeat the steps as before, but
> the timing must make a difference and the numbers always ended up as
> 0 (bytes), 0 (extents).
>
> > I'll report back if I continue having trouble reproing it.
> >
> > I spent the day trying to repro against ext/dzhou-async-discard-v6
> > without any luck... I've been running the following:
> >
> > $ mkfs.btrfs -f /dev/nvme0n1
> > $ mount -t btrfs -o discard=async /dev/nvme0n1 mnt
> > $ cd mnt
> > $ bash ../age_btrfs.sh .
> >
> > where age_btrfs.sh is from [1].
> >
> > If I delete arbitrary subvolumes, sync, and then run balance:
> >
> > $ btrfs balance start --full-balance .
> >
> > It all seems to resolve to 0 after some time. I haven't seen a
> > negative case on either of my two boxes. I've also tried unmounting
> > and then remounting, and deleting and removing more free space
> > items.
> >
> > I'm still considering how this can happen. Possibly a bad load of
> > the free space cache and then freeing of the block group? Being off
> > by just 1 without it accumulating seems to be a real corner case
> > here.
> >
> > Adding asserts in btrfs_discard_update_discardable() might give us
> > insight into which call site is responsible for going below 0.
>
> Yeah, more asserts would be good.

I'll add a few assert patches and some code to ensure that life can
still move on properly if we do hit the -1 case. I think it probably
has something to do with free space cache removal, as it can't be a
simple corner case; otherwise we'd see the -1 accumulating much more
easily. What puzzles me is that it's a single nodesize I'm off by, and
not some other random number.
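Roughly what I have in mind, as a sketch only (the helper name and the
counter layout below are illustrative assumptions, not code from the
v6 branch): funnel every delta to a discardable counter through one
helper that warns once on underflow and then clamps back to 0 so the
rest of discard keeps working:

	/*
	 * Illustrative helper, not actual branch code: apply a delta
	 * to a discardable stat, warn on the first underflow so we get
	 * the offending call site's backtrace, then clamp back to 0.
	 */
	static void discard_stat_add(atomic64_t *stat, s64 delta)
	{
		s64 new = atomic64_add_return(delta, stat);

		if (WARN_ON_ONCE(new < 0))
			/* Clamp; racy, but fine for debugging. */
			atomic64_add(-new, stat);
	}

The WARN_ON_ONCE() backtrace should tell us which caller of
btrfs_discard_update_discardable() first drives the count negative.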
Thanks,
Dennis