write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data

On Fri, Jan 3, 2020 at 10:38 PM Zygo Blaxell
<ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote:

> > I've seen that with a 16KiB leaf size, small files that could be
> > inlined are often instead put into a data block group, taking up a
> > minimum of one 4KiB block (on x86_64 anyway). I'm not sure why, but I
> > suspect there just isn't enough room in that leaf to always use inline
> > extents, yet there is enough room to reference the data as a data
> > block group extent. When using a larger node size, a larger percentage
> > of small files ended up using inline extents. I'd expect this to be
> > quite a bit more efficient, because it eliminates an expensive (on HDD
> > anyway) seek.
>
> Putting a lot of inline file data into metadata pages makes them less
> dense, which is either good or bad depending on which bottleneck you're
> currently hitting.  If you have snapshots there is an up-to-300x metadata
> write amplification penalty to update extent item references every time
> a shared metadata page is unshared.  Inline extents reduce the write
> amplification.  On the other hand, if you are doing a lot of 'find'-style
> tree sweeps, then inline extents will reduce their efficiency because more
> pages will have to be read to scan the same number of dirents and inodes.

Egads! Soo... total tangent. I'll change the subject.

I have had multiple flash drive failures while using Btrfs: all
Samsung, several SD cards, and so far two USB sticks. They all fail in
essentially the same way: the media itself becomes read-only. USB
sticks: writes succeed but do not persist. Write data to the media and
there is no error; read that same sector back and the old data is
there. SD cards: writes fail with a call trace and diagnostic info
unique to the SD card kernel code, and everything just goes belly up.
This happens within 6 months of rather casual use as rootfs. And BTW,
Samsung always replaces the media under warranty without complaint.

It's not a scientific sample. Could be the host device, which is the
same in each case. Could be a bug in the firmware. I have nothing to
go on really.

But I wonder if this is due to write amplification that the
manufacturers just didn't anticipate. Is there any way to test for
this or estimate the amount of amplification? This class of media
doesn't report LBAs written, so I have very little information to go
on about the cause. The relevance here is that I really like the idea
of Btrfs as a rootfs for things like IoT, because of compression, the
ostensible SSD optimizations, and always-on checksumming to catch what
can often be questionable media: USB sticks, SD cards, eMMC, etc. But
not if write amplification has a good chance of killing people's
hardware (I have no proof of this, but now I wonder, having read your
email).
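
The best I've come up with for ballparking it from the host side is to
compare the bytes a workload logically writes against the sectors the
kernel actually issues to the device. Something like this rough Python
sketch (my own hack, reading /sys/block/<dev>/stat; it obviously can't
see any extra amplification happening inside the FTL itself):

    #!/usr/bin/env python3
    # Rough host-side write amplification estimate (my own sketch).
    # Compares logical bytes written by a workload against the 512-byte
    # sectors the kernel issued to the block device. It only covers
    # fs/kernel overhead, not remapping inside the drive's FTL.
    import sys

    def sectors_written(dev):              # e.g. dev = "sdb" or "mmcblk0"
        with open("/sys/block/%s/stat" % dev) as f:
            return int(f.read().split()[6])   # field 7: sectors written

    dev, logical_bytes = sys.argv[1], int(sys.argv[2])
    before = sectors_written(dev)
    # Run the workload, then sync so the writes actually reach the device.
    input("Run the workload that writes %d bytes, sync, then press Enter..."
          % logical_bytes)
    after = sectors_written(dev)
    device_bytes = (after - before) * 512
    print("host-side write amplification: %.1fx"
          % (device_bytes / logical_bytes))

Wrapping that around, say, an hour of ordinary rootfs use would at
least give an order of magnitude.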

I'm aware of write amplification; I just didn't realize it could be
this massive. Is it 300x just by having snapshots at all? Or does it
get worse with each additional snapshot? And is the growth
multiplicative or exponential?
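
My naive back-of-envelope for where a number of that size could come
from, assuming unsharing one 16KiB metadata leaf means one extent-tree
backref update per file extent item on it (the struct sizes below are
my own guesses, not something from your mail):

    # Back-of-envelope only; item sizes are assumptions on my part.
    nodesize         = 16384   # default metadata leaf size
    leaf_header      = 101     # assumed sizeof(btrfs_header)
    item_header      = 25      # assumed sizeof(btrfs_item)
    file_extent_item = 53      # assumed sizeof(btrfs_file_extent_item)
    refs_per_leaf = (nodesize - leaf_header) // (item_header + file_extent_item)
    print(refs_per_leaf)       # ~200 backref updates per unshared leaf,
                               # the same order of magnitude as "up to 300x"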

In the most prolific snapshotting case, I had two subvolumes, each
with at most 20 snapshots. I used the default ssd mount option for the
SD cards, more recently ssd_spread with the USB sticks, and now nossd
with the newest USB stick I just started using.

-- 
Chris Murphy


