Re: USB reset + raid6 = majority of files unreadable

On Wed, Mar 4, 2020 at 5:45 AM Steven Fosdick <stevenfosdick@xxxxxxxxx> wrote:
>
> > The most obvious case of corruption is a checksum mismatch (the
> > onthefly checksum for a node/leaf/block compared to the recorded
> > checksum). Btrfs always reports this.
>
> And it did, but only for the relocation tree that was being built as
> part of the balance.  I am sure you or Qu said in a previous e-mail
> that this is a temporary structure only built during that operation so
> should not have been corrupted by previous problems.  As no media
> errors were logged either that must surely mean that either there is a
> bug in constructing the tree or corrupted data was being copied from
> elsewhere into the tree and only detected after that copy rather than
> before.

I'm not familiar enough with the data relocation tree, so all I can
offer is wild speculation: the reported corruption might just be
noise, a consequence of the stalled/failed device removal, with the
actual problem still obscured.


>
> > So that leaves the less obvious cases of corruption where some
> > metadata or data is corrupt in memory, and a valid checksum is
> > computed on already corrupt data/metadata, and then written to disk.
>
> But if the relocation tree is constructed during the balance operation
> rather than being a permanent structure then the chance of flipped
> bits in memory corrupting it on successive attempts is surely very
> small indeed.

Probably true.

> > I don't understand the question. The device replace command includes
> > 'device add' and 'device remove' in one step, it just lacks the
> > implied resize that happens with add and remove.
>
> When i did the add and remove separately, the add succeeded and the
> remove failed (initially) having moved very little data.  If that were
> to happen with those same steps within a replace would it simply stop
> where it found the problem, leaving the new device added and the old
> one not yet removed, or would it try to back out the whole operation?

Yeah, the replace code has its own ioctl in the kernel, so it's not
entirely fair to describe it as a mere shortcut for the add then
remove method.

First, data is copied from the source to the new target. The copy
reuses scrub code, and the new target isn't actually "added" until the
very end of the process. During the copy, new blocks are written to
both the source and destination devices. Only once replication has
definitely succeeded is the new device really added and the old device
removed. Until the two are swapped, the source device is not in a
"being removed" state, unlike with the add then remove method.
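
For reference, a minimal sketch of what the replace method looks like
from the command line. The device paths and mount point are
placeholders I made up, not taken from your setup, and the run()
wrapper is just a hypothetical dry-run helper that prints each command
instead of executing it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder names (assumptions): substitute your own devices and mount point.
OLD_DEV=/dev/sdX      # device being replaced
NEW_DEV=/dev/sdY      # new target device
MNT=/mnt/array        # mounted btrfs filesystem

# Hypothetical helper: print the command unless DRY_RUN=0, then execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Replace method: one operation copies the source onto the target.
replace_device() {
    run btrfs replace start "$1" "$2" "$3"   # source, target, mount point
    run btrfs replace status "$3"            # report progress of the copy
}

replace_device "$OLD_DEV" "$NEW_DEV" "$MNT"
```

Left in dry-run mode this only prints the two btrfs commands, so it is
safe to read through before running anything for real.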

The device add then remove method takes a while, involves resize and
balance code, and migrates chunks from the source to other devices.
In the case of raid5 it means restriping all devices: every device is
reading and writing. It's a lot more expensive than just replacing.
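
For comparison, a rough sketch of the add then remove method as two
separate commands. Again, the device paths and mount point are made-up
placeholders, and run() is a hypothetical dry-run helper that prints
each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder names (assumptions): substitute your own devices and mount point.
OLD_DEV=/dev/sdX      # device to be removed
NEW_DEV=/dev/sdY      # device to be added
MNT=/mnt/array        # mounted btrfs filesystem

# Hypothetical helper: print the command unless DRY_RUN=0, then execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add then remove: two independent steps, so the add can succeed
# while the later remove fails.
add_then_remove() {
    run btrfs device add "$2" "$3"       # new device joins the filesystem
    run btrfs device remove "$1" "$3"    # chunks are migrated off the old device
    run btrfs filesystem usage "$3"      # check where the data ended up
}

add_then_remove "$OLD_DEV" "$NEW_DEV" "$MNT"
```

Because the steps are independent, a failed remove leaves the new
device added and the old one still present, which matches what you saw.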


-- 
Chris Murphy


