Re: How do damaged root trees happen and how to protect against power cut?

On 2020/3/19 11:14 PM, Carsten Behling wrote:
> Hi,
> 
> the investigation of damaged root trees has already been discussed in the
> thread starting with
> 
> https://www.spinics.net/lists/linux-btrfs/msg74019.html
> 
> However, one point wasn't discussed at the end:
> 
>> I thought so too. Is there a reason why they ended up being colocated?
>> I'm surprised that, with all the redundancy btrfs is capable of, this can
>> happen. Was it because the volume was starting to become full? (This
>> whole exercise of turning on mirroring was because we're migrating to
>> bigger disks)
> 
> Because I have the same issue on an embedded system (after a power
> cut, none of the root tree copies are usable anymore), I'd also
> like to know:
> 
> - How can we end up in that unrecoverable state?

There are two main reasons:
- Btrfs bug
  The most recent one affected v5.2.0 through v5.2.14.
  There may be more in older kernels.

- Bad storage stack below btrfs
  The critical part is the FLUSH/FUA behavior.
  The spec requires FLUSH/FUA to return only after all data has been
  written to storage or to non-volatile cache.

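  Btrfs heavily depends on metadata COW to stay free of corruption
  across power loss: new tree blocks are written to new locations and
  flushed to stable storage before the superblock is switched to point
  at them.
  If FLUSH/FUA does not work correctly, btrfs is completely doomed.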
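To make the FLUSH/FUA point concrete, here is a toy model of a COW commit in Python. It is a simplified illustration under stated assumptions, not btrfs code: `ToyDevice`, the `root@N` block names, and the rule that the lying device still persists the FUA superblock write while silently ignoring FLUSH are all hypothetical. Real btrfs also keeps backup roots in the superblock, which this model leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical block device with a volatile write cache."""
    honors_flush: bool
    media: dict = field(default_factory=dict)  # persistent; survives power loss
    cache: dict = field(default_factory=dict)  # volatile; lost on power loss

    def write(self, addr: str, data: object) -> None:
        # Ordinary write: acknowledged as soon as it sits in the cache.
        self.cache[addr] = data

    def flush(self) -> None:
        # FLUSH: a compliant device persists its cache before returning.
        # A lying device reports success without persisting anything.
        if self.honors_flush:
            self.media.update(self.cache)
            self.cache.clear()

    def write_fua(self, addr: str, data: object) -> None:
        # FUA write: this one block goes straight to persistent media.
        self.cache.pop(addr, None)
        self.media[addr] = data

    def power_cut(self) -> None:
        # Everything still in the volatile cache is gone.
        self.cache.clear()


def commit(dev: ToyDevice, gen: int) -> None:
    """COW commit: write the new tree, FLUSH, then point the superblock at it."""
    new_root = f"root@{gen}"
    dev.write(new_root, {"generation": gen})  # old root is never overwritten
    dev.flush()                               # barrier: new tree durable first
    dev.write_fua("super", new_root)          # only now switch to the new tree


def mountable(dev: ToyDevice) -> bool:
    """Usable only if the root the superblock names actually reached media."""
    root = dev.media.get("super")
    return root is not None and root in dev.media


def simulate(honors_flush: bool) -> bool:
    # Start from a consistent, fully persisted generation 1.
    dev = ToyDevice(honors_flush,
                    media={"root@1": {"generation": 1}, "super": "root@1"})
    commit(dev, 2)
    dev.power_cut()
    return mountable(dev)


if __name__ == "__main__":
    print("honest device mountable:", simulate(honors_flush=True))   # True
    print("lying device mountable: ", simulate(honors_flush=False))  # False
```

With the honest device the new tree is on media before the superblock moves, so a cut at any point leaves a valid root. With the lying device the superblock reaches media pointing at `root@2` while `root@2` itself was still in the cache, and the intact old tree is no longer referenced.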

> - Why can't we protect the fs against the unrecoverable state?

If the cause is hardware, btrfs has no way to protect against it.

> - Why is that error so hard to recover from?

As the only safety net is broken, there is no way to recover from such
fatal corruption.

> 
> Furthermore, I'd like to know what would be the best solution for an
> embedded system where power cuts are unavoidable (because of a missing
> circuit). I'm thinking of using a read-only rootfs with a separate
> data partition to ensure at least a booting system. But anyway, the
> data partition could end up in the same state.

Since the cause may be hardware related, I recommend running a power-loss
test using the latest kernel.
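Before such a test, one quick sanity check is whether the test kernel falls in the v5.2.0~v5.2.14 range named earlier in this reply. The helpers below (`parse_release`, `in_known_bad_5_2_range`) are hypothetical and only compare version numbers; distribution kernels often backport fixes, so a match is a hint to upgrade or check the distro changelog, not proof of the bug, and other bugs in older kernels are not covered.

```python
import re


def parse_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '5.2.14-arch1-1'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    # A bare "5.2" means the .0 release.
    return int(major), int(minor), int(patch or 0)


def in_known_bad_5_2_range(release: str) -> bool:
    """True if the version number lies in v5.2.0 .. v5.2.14 inclusive."""
    return (5, 2, 0) <= parse_release(release) <= (5, 2, 14)


if __name__ == "__main__":
    import platform

    rel = platform.release()  # same string as `uname -r`
    print(rel, "-> in v5.2.0~v5.2.14:", in_known_bad_5_2_range(rel))
```

Versions are compared as integer tuples rather than strings, so `5.10` correctly sorts after `5.2`.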

If the SD card is the problem, a power loss under heavy btrfs write load
should corrupt the fs pretty easily.

Then you can try other SD cards until you find a good one, or prove that
it's the kernel's fault so we can address it.
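A power-loss test along these lines can be scripted. The sketch below is one hypothetical way to do it, not a btrfs tool: it appends checksummed records to a file on the card under test, calls `fsync()` after each, and reports each acknowledged sequence number; after reboot it checks that every acknowledged record survived. It assumes the "acked N" output goes somewhere that outlives the power cut (for example a serial console or another machine), since a log on the same card could be lost too. The record layout and the CLI are illustrative assumptions.

```python
import hashlib
import os
import sys
from typing import Callable, List, Optional

RECORD_PADDING = b"x" * 4000  # hypothetical size; each record is ~4 KiB


def make_record(seq: int) -> bytes:
    # Fixed-size record: 12-digit sequence number, padding, and a SHA-256
    # of both, so a torn or garbled record can be detected after reboot.
    body = f"{seq:012d}".encode() + RECORD_PADDING
    digest = hashlib.sha256(body).hexdigest().encode()
    return body + b" " + digest + b"\n"


def write_records(path: str, count: int,
                  report: Optional[Callable[[int], None]] = None) -> int:
    """Append `count` records, fsync() after each, return the last acked seq."""
    last_acked = -1
    with open(path, "ab") as f:
        for seq in range(count):
            f.write(make_record(seq))
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to make it durable
            last_acked = seq
            if report is not None:
                # fsync() returned, so seq should be durable *if* the
                # storage honours FLUSH/FUA; that is what is being tested.
                report(seq)
    return last_acked


def verify_records(path: str, last_acked: int) -> List[str]:
    """Return a list of problems; an empty list means nothing acked was lost."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return ["data file missing"] if last_acked >= 0 else []
    seen = set()
    for line in data.split(b"\n"):
        body, _, digest = line.rpartition(b" ")
        if not body or hashlib.sha256(body).hexdigest().encode() != digest:
            continue  # empty or torn record: only matters if it was acked
        seen.add(int(body[:12].decode()))
    return [f"acknowledged record {seq} lost"
            for seq in range(last_acked + 1) if seq not in seen]


if __name__ == "__main__":
    # Hypothetical CLI:
    #   write  <path> <count>         prints "acked N" after each fsync
    #   verify <path> <last acked N>  exits 1 if any acked record is missing
    mode, path, number = sys.argv[1], sys.argv[2], int(sys.argv[3])
    if mode == "write":
        write_records(path, number,
                      report=lambda s: print(f"acked {s}", flush=True))
    else:
        issues = verify_records(path, number)
        print("\n".join(issues) or "OK")
        sys.exit(1 if issues else 0)
```

Use a fresh file for each cycle and run several writers in parallel to raise the write load. After cutting power and rebooting, run `btrfs check --readonly` on the unmounted device, then mount it and run `verify` with the last acknowledged number seen. If acknowledged records go missing or the fs stops mounting on one card but not on another with the same kernel, the card is the likely culprit.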

Thanks,
Qu

> 
> I'm not sure whether working with snapshots would also be a good option.
> My space on the embedded device is limited to 8GB. The OS already
> takes about 4GB.
> 
> Best regards
> Carsten
> 
