[UNRESOLVED] Re: errors found in extent allocation tree or chunk allocation after power failure

On 2019-09-25T14:32:31, Pallissard, Matthew wrote:
> On 2019-09-25T15:05:44, Chris Murphy wrote:
> > On Wed, Sep 25, 2019 at 1:34 PM Pallissard, Matthew <matt@xxxxxxxxxxxxxx> wrote:
> > > On 2019-09-25T13:08:34, Chris Murphy wrote:
> > > > On Wed, Sep 25, 2019 at 8:50 AM Pallissard, Matthew <matt@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Version:
> > > > > Kernel: 5.2.2-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 2019 x86_64 GNU/Linux
> > > >
> > > > You need to upgrade to arch kernel 5.2.14 or newer (they backported the fix first appearing in stable 5.2.15). Or you need to downgrade to 5.1 series.
> > > > https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@xxxxxxxxxx/T/#u
> > > >
> > > > That's a nasty bug. I don't offhand see evidence that you've hit this bug. But I'm not certain. So first thing should be to use a different kernel.
> > >
> > > Interesting, I'll go ahead with a kernel upgrade as that's easy enough.
> > > However, that looks like it's related to a stacktrace regarding a hung process.  Which is not the original problem I had.
> > > Based on the output in my previous email, I've been working under the assumption that there is a problem on-disk.  Is that not correct?
> >
> > That bug does cause filesystem corruption that is not repairable.
> > Whether you have that problem or a different problem, I'm not sure.
> > But it's best to avoid combining problems.
> >
> > The file system mounts rw now? Or still only mounts ro?
> 
> It mounts RW, but I have yet to attempt an actual write.
> 
> 
> > I think most of the errors reported by btrfs check, if they still exist after doing a scrub, should be repaired by 'btrfs check --repair' but I don't advise that until later. I'm not a developer, maybe Qu can offer some advice on those errors.
> 
> 
> > > > Next, anytime there is a crash or power failure with Btrfs raid56, you need to do a complete scrub of the volume. Obviously it will take time but that's what needs to be done first.
> > >
> > > I'm using raid 10, not 5 or 6.
> >
> > Same advice, but it's not as important to raid10 because it doesn't have the write hole problem.
> 
> 
> > > > OK actually, before the scrub you need to confirm that each drive's SCT ERC time is *less* than the kernel's SCSI command timer. e.g.
> > >
> > > I gather that I should probably do this before any scrub, be it raid 5, 6, or 10.  But is a scrub the operation I should attempt on this raid 10 array to repair the specific errors mentioned in my previous email?
> >
> > Definitely deal with the timing issue first. If by chance there are bad sectors on any of the drives, they must be properly reported by the drive with a discrete read error in order for Btrfs to do a proper fixup. If the times are mismatched, then Linux can get tired waiting, and do a link reset on the drive before the read error happens. And now the whole command queue is lost and the problem isn't fixed.
> 
> Good to know, that seems like a critical piece of information.  A few searches turned up this page, https://wiki.debian.org/Btrfs#FAQ.
> 
> Should this be noted on the 'gotchas' or 'getting started' page as well?  I'd be happy to make edits should the powers that be allow it.
> 
> 
> > There are myriad errors and the advice I'm giving to scrub is a safe first step to make sure the storage stack is sane - or at least we know where the simpler problems are. And then move to the less simple ones that have higher risk.  It also changes the volume the least. Everything else, like balance and chunk recover and btrfs check --repair - all make substantial changes to the file system and have higher risk of making things worse.
> 
> This sounds sensible.
> 
> 
> > In theory if the storage stack does exactly what Btrfs says, then at worst you should lose some data, but the file system itself should be consistent. And that includes power failures. The fact there's problems reported suggests a bug somewhere - it could be Btrfs, it could be device mapper, it could be controller or drive firmware.
> 
> I'll go ahead with a kernel upgrade/make sure the timing issues are squared away.  Then I'll kick off a scrub.
> 
> I'll report back when the scrub is complete or something interesting happens.  Whichever comes first.

As a follow-up:
1. I took care of the timing issues.
2. I ran a scrub.
3. I ran a balance; it kept failing with about 20% left.
  - stacktraces in dmesg showed spinlock stuff

4. I got I/O errors on one file during my final backup.
  - post-backup hashsums of everything else checked out
  - the errors during the copy were csum mismatches, should anyone care

5. I ran a bunch of potentially disruptive btrfs check commands in alphabetical order because "why not at this point?"
  - they had zero effect as far as I can tell: all the same files were readable, and the btrfs check errors looked identical (admittedly I didn't put them side by side)

6. I re-provisioned the array and restored from backups.
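
For anyone landing here later, the timing check discussed above (each drive's SCT ERC time must be less than the kernel's SCSI command timer) can be sketched roughly as follows. This is only a minimal sketch, not what was actually run on this array. It assumes smartmontools is installed, that `smartctl -l scterc` reports the limits in deciseconds in the usual "Read: 70 (7.0 seconds)" form, and that the command timer is exposed in seconds at /sys/block/<dev>/device/timeout. The function names and the sample text are made up for illustration.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Illustrative sample only: the assumed shape of `smartctl -l scterc` output.
SAMPLE_OUTPUT = """SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""


def parse_scterc_deciseconds(smartctl_output: str) -> Optional[int]:
    """Return the larger of the read/write SCT ERC limits in deciseconds.

    Returns None when no numeric limit is found, which covers both
    "Disabled" and drives that do not support SCT ERC at all.
    """
    pattern = r"^\s*(?:Read|Write):\s+(\d+)\s+\("
    values = [int(m.group(1)) for m in re.finditer(pattern, smartctl_output, re.MULTILINE)]
    return max(values) if values else None


def erc_is_safe(erc_deciseconds: Optional[int], command_timer_seconds: int) -> bool:
    """True only if the drive gives up strictly before the kernel does."""
    if erc_deciseconds is None:
        # Unknown or disabled ERC: the drive may retry longer than the kernel waits.
        return False
    return erc_deciseconds / 10 < command_timer_seconds


def read_command_timer(device: str) -> int:
    """Read the SCSI command timer (seconds) for a device name such as 'sda'."""
    return int(Path(f"/sys/block/{device}/device/timeout").read_text().strip())


def check_device(device: str) -> bool:
    """Query one drive with smartctl (usually needs root) and compare the timers."""
    result = subprocess.run(
        ["smartctl", "-l", "scterc", f"/dev/{device}"],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    return erc_is_safe(parse_scterc_deciseconds(result.stdout), read_command_timer(device))
```

Calling check_device("sda") for each member of the array before the scrub would flag any drive whose error recovery could outlast the kernel's timer. The sketch treats disabled or unsupported SCT ERC as unsafe, since there is then no bound on how long the drive may retry.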

As I thought about it, the problem may not have come from the original power outage at all.  I only ran a check after the power outage, so my array could already have had an issue due to a previous bug; I was on a 5.2.x kernel for several weeks under high load.  Anyway, there are enough unknowns to make a root cause analysis not worth my time.

Marking this as unresolved for folks in the future who may be looking for answers.

Matt Pallissard
