On 26/06/16 02:25, Chris Murphy wrote:
> On Fri, Jun 24, 2016 at 10:19 PM, Steven Haigh <netwiz@xxxxxxxxx> wrote:
>
>>
>> Interesting though that EVERY crash references:
>> kernel BUG at fs/btrfs/extent_io.c:2401!
>
> Yeah, because you're mounted ro, and if this is unmodified 4.4.13
> btrfs from kernel.org then that's the 3rd line of this:
>
> if (head->is_data) {
>         ret = btrfs_del_csums(trans, root,
>                               node->bytenr,
>                               node->num_bytes);
>
> So why/what is it cleaning up if it's mounted ro? Anyway, once you're
> no longer making forward progress you could try something newer,
> although it's a coin toss what to try. There are some issues with
> 4.6.0-4.6.2, but there have been a lot of changes in btrfs/extent_io.c
> and btrfs/raid56.c between the 4.4.13 you're using and 4.6.2, so you
> could try that, or even build 4.7-rc4 or -rc5 by tomorrowish and see
> how that fares. It sounds like there's just too much (mostly metadata)
> corruption for the degraded state to deal with, so it may not matter.
> I'm really skeptical of btrfsck on degraded filesystems, so I don't
> think that'll help.

Well, I did end up recovering the data that I cared about. I'm not
really keen to ride the BTRFS RAID6 train again any time soon :\

I'm now back to the same setup I've had for years - md RAID6 with XFS
on top of it. I'm still copying data back to the array from the
various places I had to spread it across to have enough space to hold
it all.

What I find interesting is that the corruption on the BTRFS RAID6 was
quite clustered. I have ~80GB of MP3s ripped over the years - of those,
the corruption would take out 3-4 songs in a row, while the next 10
albums or so were intact. What made recovery VERY hard is that it
repeatedly hit situations that caused a complete system hang.

I tried it on bare metal - just in case it was a Xen thing - but it
hard-hung the entire machine there as well. In every case it was a
flurry of csum error messages, then instant death. I would have been
much happier if the file had been skipped or returned as unavailable
instead of the entire machine crashing.

I ended up putting the bit of script that I posted earlier in
/etc/rc.local, then just kept running:

xl destroy myvm && xl create /etc/xen/myvm -c

Wait for the crash, run the above again (or automate it, as in the
sketch below).
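
The outer loop is roughly this - just a sketch of the idea, not the
exact script I ran; the domain name, config path and timeout are
placeholders:

#!/bin/sh
# Crude recovery loop: boot the guest, let it copy data until it
# wedges, tear it down and go again. A hung guest never drops out of
# 'xl list', so a fixed timeout is the simplest trigger - 300s is a
# guess based on the ~3 minute average uptime.
while true; do
    xl destroy myvm 2>/dev/null   # kill the hung/crashed domain, if any
    xl create /etc/xen/myvm       # boot it again (no -c, so the loop keeps going)
    sleep 300
done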

All in all, it took me about 350 boots, with an average uptime of
about 3 minutes, to get out the data I decided to keep. While not a
BTRFS loss as such, given how long it was going to take I decided not
to bother recovering ~3.5TB of other data that is easily available
elsewhere on the internet. If I really need the Fedora 24 KDE Spin
ISO, or the CentOS 6 install DVD, etc. etc., I can download it again.

--
Steven Haigh
Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897