floating point exception (core dumped) - btrfs rescue chunk-recover

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

TL;DR ==

btrfs 3x500GB RAID 5 - One device failed. Added a new device (btrfs device add) and tried to remove the failed device (btrfs device delete).

I tried to mount the array in degraded mode, but that didn't work either. After multiple attempts (including adding back the failed HDD), I finally ran the btrfs rescue chunk-recover command on the primary member /dev/sdb.

This ran for about 4 hours, and then failed with "floating point exception (core dumped)".
==

I am testing out btrfs to gain familiarity with it. I am quite amazed at it's capabilities and performance. However, I am either not able to understand or implement RAID5 fault tolerance.

I understand from the wiki that RAID56 is experimental. The data I am working with is backed up elsewhere and for all intents and purposes, discard-able.

I have set up a btrfs RAID5 with 3x500GB Seagate HDDs, with a mount point of /storage. Booting is off a fourth HDD (ext4, lubuntu 64bit) that is not involved in the RAID.

Everything was working amazingly well, until one HDD failed and was quietly offlined. For a couple of days, the RAID was running off 2 HDDs and I didn't notice.

When I DID realize, I shut down the system, bought a new HDD (2TB), which took a couple of days to arrive.

When I powered up the system again, the failed 500GB was back. Everything loaded fine, and looked good. To be on the safe side, I ran a badblocks test (ro) on the failing HDD.

Halfway through the test, the HDD disappeared again. After a cold reboot, it was loaded fine again.

At this point, I decided to replace the failed HDD. I shutdown, plugged in the new HDD in place of the boot HDD, booted up with Lubuntu live, mounted (/storage) and added the device to the RAID.

After adding the device successfully, I gave a device delete command for the failed HDD. Partway through the process, the failing HDD (/dev/sdc) disappeared again, and after waiting a couple of hours, I hard-reset the system, and removed the failing HDD, assuming that the RAID will re-build on the existing devices.

Now, the RAID (/storage) refused to mount. I got a c_tree error (please see enclosed logs below).

I tried to mount the array in degraded mode, but that didn't work either. After multiple attempts (including adding back the failed HDD), I finally ran the btrfs rescue chunk-recover command on the primary member /dev/sdb.

This ran for about 4 hours, and then failed with "floating point exception (core dumped)".

Can I recover the array or should I start again? The data is not important, but I would like to know the recovery process, or any misconceptions in my thinking that RAID5 with 3 devices is enough for SOHO-level fault tolerance?

Any advice, pointers, etc, much appreciated. Tech level: medium-high (RHCE).

Relevant system information:
=== uname -a
Linux lubuntu 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


== btrfs --version
btrfs-progs v4.0

== btrfs fi show
warning, device 2 is missing
Label: 'storage'  uuid: 5a3d6590-df08-4520-b61b-802d350849c7
	Total devices 4 FS bytes used 176.91GiB
	devid    1 size 465.76GiB used 90.03GiB path /dev/sdb
	devid    3 size 465.76GiB used 90.01GiB path /dev/sdc
	devid    4 size 1.82TiB used 10.00GiB path /dev/sda
	*** Some devices missing

== dmesg info
...
Jan  5 01:45:22 lubuntu kernel: [   10.338295] Btrfs loaded
Jan  5 01:45:22 lubuntu kernel: [   10.338899] BTRFS: device label storage devid 4 transid 969 /dev/sda
Jan  5 01:45:22 lubuntu kernel: [   10.340448] BTRFS info (device sda): disk space caching is enabled
Jan  5 01:45:22 lubuntu kernel: [   10.340454] BTRFS: has skinny extents
Jan  5 01:45:22 lubuntu kernel: [   10.343395] BTRFS: failed to read the system array on sda
Jan  5 01:45:22 lubuntu kernel: [   10.352137] BTRFS: open_ctree failed
Jan  5 01:45:22 lubuntu kernel: [   10.382199] BTRFS: device label storage devid 1 transid 969 /dev/sdb
Jan  5 01:45:22 lubuntu kernel: [   10.383740] BTRFS info (device sdb): disk space caching is enabled
Jan  5 01:45:22 lubuntu kernel: [   10.383744] BTRFS: has skinny extents
Jan  5 01:45:22 lubuntu kernel: [   10.384469] BTRFS: failed to read the system array on sdb
Jan  5 01:45:22 lubuntu kernel: [   10.392116] BTRFS: open_ctree failed
Jan  5 01:45:22 lubuntu kernel: [   10.423075] BTRFS: device label storage devid 3 transid

... // after btrfs rescue chunk for about 4 hours
Jan  5 06:01:45 lubuntu kernel: [15404.828156] traps: btrfs[3016] trap divide error ip:4211a0 sp:7ffd7dbb03a8 error:0 in btrfs[400000+73000]
...

== some output from btrfs rescu chunk -vv
...
	    Stripes list:
	    [ 0] Stripe: devid = 3, offset = 21484273664
	    [ 1] Stripe: devid = 2, offset = 21484273664
	    [ 2] Stripe: devid = 1, offset = 21504196608
	Chunk: start = 45134905344, len = 2147483648, type = 81, num_stripes = 3
	    Stripes list:
	    [ 0] Stripe: devid = 3, offset = 22558015488
	    [ 1] Stripe: devid = 2, offset = 22558015488
	    [ 2] Stripe: devid = 1, offset = 22577938432
	Chunk: start = 47282388992, len = 2147483648, type = 81, num_stripes = 3
	    Stripes list:
	    [ 0] Stripe: devid = 3, offset = 23631757312
	    [ 1] Stripe: devid = 2, offset = 23631757312
	    [ 2] Stripe: devid = 1, offset = 23651680256
...
	Device extent: devid = 4, start = 5369757696, len = 1073741824, chunk offset = 201901211648
	Device extent: devid = 4, start = 6443499520, len = 1073741824, chunk offset = 204048695296
	Device extent: devid = 4, start = 7517241344, len = 1073741824, chunk offset = 206196178944
	Device extent: devid = 4, start = 8590983168, len = 1073741824, chunk offset = 208343662592
// floating point error

Regards,
PRShah

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux