Re: kernel BUG at fs/btrfs/extent_io.c:1884 - BTRFS

On Wed, 15 Aug 2012 10:21:48 -0500, Anthony Plack wrote:
> Okay, this is the second occurrence of this bug.  I have searched Google, and while there are two postings for extent_io, I am not sure whether they match.
> 
> Running Gentoo with kernel 3.5 on a dual core AMD.  The machine has 19 drives of varied types.  I am running rsync from an xfs volume (on two md arrays) to the btrfs volume and moving 8.2T.  This is the first time some of these drives have been exercised.  Four of the drives are in an external cage with a SATA multiplexer running across an eSATA cable.
> 
> On the btrfs volume, the metadata is RAID1, but the data is RAID0.
> 
> To me the most troubling issue is that the bug causes the system to become unresponsive whenever the btrfs volume is accessed.  Any btrfs command hangs at the prompt, and umount hangs in the same way.  On Aug 10th, I let the prompts sit for 48 hours with no progress, because other processes were running and I did not want to take the box down.  All attempts to kill the hung processes have no effect; they just remain as zombies in the system.  The system does not seem to have excessive CPU or memory consumption.
> 
> Since the first event, I have learned what is triggering the situation.  There are two used "ST3000DM001-9YN1 CC9D" Seagate drives which are posting some errors to the console.  The multiplexer responds to these errors by shutting down the drive.  If I reboot the box, the multiplexer shows one drive as off-line.  I was successful in removing and reseating the drive.  The bad block count is up, but not that high for a 3T drive (in the 200s).  The "shutdown" command also hung during this first event.  I unmounted all the other volumes and had to hard reboot the server.
> 
> During the second event, suspecting the multiplexer had done it again, I hot-unplugged the second drive (/dev/sdi), which was missing from lsscsi.  The drive is back online (as /dev/sdt), but btrfs is not detecting the change and is still hung.  The original rsync is still stuck.  This time, I was able to get the btrfs command to operate without hanging.  In addition, the drive is accessible, but the rsync commands are hung.
> 
> When I attempted to scrub the volume, another trace was posted to the log.
> 
> 
> 
> Okay details....
> 
> 
> Trace Failure 1:
> Aug 10 06:22:48 fatdrive kernel: [131136.506053] kernel BUG at fs/btrfs/extent_io.c:1884!
> Aug 10 06:22:48 fatdrive kernel: [131136.506070] invalid opcode: 0000 [#1] SMP 
> Aug 10 06:22:48 fatdrive kernel: [131136.506087] CPU 1 
> Aug 10 06:22:48 fatdrive kernel: [131136.506090] Modules linked in: btrfs lzo_compress lzo_decompress zlib_deflate crc32c libcrc32c r8168(O) nfsd xfs exportfs shpchp pci_hotplug r8169 k10temp mii kvm_amd kvm
> Aug 10 06:22:48 fatdrive kernel: [131136.506168] 
> Aug 10 06:22:48 fatdrive kernel: [131136.506184] Pid: 8458, comm: btrfs-endio-wri Tainted: G        W  O 3.5.0-gentoo #2 BIOSTAR Group TA880G HD/TA880G HD
> Aug 10 06:22:48 fatdrive kernel: [131136.506219] RIP: 0010:[<ffffffffa02b9231>]  [<ffffffffa02b9231>] repair_io_failure+0x1a1/0x1e0 [btrfs]
> Aug 10 06:22:48 fatdrive kernel: [131136.506270] RSP: 0018:ffff8800889dd970  EFLAGS: 00010246
> Aug 10 06:22:48 fatdrive kernel: [131136.506288] RAX: ffff8800889dd9a0 RBX: 0000000000000000 RCX: 0000007879ea8000
> Aug 10 06:22:48 fatdrive kernel: [131136.506318] RDX: 0000000000001000 RSI: 0000007879ea8000 RDI: ffff880215754108
> Aug 10 06:22:48 fatdrive kernel: [131136.506347] RBP: ffff8800889dd9f0 R08: ffffea0000ef8a80 R09: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506378] R10: 57ffe641d6ef8a80 R11: 0000000000000001 R12: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506407] R13: ffff8800889dd990 R14: 0000007879ea8000 R15: 0000000000001000
> Aug 10 06:22:48 fatdrive kernel: [131136.506439] FS:  00007f7f6959e700(0000) GS:ffff88021fc40000(0000) knlGS:0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506469] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Aug 10 06:22:48 fatdrive kernel: [131136.506490] CR2: 00007f12b121f624 CR3: 0000000198bac000 CR4: 00000000000007e0
> Aug 10 06:22:48 fatdrive kernel: [131136.506521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Aug 10 06:22:48 fatdrive kernel: [131136.506584] Process btrfs-endio-wri (pid: 8458, threadinfo ffff8800889dc000, task ffff88018a3c0770)
> Aug 10 06:22:48 fatdrive kernel: [131136.506614] Stack:
> Aug 10 06:22:48 fatdrive kernel: [131136.506628]  ffffea0000ef8a80 0000007879ea8000 ffff880215754108 ffffea0000ef8a80
> Aug 10 06:22:48 fatdrive kernel: [131136.506659]  0000000000000000 0000000000000000 ffff8800889dd9a0 ffff8800889dd9a0
> Aug 10 06:22:48 fatdrive kernel: [131136.506690]  0000000000000000 0000000000000000 ffff880200000001 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506721] Call Trace:
> Aug 10 06:22:48 fatdrive kernel: [131136.506746]  [<ffffffffa02b9b91>] repair_eb_io_failure+0x81/0xa0 [btrfs]
> Aug 10 06:22:48 fatdrive kernel: [131136.506770]  [<ffffffffa029119a>] btree_read_extent_buffer_pages.constprop.115+0x11a/0x120 [btrfs]
[...]

This issue is already fixed by commit c0901581, which is part of Linux 3.6-rc1:

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/18594
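
To check whether a running kernel predates the release named above, a version comparison is a rough first step. The following is only a sketch under stated assumptions: the helper names parse_release and predates_fix are made up for illustration, the parsing assumes release strings shaped like "3.5.0-gentoo", and a distribution kernel may carry the fix as a backport, so the result is a hint rather than proof.

```python
import platform
import re

# Assumption: the fix first shipped in the 3.6 series (3.6-rc1, per the reply above).
FIXED_IN = (3, 6)


def parse_release(release):
    """Return (major, minor) from a release string such as '3.5.0-gentoo'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def predates_fix(release, fixed_in=FIXED_IN):
    """True if the release is older than the series that contains the fix."""
    return parse_release(release) < fixed_in


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    try:
        if predates_fix(running):
            print("%s predates 3.6; the fix is probably missing" % running)
        else:
            print("%s is 3.6 or later; the fix should be included" % running)
    except ValueError as err:
        print("could not interpret kernel release: %s" % err)
```

Only the major and minor numbers are compared, so any 3.6 release candidate counts as containing the fix, which matches the statement that the commit landed in 3.6-rc1.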