On Wed, Nov 13, 2013 at 03:03:37PM +0000, Duncan wrote:
> Franziska Näpelt posted on Tue, 12 Nov 2013 08:49:12 +0100 as excerpted:
>
> > we are using a btrfs RAID 1 with four 2TB hard drives (WD Caviar green)
> > on a Debian 7.2 with Kernel 3.11.6
> >
> > Now we had an 'invalid opcode: 0000 [#1] SMP' when a sector fails in
> > messages log.
> > After that, access over smb and nfs wasn't possible.
> > A restart solved the problem of inaccesibility.
>
> A couple notes from a fellow btrfs-using sysadmin...
>
> 1) invalid opcode 0000:
>
> As I understand it, this is relatively generic and doesn't define the
> error by itself. The 0000 opcode can be viewed as a zero-dereference of
> sorts, it's indication of a bug happening earlier, such that an expected
> valid opcode ends up being zero. The error itself will be earlier --
> this is just where it ends up being trapped.

   I believe it's actually (one of?) the methods used to generate BUG
or BUG_ON.

> As to what that error is in this case...
>
> 2) btrfs raid1:
>
> Unlike, for example, md/raid1, btrfs raid1 is not at this point run-time
> tolerant of device failure. At this point, a btrfs raid1 device failure
> seems to make the entire system basically unusable and require a reboot,
> after which device/data recovery (for example, mount degraded, add a
> replacement device, rebalance, and delete the failed one, or if it was a
> temporary dropout, simply btrfs scrub to find and fix the checksum
> mismatches from the valid copy) can be initiated, if necessary.

   My experience is that the reboot isn't required, although remount
rw may be, followed possibly by reinserting the device and definitely
running scrub. YMMV, though.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- He's playing Schubert. I think Schubert is losing. ---
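
For anyone following along, the two recovery paths mentioned in the thread (remount and scrub after a temporary dropout; mount degraded, add, rebalance, delete for a dead drive) might look roughly like the sketch below. The mount point and device names are hypothetical placeholders, not taken from the original report, and the step order simply follows the description above. By default the script only prints the commands, so nothing touches a real filesystem unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. MNT, SURVIVOR and NEW_DEV are hypothetical placeholders;
# substitute the real mount point and devices before using any of this.
MNT=${MNT:-/mnt/pool}          # where the btrfs raid1 is mounted
SURVIVOR=${SURVIVOR:-/dev/sdb} # any member device that is still healthy
NEW_DEV=${NEW_DEV:-/dev/sde}   # replacement drive

# Print each command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Case A: the device dropped out temporarily and is back.
# Remount read-write if needed, then scrub to repair stale copies.
transient_dropout() {
    run mount -o remount,rw "$MNT"
    run btrfs scrub start -B "$MNT"   # -B: stay in the foreground until done
    run btrfs scrub status "$MNT"
}

# Case B: the device is gone for good.
# Order follows the thread: mount degraded, add, rebalance, delete.
replace_failed() {
    run mount -o degraded "$SURVIVOR" "$MNT"
    run btrfs device add "$NEW_DEV" "$MNT"
    run btrfs balance start "$MNT"
    run btrfs device delete missing "$MNT"
}
```

Calling transient_dropout or replace_failed with the defaults prints the planned commands prefixed with "+", which is a cheap way to review them before running anything for real.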
