Re: How to handle a RAID5 arrawy with a failing drive? -> raid5 mostly works, just no rebuilds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 19, 2014 at 10:53:33AM -0600, Chris Murphy wrote:
> 
> On Mar 19, 2014, at 9:40 AM, Marc MERLIN <marc@xxxxxxxxxxx> wrote:
> > 
> > After adding a drive, I couldn't quite tell if it was striping over 11
> > drive2 or 10, but it felt that at least at times, it was striping over 11
> > drives with write failures on the missing drive.
> > I can't prove it, but I'm thinking the new data I was writing was being
> > striped in degraded mode.
> 
> Well it does sound fragile after all to add a drive to a degraded array, especially when it's not expressly treating the faulty drive as faulty. I think iotop will show what block devices are being written to. And in a VM it's easy (albeit rudimentary) with sparse files, as you can see them grow.
> 
> > 
> > Yes, although it's limited, you apparently only lose new data that was added
> > after you went into degraded mode and only if you add another drive where
> > you write more data.
> > In real life this shouldn't be too common, even if it is indeed a bug.
> 
> It's entirely plausible a drive power/data cable becomes lose, runs for hours degraded before the wayward device is reseated. It'll be common enough. It's definitely not OK for all of that data in the interim to vanish just because the volume has resumed from degraded to normal. Two states of data, normal vs degraded, is scary. It sounds like totally silent data loss. So yeah if it's reproducible it's worthy of a separate bug.

I just got around to filing that bug:
https://bugzilla.kernel.org/show_bug.cgi?id=72811

In other news, I was able to
1) remove a drive
2) mount degraded
3) add a new drive
4) rebalance (that took 2 days with little data, 4 deadlocks and reboots
though)
5) remove the missing drive from the filesystem
6) remount the array without -o degraded

Now, I'm testing
1) add a new drive
2 remove a working drive
3) automatic rebalance from #2 should rebuild on the new drive automatically

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux