Re: Re: [HELP] Recover a RAID5 with 8 drives

>Hi Michael,


Hi Maurizio

>I agree with you that our situations seem very similar; moreover, your
>analysis seems correct to me, since our hard disks are all WD Caviar
>Green, so they lack the TLER feature (which I wasn't aware of, thanks
>for pointing that out too).


TLER is partly available on those drives, but not active by default, as
you have surely found out. Having to run smartctl every time just to
force it on is really working around a hardware problem, and we solved
it by using real disks this time.
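
In case it is useful to you: on drives that actually support SCT Error
Recovery Control (the feature behind TLER), this is roughly how it is
checked and forced with smartmontools. The values are tenths of a
second; on many desktop drives the command is simply rejected, and even
where it works the setting usually does not survive a power cycle, so
take it as a sketch, not a fix:

  # show the current SCT ERC (TLER) setting, if the drive supports it at all
  smartctl -l scterc /dev/sda

  # try to set the read and write error recovery timeouts to 7.0 seconds
  smartctl -l scterc,70,70 /dev/sda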

>Luckily I just managed to access the RAID and back up the important
>data by executing `mdadm --assemble --force /dev/md0
>/dev/sd[abcdefgh]3`; so the crucial part is done; now I have the
>"freedom" to try everything needed to resolve the issue.

Luckily your disks were not corrupted (bad blocks) like mine. 5 out of 8
disks had bad blocks, anywhere from 8k to 47k of them, and one was dead
entirely (a colleague from Ontrack helped me get that one, sdg, running
again). After getting most of the data transferred I realised that the
md superblock is gone on 5 of the 8 HDs and the partition table is
unreadable on two, so I now have to put those back onto the cloned disks.
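
For completeness, this is roughly how I check what actually survived on
the clones before writing anything; both commands are read-only, and the
device names are just examples:

  # which disks still have a readable partition table
  for d in /dev/sd[a-h]; do sfdisk -l $d; done

  # which partitions still carry an md superblock, and with what event count
  mdadm --examine /dev/sd[a-h]3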

>Now I would ask you:

>  * how did you proceed in order to restore your situation? Do you have
>    any suggestions?

I wouldn't have asked for help in my first mail if I could assemble it
somehow. I'd welcome advice on how to go about it, as mdadm is far
outside my knowledge (which is good on hardware RAID controllers and ZFS
systems, which I run on big servers). Right now I just need an approach
that could work; since my own problem never got a single reply, it seems
I'm on my own.
My plan would be as follows (I have a backup of all the disks in the
state they were delivered to me; a rough sketch of the commands for a)
to d) follows after the list):
a) use testdisk to copy the partition table from an intact disk onto the
disks whose partition table is missing
b) have mdadm write new superblocks onto the disks (the data is not
altered that way, as I understand the mdadm manual) via
--create --assume-clean, leaving sdb out via the "missing" parameter,
like so:
mdadm --create --assume-clean /dev/md0 --level=5 --raid-devices=8
/dev/sd{a,c,d,e,f,g,h}3 missing
c) run e2fsck on md0 (which is now possible as I added 2 GB of RAM)
d) move the data onto our SAN

e) afterwards recreate the array properly, and this time export the
configuration into mdadm.conf (which the QNAP does not do)
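
To spell that plan out, this is the rough command sequence I have in
mind for a) to d). The extra flags (--metadata, --chunk, --layout) and
the slot order are taken from Maurizio's --examine output further down,
on the assumption that my array was created the same way (0.90
superblock, 64K chunk, left-symmetric); that still needs checking
against my own copies, so treat it as a sketch, not a tested recipe:

  # a) copy the partition table from an intact disk to a clone that lost it
  #    (testdisk does the same job interactively; this covers MBR tables,
  #     for GPT it would be sgdisk --backup=... / --load-backup=...)
  sfdisk -d /dev/sda > pt-sda.dump
  sfdisk /dev/sde < pt-sda.dump

  # b) rewrite the superblocks without touching the data: with --create the
  #    devices get their slots in the order listed, so "missing" must sit in
  #    the slot the left-out disk originally had (slot 1 here, assuming sdb3
  #    was device 1), and metadata/chunk/layout are given explicitly in case
  #    a newer mdadm with different defaults is used
  mdadm --create --assume-clean /dev/md0 --metadata=0.90 --level=5 \
        --raid-devices=8 --chunk=64 --layout=left-symmetric \
        /dev/sda3 missing /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 \
        /dev/sdg3 /dev/sdh3

  # c) check the filesystem read-only first, then repair
  e2fsck -n /dev/md0
  e2fsck /dev/md0

  # d) mount read-only and copy the data off to the SAN
  mount -o ro /dev/md0 /mnt/recover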


The funny thing is that QNAP support gave me the wrong way to go,
according to Neil Brown's answer, and the QNAP 859 cannot run an fsck by
default because it does not have enough RAM when the volume is over
8 TiB.



>  * reading about TLER, I believe I understood that the failing disks
>    are not necessarily broken, but the RAID thinks they are; does that
>    mean I can still use the failing disks?

One bad block is enough for me to swap a drive, but you are right: the
disk just took too much time to recover, so it was thrown out of the
array. Since my disks developed large areas of defective blocks after
just 2 years of running, all of them were replaced. The 8 months of
waiting alone have now cost more than the complete array plus the
replacement drives. It just proves: good disks are worth their money!

You might have read in my thread that this was the second QNAP with
the same problem (this time much more severe; the first case, about a
year back, was just like yours).


Cheers
Michael


On 28/01/2014 21:11, Samer, Michael (I/ET-83, extern) wrote:
> Hello Maurizio
> A very similar case happened to me (search the list archives for QNAP).
> Your box dropped a second drive (= full failure) while rebuilding, I guess due to read errors on drives without TLER.
> Western Digital drives are prone to this.
>
> I was lucky enough to be able to copy all of my faulty drives (5 of 8), and currently I am trying to recreate the md superblocks which were lost on the last write.
> What drives do you use?
>
> Cheers
> Sam
>
>
> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Maurizio De Santis
> Sent: Tuesday, 28 January 2014 16:30
> To: linux-raid@xxxxxxxxxxxxxxx
> Subject: [HELP] Recover a RAID5 with 8 drives
>
> Hi!
>
> I think I've got a problem :-/ I have a QNAP NAS with an 8-disk RAID5.
> A few days ago I got a "Disk Read/Write Error" on the 8th drive
> (/dev/sdh), with the suggestion to replace the disk.
>
> I replaced it, but after a while the RAID rebuild failed, and the QNAP
> Admin Interface still gives me a "Disk Read/Write Error" on /dev/sdh.
> Plus, I can't access the RAID data anymore :-/
>
> I was following this guide
> https://raid.wiki.kernel.org/index.php/RAID_Recovery but, since I
> haven't got any backup (I promise I will make backups in the future!),
> I'm afraid to run any potentially destructive command.
>
> How do you suggest I proceed? I would like to assemble the RAID excluding
> the 8th disk in order to mount it and back up important data, but I don't
> even know if that is doable :-/ Moreover, looking at the `mdadm --examine`
> output I see that sdb seems to have problems too, even though the QNAP
> Admin Interface doesn't report it.
>
> Here is some information about the machine status:
>
> # uname -a
> Linux NAS 3.4.6 #1 SMP Thu Sep 12 10:56:51 CST 2013 x86_64 unknown
>
> # mdadm -V
> mdadm - v2.6.3 - 20th August 2007
>
> # cat /etc/mdadm.conf
> ARRAY /dev/md0
> devices=/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3,/dev/sde3,/dev/sdf3,/dev/sdg3,/dev/sdh3
>
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath]
> md8 : active raid1 sdg2[2](S) sdf2[3](S) sde2[4](S) sdd2[5](S)
> sdc2[6](S) sdb2[1] sda2[0]
>         530048 blocks [2/2] [UU]
>
> md13 : active raid1 sda4[0] sde4[6] sdf4[5] sdg4[4] sdd4[3] sdc4[2] sdb4[1]
>         458880 blocks [8/7] [UUUUUUU_]
>         bitmap: 8/57 pages [32KB], 4KB chunk
>
> md9 : active raid1 sda1[0] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
>         530048 blocks [8/7] [UUUUUUU_]
>         bitmap: 30/65 pages [120KB], 4KB chunk
>
> unused devices: <none>
>
> # mdadm --examine /dev/sd[abcdefgh]3
> /dev/sda3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047ab - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     0       8        3        0      active sync   /dev/sda3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdb3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 8
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:09:57 2014
>             State : active
>    Active Devices : 7
> Working Devices : 8
>    Failed Devices : 1
>     Spare Devices : 1
>          Checksum : 97f3567d - correct
>            Events : 0.2944837
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     1       8       19        1      active sync   /dev/sdb3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       8       19        1      active sync   /dev/sdb3
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
>      8     8       8      115        8      spare   /dev/sdh3
> /dev/sdc3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047cf - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     2       8       35        2      active sync   /dev/sdc3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdd3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047e1 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     3       8       51        3      active sync   /dev/sdd3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sde3:
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 982047f3 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     4       8       67        4      active sync   /dev/sde3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdf3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 98204805 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     5       8       83        5      active sync   /dev/sdf3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdg3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 7
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:19:58 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 6
>    Failed Devices : 2
>     Spare Devices : 0
>          Checksum : 98204817 - correct
>            Events : 0.2944851
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     6       8       99        6      active sync   /dev/sdg3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
> /dev/sdh3:
>             Magic : a92b4efc
>           Version : 00.90.00
>              UUID : 418e2add:2c4b313b:d12fb7ea:993d5bf7
>     Creation Time : Fri Jan 20 02:19:47 2012
>        Raid Level : raid5
>     Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
>        Array Size : 13663619200 (13030.64 GiB 13991.55 GB)
>      Raid Devices : 8
>     Total Devices : 8
> Preferred Minor : 0
>
>       Update Time : Fri Jan 24 17:18:26 2014
>             State : clean
>    Active Devices : 6
> Working Devices : 7
>    Failed Devices : 2
>     Spare Devices : 1
>          Checksum : 98204851 - correct
>            Events : 0.2944847
>
>            Layout : left-symmetric
>        Chunk Size : 64K
>
>         Number   Major   Minor   RaidDevice State
> this     8       8      115        8      spare   /dev/sdh3
>
>      0     0       8        3        0      active sync   /dev/sda3
>      1     1       0        0        1      faulty removed
>      2     2       8       35        2      active sync   /dev/sdc3
>      3     3       8       51        3      active sync   /dev/sdd3
>      4     4       8       67        4      active sync   /dev/sde3
>      5     5       8       83        5      active sync   /dev/sdf3
>      6     6       8       99        6      active sync   /dev/sdg3
>      7     7       0        0        7      faulty removed
>      8     8       8      115        8      spare   /dev/sdh3
>
> # dmesg **edited (irrelevant parts removed)**
> , wo:0, o:1, dev:sdb2
> [  975.516724] RAID1 conf printout:
> [  975.516728]  --- wd:2 rd:2
> [  975.516732]  disk 0, wo:0, o:1, dev:sda2
> [  975.516737]  disk 1, wo:0, o:1, dev:sdb2
> [  975.516740] RAID1 conf printout:
> [  975.516744]  --- wd:2 rd:2
> [  975.516748]  disk 0, wo:0, o:1, dev:sda2
> [  975.516753]  disk 1, wo:0, o:1, dev:sdb2
> [  977.495709] md: unbind<sdh2>
> [  977.505048] md: export_rdev(sdh2)
> [  977.535277] md/raid1:md9: Disk failure on sdh1, disabling device.
> [  977.575038]  disk 2, wo:0, o:1, dev:sdc1
> [  977.575043]  disk 3, wo:0, o:1, dev:sdd1
> [  977.575048]  disk 4, wo:0, o:1, dev:sde1
> [  977.575053]  disk 5, wo:0, o:1, dev:sdf1
> [  977.575058]  disk 6, wo:0, o:1, dev:sdg1
> [  979.547149] md: unbind<sdh1>
> [  979.558031] md: export_rdev(sdh1)
> [  979.592646] md/raid1:md13: Disk failure on sdh4, disabling device.
> [  979.592650] md/raid1:md13: Operation continuing on 7 devices.
> [  979.650862] RAID1 conf printout:
> [  979.650869]  --- wd:7 rd:8
> [  979.650875]  disk 0, wo:0, o:1, dev:sda4
> [  979.650880]  disk 1, wo:0, o:1, dev:sdb4
> [  979.650885]  disk 2, wo:0, o:1, dev:sdc4
> [  979.650890]  disk 3, wo:0, o:1, dev:sdd4
> [  979.650895]  disk 4, wo:0, o:1, dev:sdg4
> [  979.650900]  disk 5, wo:0, o:1, dev:sdf4
> [  979.650905]  disk 6, wo:0, o:1, dev:sde4
> [  979.650911]  disk 7, wo:1, o:0, dev:sdh4
> [  979.656024] RAID1 conf printout:
> [  979.656029]  --- wd:7 rd:8
> [  979.656034]  disk 0, wo:0, o:1, dev:sda4
> [  979.656039]  disk 1, wo:0, o:1, dev:sdb4
> [  979.656044]  disk 2, wo:0, o:1, dev:sdc4
> [  979.656049]  disk 3, wo:0, o:1, dev:sdd4
> [  979.656054]  disk 4, wo:0, o:1, dev:sdg4
> [  979.656059]  disk 5, wo:0, o:1, dev:sdf4
> [  979.656063]  disk 6, wo:0, o:1, dev:sde4
> [  981.604906] md: unbind<sdh4>
> [  981.616035] md: export_rdev(sdh4)
> [  981.753058] md/raid:md0: Disk failure on sdh3, disabling device.
> [  981.753062] md/raid:md0: Operation continuing on 6 devices.
> [  983.765852] md: unbind<sdh3>
> [  983.777030] md: export_rdev(sdh3)
> [ 1060.094825] journal commit I/O error
> [ 1060.099196] journal commit I/O error
> [ 1060.103525] journal commit I/O error
> [ 1060.108698] journal commit I/O error
> [ 1060.116311] journal commit I/O error
> [ 1060.123634] journal commit I/O error
> [ 1060.127225] journal commit I/O error
> [ 1060.130930] journal commit I/O error
> [ 1060.137651] EXT4-fs (md0): previous I/O error to superblock detected
> [ 1060.178323] Buffer I/O error on device md0, logical block 0
> [ 1060.181873] lost page write due to I/O error on md0
> [ 1060.185634] EXT4-fs error (device md0): ext4_put_super:849: Couldn't
> clean up the journal
> [ 1062.662723] md0: detected capacity change from 13991546060800 to 0
> [ 1062.666308] md: md0 stopped.
> [ 1062.669760] md: unbind<sda3>
> [ 1062.681031] md: export_rdev(sda3)
> [ 1062.684466] md: unbind<sdg3>
> [ 1062.695023] md: export_rdev(sdg3)
> [ 1062.698342] md: unbind<sdf3>
> [ 1062.709021] md: export_rdev(sdf3)
> [ 1062.712310] md: unbind<sde3>
> [ 1062.723029] md: export_rdev(sde3)
> [ 1062.726245] md: unbind<sdd3>
> [ 1062.737022] md: export_rdev(sdd3)
> [ 1062.740112] md: unbind<sdc3>
> [ 1062.751022] md: export_rdev(sdc3)
> [ 1062.753934] md: unbind<sdb3>
> [ 1062.764021] md: export_rdev(sdb3)
> [ 1063.772687] md: md0 stopped.
> [ 1064.782381] md: md0 stopped.
> [ 1065.792585] md: md0 stopped.
> [ 1066.801668] md: md0 stopped.
> [ 1067.812573] md: md0 stopped.
> [ 1068.821548] md: md0 stopped.
> [ 1069.830667] md: md0 stopped.
> [ 1070.839554] md: md0 stopped.
> [ 1071.848418] md: md0 stopped.
>


-- 

Maurizio De Santis
DEVELOPMENT MANAGER
Morgan S.p.A.
Via Degli Olmetti, 36
00060 Formello (RM), Italy
t. 06.9075275
w. www.morganspa.com
m. m.desantis@xxxxxxxxxxxxx


According to Italian law Dlgs. 196/2003 concerning privacy, information contained in this message is confidential and intended for the addressee only; any use, copy or distribution of same is strictly prohibited. If you have received this message in error, you are requested to inform the sender as soon as possible and immediately destroy it.




