- To: NeilBrown <neilb@xxxxxxx>
- Subject: Re: MD Raid10 recovery results in "attempt to access beyond end of device"
- From: Christian Balzer <chibi@xxxxxxx>
- Date: Fri, 22 Jun 2012 17:42:57 +0900
- Cc: linux-raid@xxxxxxxxxxxxxxx
- In-reply-to: <20120622180748.5f78339c@notabene.brown>
- Organization: FusionGOL
Hello,
On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi@xxxxxxx>
> wrote:
>
> >
> > Hello,
> >
> > the basics first:
> > Debian Squeeze, custom 3.2.18 kernel.
> >
> > The Raid(s) in question are:
> > ---
> > Personalities : [raid1] [raid10]
> > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> > 3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5]
> > [UUUUU]
>
> I'm stumped by this. It shouldn't be possible.
>
> The size of the array is impossible.
>
> If there are N chunks per device, then there are 5*N chunks on the whole
> array, and there are two copies of each data chunk, so
> 5*N/2 distinct data chunks, so that should be the size of the array.
>
> So if we take the size of the array, divide by chunk size, multiply by 2,
> divide by 5, we get N = the number of chunks per device.
> i.e.
> N = (array_size / chunk_size)*2 / 5
>
> If we plug in 3662836224 for the array size and 512 for the chunk size,
> we get 2861590.8, which is not an integer.
> i.e. impossible.
>
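For anyone who wants to repeat that check on their own arrays, here is a minimal Python sketch of the arithmetic above. The function and parameter names are made up for illustration. It uses integer division so that a remainder shows up directly instead of as a fraction.

```python
def chunks_per_device(array_size_kib, chunk_kib, raid_devices, copies):
    """Return (N, remainder) for N = (array_size / chunk_size) * copies / raid_devices.

    Illustrative helper, not part of mdadm. A non-zero remainder means the
    array size cannot be built from a whole number of chunks per device.
    """
    total_chunk_copies = (array_size_kib // chunk_kib) * copies
    return divmod(total_chunk_copies, raid_devices)


# Numbers from the mdstat output: 3662836224 blocks, 512K chunks,
# 5 raid devices, 2 near-copies.
n, remainder = chunks_per_device(3662836224, 512, 5, 2)
print("chunks per device:", n, "remainder:", remainder)
print("consistent:", remainder == 0)
```

A remainder of zero would mean the reported size is consistent with the layout; anything else reproduces the non-integer result described above.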
Quite right, though I never bothered to check that number of course,
pretty much assuming after using Linux MD since the last millennium that
it would get things right. ^o^
> What does "mdadm --examine" of the various devices show?
>
They all look identical and sane to me:
---
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
Name : borg03b:3 (local to host borg03b)
Creation Time : Sat May 19 01:07:34 2012
Raid Level : raid10
Raid Devices : 5
Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : fe922c1c:35319892:cc1e32e9:948d932c
Update Time : Fri Jun 22 17:12:05 2012
Checksum : 27a61d9a - correct
Events : 90893
Layout : near=2
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
Name : borg03b:3 (local to host borg03b)
Creation Time : Sat May 19 01:07:34 2012
Raid Level : raid10
Raid Devices : 5
Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e7f5da61:cba8e3f7:d5efbd3d:2f4d3013
Update Time : Fri Jun 22 17:12:55 2012
Checksum : dc88710 - correct
Events : 90923
Layout : near=2
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
Name : borg03b:3 (local to host borg03b)
Creation Time : Sat May 19 01:07:34 2012
Raid Level : raid10
Raid Devices : 5
Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : eea0d414:382d5ac4:851772a2:af72eceb
Update Time : Fri Jun 22 17:13:10 2012
Checksum : caa903cc - correct
Events : 90933
Layout : near=2
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
Name : borg03b:3 (local to host borg03b)
Creation Time : Sat May 19 01:07:34 2012
Raid Level : raid10
Raid Devices : 5
Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ffcfc875:77d830a0:14575bdc:c339a428
Update Time : Fri Jun 22 17:13:34 2012
Checksum : 7e14e4e9 - correct
Events : 90947
Layout : near=2
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x2
Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
Name : borg03b:3 (local to host borg03b)
Creation Time : Sat May 19 01:07:34 2012
Raid Level : raid10
Raid Devices : 5
Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 1465135104 sectors
State : clean
Device UUID : e86f53a3:940ce746:25423ae0:da3b179f
Update Time : Fri Jun 22 17:13:49 2012
Checksum : 23fbd830 - correct
Events : 90953
Layout : near=2
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAA ('A' == active, '.' == missing)
---
I verified that these are identical to the ones on the other machine which
survived a resync event flawlessly.
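As a cross-check, the superblock figures for sdc1 can be compared with the want/limit values in the kernel messages quoted further down. The sketch below is only that, a sketch: it assumes that the --examine sizes and the kernel's want/limit values are all in 512-byte sectors and that the fdisk "Blocks" column is in KiB. The constant and function names are mine.

```python
# All figures are copied from this mail. Treating them as 512-byte sectors
# (and the fdisk block count as KiB) is an assumption.
DATA_OFFSET = 2048             # "Data Offset" from --examine of sdc1
AVAIL_DEV_SIZE = 2930269954    # "Avail Dev Size"
USED_DEV_SIZE = 2930269184     # "Used Dev Size"
FDISK_BLOCKS_KIB = 1465136001  # fdisk "Blocks" for /dev/sdc1
LIMIT = 2930272002             # "limit=" in the kernel messages
WANTS = [2930272128, 2930272008, 2930272256, 2930272136]  # "want=" values


def overshoot(want, limit=LIMIT):
    """Sectors by which a request ends past the device limit."""
    return want - limit


partition_sectors = FDISK_BLOCKS_KIB * 2
used_end = DATA_OFFSET + USED_DEV_SIZE

print("fdisk size matches limit:", partition_sectors == LIMIT)
print("offset + avail size matches limit:", DATA_OFFSET + AVAIL_DEV_SIZE == LIMIT)
print("slack after used area:", LIMIT - used_end, "sectors")
for want in WANTS:
    print(want, "ends", overshoot(want), "sectors past the partition and",
          want - used_end, "sectors past the used area")
```

If the unit assumption holds, the partition limit agrees with both fdisk and the superblock, so the requests themselves are what run past the end.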
The version of mdadm in Squeeze is: mdadm - v3.1.4 - 31st August 2010
I created a pretty similar setup last year with 5 2TB drives each and
using a 3.0.7 kernel. That array size is nicely divisible...
I have a sinking feeling that the "fix" for this will be a rebuild of the
RAIDs on a production cluster. >.<
Christian
> NeilBrown
>
>
> >
> > md3 : active raid10 sdh1[7] sdc1[0] sda4[5](S) sdg1[3] sdf1[2] sde1[6]
> >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/4] [UUUU_]
> >       [=====>...............] recovery = 28.3% (415962368/1465134592) finish=326.2min speed=53590K/sec
> > ---
> >
> > Drives sda to sdd are on nVidia MCP55 and sde to sdl on SAS1068E, sdc
> > to sdl are identical 1.5TB Seagates (about 2 years old, recycled from
> > the previous incarnation of these machines) with a single partition
> > spanning the whole drive like this:
> > ---
> > Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
> > 255 heads, 63 sectors/track, 182401 cylinders
> > Units = cylinders of 16065 * 512 = 8225280 bytes
> > Sector size (logical/physical): 512 bytes / 512 bytes
> > I/O size (minimum/optimal): 512 bytes / 512 bytes
> > Disk identifier: 0x00000000
> >
> >    Device Boot      Start         End      Blocks   Id  System
> > /dev/sdc1               1      182401  1465136001   fd  Linux raid autodetect
> > ---
> >
> > sda and sdb are new 2TB Hitachi drives, partitioned like this:
> > ---
> > Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
> > 255 heads, 63 sectors/track, 243201 cylinders
> > Units = cylinders of 16065 * 512 = 8225280 bytes
> > Sector size (logical/physical): 512 bytes / 512 bytes
> > I/O size (minimum/optimal): 512 bytes / 512 bytes
> > Disk identifier: 0x000d53b0
> >
> >    Device Boot      Start         End      Blocks   Id  System
> > /dev/sda1   *           1       31124   249999360   fd  Linux raid autodetect
> > /dev/sda2           31124       46686   124999680   fd  Linux raid autodetect
> > /dev/sda3           46686       50576    31246425   fd  Linux raid autodetect
> > /dev/sda4           50576      243201  1547265543+  fd  Linux raid autodetect
> > ---
> >
> > So the idea is to have 5 drives per each of the two Raid10s and one
> > spare on that (intentionally over-sized) fourth partition of the
> > bigger OS disks.
> >
> > Some weeks ago a drive failed on the twin (identical everything, DRBD
> > replication of those 2 RAIDs) of the machine in question and everything
> > went according to the book, spare took over and things got rebuilt, I
> > replaced the failed drive (sdi) later:
> > ---
> > md4 : active raid10 sdi1[6](S) sdd1[0] sdb4[5] sdl1[4] sdk1[3] sdj1[2]
> >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5] [UUUUU]
> > ---
> >
> > Two days ago drive sdh on the machine that's having issues failed:
> > ---
> > Jun 20 18:22:39 borg03b kernel: [1383395.448043] sd 8:0:3:0: Device offlined - not ready after error recovery
> > Jun 20 18:22:39 borg03b kernel: [1383395.448135] sd 8:0:3:0: rejecting I/O to offline device
> > Jun 20 18:22:39 borg03b kernel: [1383395.452063] end_request: I/O error, dev sdh, sector 71
> > Jun 20 18:22:39 borg03b kernel: [1383395.452063] md: super_written gets error=-5, uptodate=0
> > Jun 20 18:22:39 borg03b kernel: [1383395.452063] md/raid10:md3: Disk failure on sdh1, disabling device.
> > Jun 20 18:22:39 borg03b kernel: [1383395.452063] md/raid10:md3: Operation continuing on 4 devices.
> > Jun 20 18:22:39 borg03b kernel: [1383395.527178] RAID10 conf printout:
> > Jun 20 18:22:39 borg03b kernel: [1383395.527181] --- wd:4 rd:5
> > Jun 20 18:22:39 borg03b kernel: [1383395.527184] disk 0, wo:0, o:1, dev:sdc1
> > Jun 20 18:22:39 borg03b kernel: [1383395.527186] disk 1, wo:0, o:1, dev:sde1
> > Jun 20 18:22:39 borg03b kernel: [1383395.527189] disk 2, wo:0, o:1, dev:sdf1
> > Jun 20 18:22:39 borg03b kernel: [1383395.527191] disk 3, wo:0, o:1, dev:sdg1
> > Jun 20 18:22:39 borg03b kernel: [1383395.527193] disk 4, wo:1, o:0, dev:sdh1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568037] RAID10 conf printout:
> > Jun 20 18:22:39 borg03b kernel: [1383395.568040] --- wd:4 rd:5
> > Jun 20 18:22:39 borg03b kernel: [1383395.568042] disk 0, wo:0, o:1, dev:sdc1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568045] disk 1, wo:0, o:1, dev:sde1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568047] disk 2, wo:0, o:1, dev:sdf1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568049] disk 3, wo:0, o:1, dev:sdg1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568060] RAID10 conf printout:
> > Jun 20 18:22:39 borg03b kernel: [1383395.568061] --- wd:4 rd:5
> > Jun 20 18:22:39 borg03b kernel: [1383395.568063] disk 0, wo:0, o:1, dev:sdc1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568065] disk 1, wo:0, o:1, dev:sde1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568068] disk 2, wo:0, o:1, dev:sdf1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568070] disk 3, wo:0, o:1, dev:sdg1
> > Jun 20 18:22:39 borg03b kernel: [1383395.568072] disk 4, wo:1, o:1, dev:sda4
> > Jun 20 18:22:39 borg03b kernel: [1383395.568135] md: recovery of RAID array md3
> > Jun 20 18:22:39 borg03b kernel: [1383395.568139] md: minimum _guaranteed_ speed: 20000 KB/sec/disk.
> > Jun 20 18:22:39 borg03b kernel: [1383395.568142] md: using maximum available idle IO bandwidth (but not more than 500000 KB/sec) for recovery.
> > Jun 20 18:22:39 borg03b kernel: [1383395.568155] md: using 128k window, over a total of 1465134592k.
> > ---
> >
> > OK, spare kicked, recovery underway (from the neighbors sdg and sdc),
> > but then:
> > ---
> > Jun 21 02:29:29 borg03b kernel: [1412604.989978] attempt to access beyond end of device
> > Jun 21 02:29:29 borg03b kernel: [1412604.989983] sdc1: rw=0, want=2930272128, limit=2930272002
> > Jun 21 02:29:29 borg03b kernel: [1412604.990003] attempt to access beyond end of device
> > Jun 21 02:29:29 borg03b kernel: [1412604.990009] sdc1: rw=16, want=2930272008, limit=2930272002
> > Jun 21 02:29:29 borg03b kernel: [1412604.990013] md/raid10:md3: recovery aborted due to read error
> > Jun 21 02:29:29 borg03b kernel: [1412604.990025] attempt to access beyond end of device
> > Jun 21 02:29:29 borg03b kernel: [1412604.990028] sdc1: rw=0, want=2930272256, limit=2930272002
> > Jun 21 02:29:29 borg03b kernel: [1412604.990032] md: md3: recovery done.
> > Jun 21 02:29:29 borg03b kernel: [1412604.990035] attempt to access beyond end of device
> > Jun 21 02:29:29 borg03b kernel: [1412604.990038] sdc1: rw=16, want=2930272136, limit=2930272002
> > Jun 21 02:29:29 borg03b kernel: [1412604.990040] md/raid10:md3: recovery aborted due to read error
> > ---
> >
> > Why it would want to read data beyond the end of that device (and
> > partition) is a complete mystery to me, if anything was odd with this
> > Raid or its superblocks, surely the initial sync should have stumbled
> > across this as well?
> >
> > After this failure the kernel goes into a log frenzy:
> > ---
> > Jun 21 02:29:29 borg03b kernel: [1412605.744052] RAID10 conf printout:
> > Jun 21 02:29:29 borg03b kernel: [1412605.744055] --- wd:4 rd:5
> > Jun 21 02:29:29 borg03b kernel: [1412605.744057] disk 0, wo:0, o:1, dev:sdc1
> > Jun 21 02:29:29 borg03b kernel: [1412605.744060] disk 1, wo:0, o:1, dev:sde1
> > Jun 21 02:29:29 borg03b kernel: [1412605.744062] disk 2, wo:0, o:1, dev:sdf1
> > Jun 21 02:29:29 borg03b kernel: [1412605.744064] disk 3, wo:0, o:1, dev:sdg1
> > ---
> > repeating every second or so, until I "mdadm -r"ed the sda4 partition
> > (former spare).
> >
> > On the next day I replaced the failed sdh drive with another 2TB
> > Hitachi (having only 1.5TB Seagates of dubious quality lying around),
> > gave it the same single partition size as the other drives and added
> > it to md3.
> >
> > The resync failed in the same manner:
> > ---
> > Jun 21 20:59:06 borg03b kernel: [1479182.509914] attempt to access beyond end of device
> > Jun 21 20:59:06 borg03b kernel: [1479182.509920] sdc1: rw=0, want=2930272128, limit=2930272002
> > Jun 21 20:59:06 borg03b kernel: [1479182.509931] attempt to access beyond end of device
> > Jun 21 20:59:06 borg03b kernel: [1479182.509933] attempt to access beyond end of device
> > Jun 21 20:59:06 borg03b kernel: [1479182.509937] sdc1: rw=0, want=2930272256, limit=2930272002
> > Jun 21 20:59:06 borg03b kernel: [1479182.509942] md: md3: recovery done.
> > Jun 21 20:59:06 borg03b kernel: [1479182.509948] sdc1: rw=16, want=2930272008, limit=2930272002
> > Jun 21 20:59:06 borg03b kernel: [1479182.509952] md/raid10:md3: recovery aborted due to read error
> > Jun 21 20:59:06 borg03b kernel: [1479182.509963] attempt to access beyond end of device
> > Jun 21 20:59:06 borg03b kernel: [1479182.509965] sdc1: rw=16, want=2930272136, limit=2930272002
> > Jun 21 20:59:06 borg03b kernel: [1479182.509968] md/raid10:md3: recovery aborted due to read error
> > ---
> >
> > I've now scrounged up an identical 1.5TB drive and added it to the Raid
> > (the recovery visible in the topmost mdstat).
> > If that fails as well, I'm completely lost as to what's going on, if it
> > succeeds though I guess we're looking at a subtle bug.
> >
> > I didn't find anything like this mentioned in the archives before, any
> > and all feedback would be most welcome.
> >
> > Regards,
> >
> > Christian
>
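In case it helps anyone comparing notes, the want/limit pairs can be pulled out of quoted and re-wrapped kernel messages like the ones above with a short Python sketch. The pattern and the helper name are my own and only cover the "dev: rw=..., want=..., limit=..." form seen in this thread.

```python
import re

# Matches e.g. "sdc1: rw=16, want=2930272008, limit=2930272002".
ENTRY = re.compile(
    r"(?P<dev>\w+): rw=(?P<rw>\d+),\s*want=(?P<want>\d+),\s*limit=(?P<limit>\d+)"
)


def find_overruns(log_text):
    """Return (device, rw, want, limit) tuples found in quoted mail text.

    Hypothetical helper: strips leading "> " quote markers and collapses
    whitespace first, so entries wrapped across lines still match.
    """
    unquoted = re.sub(r"^(?:>[ \t]?)+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    return [
        (m["dev"], int(m["rw"]), int(m["want"]), int(m["limit"]))
        for m in ENTRY.finditer(flat)
    ]


# Excerpt copied from the quoted log, including the mail client's wrapping.
sample = """\
> > sdc1: rw=0, want=2930272128, limit=2930272002 Jun 21 02:29:29 borg03b
> > kernel: [1412604.990003] attempt to access beyond end of device Jun 21
> > 02:29:29 borg03b kernel: [1412604.990009] sdc1: rw=16,
> > want=2930272008, limit=2930272002 Jun 21 02:29:29 borg03b kernel:
"""

for dev, rw, want, limit in find_overruns(sample):
    print(dev, "rw=%d" % rw, "overshoot:", want - limit, "sectors")
```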
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Fusion Communications
http://www.gol.com/