Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> Package: linux-2.6
> Version: 2.6.38-3
> Severity: normal
> 
> As you can see from the kern.log snippet below, I am seeing frequent
> messages reporting "bio too big device md0 (248 > 240)".
> 
> I run what I imagine is a fairly unusual disk setup on my laptop,
> consisting of:
> 
>   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> 
> I use the raid1 as a backup.  The raid1 operates normally in degraded
> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> then fail/remove the external hdd. 

Well, this is not expected to work.  Possibly the hot-addition of a disk
with different bio restrictions should be rejected.  But I'm not sure,
because it is safe to do that if there is no mounted filesystem or
stacking device on top of the RAID.
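
One way to check for this kind of mismatch is to compare the request-size
limits that each layer of the stack advertises in sysfs. The sketch below
is only an illustration under some assumptions: that the two numbers in the
message are 512-byte sectors, that the relevant limit is the one exposed as
queue/max_hw_sectors_kb (pass attr="max_sectors_kb" to check the other
one), and that the device names used are hypothetical stand-ins for the
real stack. It lists every layer that advertises a larger limit than the
layer directly beneath it.

```python
from pathlib import Path


def limit_sectors(dev, attr="max_hw_sectors_kb", sysfs_root="/sys/block"):
    """Return the queue limit of block device `dev` in 512-byte sectors.

    `dev` must be a whole-device name such as "md0" or "dm-0"; partitions
    have no queue directory of their own.
    """
    text = (Path(sysfs_root) / dev / "queue" / attr).read_text()
    return int(text.strip()) * 2  # sysfs reports KiB; 1 KiB = 2 sectors


def oversized_layers(stack, attr="max_hw_sectors_kb", sysfs_root="/sys/block"):
    """Find layers that may send requests too big for the layer below.

    `stack` lists device names from top to bottom, e.g. ["dm-1", "dm-0", "md0"].
    Returns (upper, upper_limit, lower, lower_limit) tuples, in sectors, for
    each adjacent pair where the upper limit exceeds the lower one.
    """
    limits = [(dev, limit_sectors(dev, attr, sysfs_root)) for dev in stack]
    return [
        (upper, upper_limit, lower, lower_limit)
        for (upper, upper_limit), (lower, lower_limit) in zip(limits, limits[1:])
        if upper_limit > lower_limit
    ]


if __name__ == "__main__":
    # Hypothetical names: LVM volume, dm-crypt mapping, RAID1 array.
    for upper, ul, lower, ll in oversized_layers(["dm-1", "dm-0", "md0"]):
        print(f"{upper} allows {ul} sectors but {lower} only accepts {ll}")
```

For example, an upper layer reporting 124 KiB on top of an md0 reporting
120 KiB would be returned as ("dm-0", 248, "md0", 240), which matches the
numbers in the subject line.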

I would recommend using filesystem-level backup (e.g. dirvish or
backuppc).  Aside from this bug, if the SSD fails during a RAID resync
you will be left with an inconsistent and therefore useless 'backup'.

> I started noticing these messages after my last sync.  I have not
> rebooted since.
> 
> I found a bug report on the launchpad that describes an almost
> identical situation:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638
> 
> The reporter seemed to be concerned that there may be data loss
> happening.  I have not yet noticed any, but of course I'm terrified
> that it's happening and I just haven't found it yet.  Unfortunately
> the bug was closed with a "Won't Fix" without any resolution.
> 
> Is this a kernel bug, or is there something I can do to remedy the
> situation?  I haven't tried to reboot yet to see if the messages stop.
> I'm obviously most worried about data loss.  Please advise!

The block layer correctly returns an error after logging this message.
If it's due to a read operation, the error should be propagated up to
the application that tried to read.  If it's due to a write operation, I
would expect the error to result in the RAID becoming desynchronised.
In some cases it might be propagated to the application that tried to
write.

If the error is somehow discarded then there *is* a kernel bug with the
risk of data loss.
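
To judge how often requests are being rejected, and for which device, the
messages can be counted in the kernel log. The following is a rough sketch
rather than a finished tool; it assumes the message text has exactly the
form quoted in the subject line, and the log path is whatever the local
syslog configuration uses.

```python
import re
import sys
from collections import Counter

# Matches e.g. "bio too big device md0 (248 > 240)".
BIO_TOO_BIG = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")


def parse_bio_errors(lines):
    """Return (device, requested, limit) for every matching log line."""
    events = []
    for line in lines:
        match = BIO_TOO_BIG.search(line)
        if match:
            device, requested, limit = match.groups()
            events.append((device, int(requested), int(limit)))
    return events


def count_by_device(events):
    """Count rejected requests per device name."""
    return Counter(device for device, _, _ in events)


if __name__ == "__main__":
    # Usage: python3 thisfile.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as log:
        found = parse_bio_errors(log)
    for device, count in count_by_device(found).items():
        print(f"{device}: {count} rejected requests")
```

Each counted line corresponds to one request that the block layer refused,
so comparing the timestamps with application errors shows whether those
errors were reported or silently dropped.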

> I am starting to suspect that these messages are in fact associated with
> data loss on my system.  I have witnessed these messages occur during
> write operations to the disk, and I have also started to see some
> strange behavior on my system.  dhclient started acting weird after
> these messages appeared (not holding on to leases) and I started to
> notice database exceptions in my mail client.
>
> Interestingly, the messages seem to have gone away after reboot.  I will
> watch closely to see if they return after my next raid1 sync.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

Attachment: signature.asc
Description: This is a digitally signed message part

