Hi,
I'm working on an embedded Linux DVR product whose kernel is based on 2.6.24. During recent testing I saw several SATA disk I/O errors after reading from and writing to the disks for a long time, e.g. about 24 hours.
Three Seagate SATA disk models show this problem. They are
ST2000DL003 (Barracuda Green / 2TB / 5900rpm / 64MB cache / 4KB per sector)
ST500DM002 (Barracuda Green / 500GB / 7200rpm / 16MB cache / 4KB per sector)
ST1000526SV (SV35 series / 1TB / 7200rpm / 32MB cache / 512B per sector).
The kernel output looks like this:
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: device not ready (errno=-16), forcing hardreset
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: COMRESET failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
I analyzed the kernel output and concluded that the cause is a timeout of the ATA_CMD_FLUSH_EXT command (cmd ea).
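For anyone reading along, here is a minimal sketch of how the "cmd" and "res" lines in the log can be decoded. The opcode values are the standard ATA ones (they match ATA_CMD_* in linux/ata.h); the helper names decode_cmd_line, decode_res_line and decode_status are my own, made up for illustration, and the regular expressions only assume the line layout shown in the log above.

```python
import re

# Standard ATA opcodes (same values as ATA_CMD_* in linux/ata.h).
ATA_OPCODES = {
    0x40: "ATA_CMD_VERIFY (READ VERIFY SECTOR(S))",
    0x42: "ATA_CMD_VERIFY_EXT (READ VERIFY SECTOR(S) EXT)",
    0xE7: "ATA_CMD_FLUSH (FLUSH CACHE)",
    0xEA: "ATA_CMD_FLUSH_EXT (FLUSH CACHE EXT)",
}

# Classic ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_cmd_line(line):
    """Extract the opcode (first byte after 'cmd') and look up its name."""
    m = re.search(r"cmd ([0-9a-f]{2})/", line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    return opcode, ATA_OPCODES.get(opcode, "unknown")


def decode_res_line(line):
    """Extract the status and error bytes (first two bytes after 'res')."""
    m = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    print(decode_cmd_line("ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"))
    print(decode_status(0xD0))
```

Applied to the log above, the cmd line gives opcode 0xea (FLUSH CACHE EXT), the res line gives status 0x40 (DRDY only) with error 0x00, and the "Status 0xd0" message has BSY set. A status of 0xff has every bit set, which as far as I know usually means the device is not answering at all.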
I tried increasing the SCSI flush cache command timeout to 120 seconds and retrying up to 5 times when the command times out, but the problem still occurred.
I also tried increasing the ATA_CMD_FLUSH_EXT timeout to 120 seconds, following the ATA8 specification, but the problem still occurred.
There is one very strange observation: the command issued immediately before the failed ATA_CMD_FLUSH_EXT (cmd ea) is always ATA_CMD_VERIFY (cmd 40).
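To check this pattern systematically rather than by eye, something like the sketch below could be used. It assumes a hypothetical command trace, i.e. a list of (opcode, timed_out) pairs in issue order, collected for example by extra debug printing in the command issue path; neither that trace format nor the function name predecessors_of_timeouts exists in the kernel, both are made up for illustration.

```python
from collections import Counter


def predecessors_of_timeouts(trace):
    """Count (previous opcode, failed opcode) pairs for timed-out commands.

    trace is an iterable of (opcode, timed_out) tuples in issue order.
    The previous opcode is None if the very first command timed out.
    """
    pairs = Counter()
    prev = None
    for opcode, timed_out in trace:
        if timed_out:
            pairs[(prev, opcode)] += 1
        prev = opcode
    return pairs


if __name__ == "__main__":
    # Made-up sample data, NOT taken from a real trace:
    # 0x35 = WRITE DMA EXT, 0x25 = READ DMA EXT,
    # 0x40 = READ VERIFY SECTOR(S), 0xEA = FLUSH CACHE EXT.
    sample = [
        (0x35, False), (0x40, False), (0xEA, True),
        (0x25, False), (0x40, False), (0xEA, True),
    ]
    for (prev, failed), count in predecessors_of_timeouts(sample).items():
        print("prev=%#x failed=%#x count=%d" % (prev, failed, count))
```

If every timed-out 0xea in a real trace has 0x40 as its predecessor, the counter holds a single key, (0x40, 0xea), which would confirm the observation.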