Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I think I know what is going on. One of your disks at least is slow to
spinup. Due to a bug/feature in silicon image disk controller and pmp,
at bring up we can not issue a SOFT_RESET and wait for the disk to
spinup and then continue.
That why we set ATA_LFLAG_NO_SRST in sata_pmp_quirks().
So what happen is we go into a function that issue identify, but we
fail, the disk is not ready [it is spinning up], so we retry.
3 times.

>From the first hard reset: 12888.470385, to the time you got the final
error: 12901.397305 ~ 12.9s
In the second case, your controller can send SOFT_RESET and wait for
the device to respond.
Time for the disk to spinup:
28010.630028 - 27997.097116 ~ 13.5s
As you can see, you are borderline with the PMP, but the controller
did not "wait" enough in the first case.
Given the spinup time varies with drive, age, time since last
spin-up..., it may work one day and fail the next.
To work around the problem, I have a patch that consist of allowing
the silicon image control to send a reset, but if it fails, we spin
for a fixed amount of time and retry. This is not very nice, it is a
better design to wait for event that waiting a fixed amount of time.
You may have to alter ATA_LFLAG_WAIT_SRST to use the first bit available.

Can you try with the following patch?

Thanks,
Gwendal.

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 228740f..b98b02d 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2798,7 +2798,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
     sata_scr_read(link, SCR_STATUS, &sstatus))
  rc = -ERESTART;

- if (rc == -ERESTART || try >= max_tries)
+ if (try >= max_tries)
+ goto out;
+
+ /* Some PMP will not serve SRST until the disk is spunup,
+ * if the controller can not wait for the PMP to acknowledge the frame,
+ * wait here */
+ if (rc == -ERESTART &&
+    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
  goto out;

  now = jiffies;
@@ -2813,6 +2820,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
  delta = schedule_timeout_uninterruptible(delta);
  }

+ if (rc == -ERESTART)
+ goto out;
  if (try == max_tries - 1) {
  sata_down_spd_limit(link, 0);
  if (slave)
diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 00305f4..d21ad7d 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -325,13 +351,11 @@ static void sata_pmp_quirks(struct ata_port *ap)
  if (vendor == 0x1095 && devid == 0x3726) {
  /* sil3726 quirks */
  ata_for_each_link(link, ap, EDGE) {
- /* Class code report is unreliable and SRST
- * times out under certain configurations.
- */
+ /* Class code report is unreliable */
+ /* PMP does not forward SRST until the drive spins up */
  if (link->pmp < 5)
- link->flags |= ATA_LFLAG_NO_SRST |
-       ATA_LFLAG_ASSUME_ATA;
-
+ link->flags |= ATA_LFLAG_ASSUME_ATA |
+       ATA_LFLAG_WAIT_SRST;
  /* port 5 is for SEMB device and it doesn't like SRST */
  if (link->pmp == 5)
  link->flags |= ATA_LFLAG_NO_SRST |
diff --git a/include/linux/libata.h b/include/linux/libata.h
index b2f2003..3a18caa 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -172,6 +172,7 @@ enum {
  ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */
  ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
  ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
+ ATA_LFLAG_WAIT_SRST = (1 << 8), /* add delay when SRST fails */

  /* struct ata_port flags */
  ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */


On Fri, Sep 30, 2011 at 2:54 PM, Mike I <mihrcke@xxxxxxxxx> wrote:
>
> tj <at> kernel.org <tj <at> kernel.org> writes:
>
> >
> > Hello,
> >
> > How did that go?
> >
> > Thanks.
> >
>
> Like Derry who started this thread, I too had seen an old thread from
> October/November 2008 with what appeared to be no resolution to this problem.
> Now, finding this thread, again, with no apparent resolution to this problem.
>
> I'm currently running Ubuntu 10.04 (lucid), kernel 2.6.32-33-generic.  I've no
> experience with applying these git patches, and my searching to figure out how
> it works have not helped.
>
> I'm using an Addonics eSATA PCI-X controller with the SiI3124 chipset, and I
> have an Addonics PM in an external enclosure, with a 5 bay/tray DAS.  Some of
> my drives give me this problem: (this occurs for me with pretty much ALL
> Samsung hard drives)
> [12888.470308] ata9.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
> [12888.470313] ata9.01: SError: { PHYRdyChg CommWake DevExch }
> [12888.470385] ata9.01: hard resetting link
> [12889.211597] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12889.211686] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12889.211692] ata9.15: hard resetting link
> [12891.430086] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12891.430397] ata9.00: hard resetting link
> [12891.780786] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12894.211103] ata9.01: hard resetting link
> [12894.560424] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12894.560466] ata9.02: hard resetting link
> [12894.914176] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12894.914222] ata9.03: hard resetting link
> [12895.264141] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12895.264169] ata9.04: hard resetting link
> [12895.612930] ata9.04: SATA link down (SStatus 0 SControl 320)
> [12895.613007] ata9.05: hard resetting link
> [12895.964143] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12896.065908] ata9.00: configured for UDMA/100
> [12896.065970] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12896.065977] ata9.15: hard resetting link
> [12898.283804] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12898.284128] ata9.00: hard resetting link
> [12898.634174] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12899.562524] ata9.01: hard resetting link
> [12899.914147] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12899.914180] ata9.02: hard resetting link
> [12900.261682] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12900.261724] ata9.03: hard resetting link
> [12900.610413] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12900.961283] ata9.05: hard resetting link
> [12901.310385] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12901.397241] ata9.00: configured for UDMA/100
> [12901.397300] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12901.397305] ata9.01: failed to recover link after 3 tries, disabling
> [12901.397311] ata9.15: hard resetting link
> [12903.613694] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12903.960564] ata9.00: hard resetting link
> [12904.311125] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12905.260154] ata9.02: hard resetting link
> [12905.602929] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12905.611319] ata9.03: hard resetting link
> [12905.962555] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12905.962592] ata9.04: hard resetting link
> [12906.312931] ata9.04: SATA link down (SStatus 0 SControl 320)
> [12906.313004] ata9.05: hard resetting link
> [12906.660409] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12906.753619] ata9.00: configured for UDMA/100
> [12906.766586] ata9.02: configured for UDMA/100
> [12906.771917] ata9.03: configured for UDMA/100
> [12907.121462] ata9: EH complete
>
> If I hot plug the same drive using a port directly off my mobo(no PM in the
> mix), I get this result(drive connects/mounts/works):
> [27997.097104] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action
> 0xe frozen
> [27997.097108] ata5: irq_stat 0x00400040, connection status changed
> [27997.097111] ata5: SError: { PHYRdyChg CommWake DevExch }
> [27997.097116] ata5: hard resetting link
> [28007.147622] ata5: softreset failed (device not ready)
> [28007.147627] ata5: hard resetting link
> [28010.630028] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [28010.748595] ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7
> [28010.748599] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [28010.755227] ata5.00: configured for UDMA/133
> [28010.755237] ata5: EH complete
> [28010.756338] scsi 4:0:0:0: Direct-Access     ATA      SAMSUNG HD154UI  1AG0
> PQ: 0 ANSI: 5
> [28010.756475] sd 4:0:0:0: Attached scsi generic sg10 type 0
> [28010.756572] sd 4:0:0:0: [sdj] 2930277168 512-byte logical blocks: (1.50
> TB/1.36 TiB)
> [28010.756613] sd 4:0:0:0: [sdj] Write Protect is off
> [28010.756616] sd 4:0:0:0: [sdj] Mode Sense: 00 3a 00 00
> [28010.756636] sd 4:0:0:0: [sdj] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [28010.756760]  sdj: sdj1
> [28010.816161] sd 4:0:0:0: [sdj] Attached SCSI disk
>
> I've been using Ubuntu for a few years now, and have been living with the
> problem...working around it with USB docking stations and such.  But, I'd
> really hope to see/find this problem worked out.
>
> Thoughts/tips/suggestions?  Since I'm pretty much a novice when it comes to
> patching, a link to a guide for git patching would be appreciated too.
>
> Thank You,
> Mike
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux