On Mon, Apr 23, 2012 at 12:41 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Sun, Apr 22, 2012 at 10:30 AM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> On Fri, 2012-04-13 at 16:37 -0700, Dan Williams wrote:
>>> When managing shost->host_eh_scheduled libata assumes that there is a
>>> 1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
>>> so it needs to manage host_eh_scheduled cumulatively at the host level.
>>> The sched_eh and end_eh port port ops allow libsas to track when domain
>>> devices enter/leave the "eh-pending" state under ha->lock (previously
>>> named ha->state_lock, but it is no longer just a lock for ha->state
>>> changes).
>>>
>>> Since host_eh_scheduled indicates eh without backing commands pinning
>>> the device it can be deallocated at any time. Move the taking of the
>>> domain_device reference under the port_lock to guarantee that the
>>> ata_port stays around for the duration of eh.
>>
>>> Cc: Tejun Heo <tj@xxxxxxxxxx>
>>> Acked-by: Jacek Danecki <jacek.danecki@xxxxxxxxx>
>>
>> Could we standardise on Acked-by, please. In my book it means the
>> maintainer of a piece of code agrees with the change and lets me take it
>> through my tree. I'm aware that not everyone uses this definition, so
>> we can use a different standard from my current one, but what does it
>> mean in this case?
>>
>>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>>> ---
>>> drivers/ata/libata-core.c | 4 ++
>>> drivers/ata/libata-eh.c | 57 ++++++++++++++++++++++++++++-------
>>> drivers/scsi/libsas/sas_ata.c | 38 +++++++++++++++++++++--
>>> drivers/scsi/libsas/sas_discover.c | 6 ++--
>>> drivers/scsi/libsas/sas_event.c | 12 ++++---
>>> drivers/scsi/libsas/sas_init.c | 14 ++++-----
>>> drivers/scsi/libsas/sas_scsi_host.c | 27 +++++++++++++----
>>> include/linux/libata.h | 4 ++
>>> include/scsi/libsas.h | 4 ++
>>> include/scsi/sas_ata.h | 5 +++
>>> 10 files changed, 134 insertions(+), 37 deletions(-)
>>
>> This is a pretty big change for rc fixes. None of the other changes in
>> the series seem to be dependent on it, what bug is it actually fixing?
>
> It fixes two bugs (which in hindsight could potentially be split into
> two patches). The major one is guarantees about when
> host_eh_scheduled is cleared. Prior to this patch when at least one
> ata_port in a domain has completed eh the flag for host is cleared.
> This patch plus "scsi: fix eh wakeup (scsi_schedule_eh vs
> scsi_restart_operations)" fixes up deadlocks (waiting indefinitely for
> eh to complete) and cases where eh terminates too early
> (host_eh_scheduled cleared with work pending) in our hot plug tests.
>
> The other bug this uncovered is the fact that libsas makes use of the
> port and rphy object after libata has completed it's work, so this
> patch also moved the taking of the domain_device reference to be under
> spin_lock_irq(&sas_ha->phy_port_lock) and
> spin_lock(&port->dev_list_lock). Otherwise, if no commands are
> pending for the device libsas can proceed directly to freeing the
> domain_device before the requested eh runs.
>
Ping, will this one be queued to scsi/fixes? All our hotplug testing
was done with this patch in place, and it's straightforward to see
that libata is mismanaging host_eh_scheduled in the libsas ata_port
case.
--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Linux Filesystems]
[Linux SCSI]
[Linux RAID]
[Git]
[Kernel Newbies]
[Linux Newbie]
[Share Photos]
[Security]
[Netfilter]
[Bugtraq]
[Photo]
[Yosemite]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Samba]
[Video 4 Linux]
[Device Mapper]
[Linux Resources]