On Wed, Mar 14, 2012 at 2:33 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Wed, 2012-03-14 at 17:11 -0400, Alan Stern wrote:
>> I see. It's a nasty situation; I guess the best way to describe it is
>> a conflict between the requirements of the PM and SCSI subsystems:
>>
>> The PM core runs suspend and resume operations in async
>> threads, and these threads need to acquire the device lock;
>>
>> The sd driver needs to insure that async probing is finished
>> when starting its remove routine, and it is called with the
>> device lock held.
>>
>> The best solution might be to use a workqueue for sd's async probing
>> instead of the "async" threads. Then the work routine could be
>> cancelled without doing async_synchronize_full().
>
> Hmm... I was wondering why we needed a kernel global sync. If this is
> just for probe work what about just doing something like the following?
> Keep the sync operation local to probe-work and not entangle with the
> resume-work? I'll give this a shot when I get back to my test system.
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index bd17cf8..ec69e35 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2629,6 +2629,8 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
> put_device(&sdkp->dev);
> }
>
> +static LIST_HEAD(sd_probe_domain);
> +
> /**
> * sd_probe - called during driver initialization and whenever a
> * new scsi device is attached to the system. It is called once
> @@ -2717,7 +2719,7 @@ static int sd_probe(struct device *dev)
> dev_set_drvdata(dev, sdkp);
>
> get_device(&sdkp->dev); /* prevent release before async_schedule */
> - async_schedule(sd_probe_async, sdkp);
> + async_schedule_domain(sd_probe_async, sdkp, &sd_probe_domain);
>
> return 0;
>
> @@ -2751,7 +2753,7 @@ static int sd_remove(struct device *dev)
> sdkp = dev_get_drvdata(dev);
> scsi_autopm_get_device(sdkp->device);
>
> - async_synchronize_full();
> + async_synchronize_full_domain(&sd_probe_domain);
> blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
> blk_queue_unprep_rq(sdkp->device->request_queue, NULL);
> device_del(&sdkp->dev);
This works fine for me for resolving the deadlock, but I found I also
needed the following to fix a spurious:
scsi 6:0:1:0: Illegal state transition deleted->running
...in the resume path.
@@ -2348,10 +2349,11 @@ EXPORT_SYMBOL(scsi_device_quiesce);
*
* Must be called with user context, may sleep.
*/
-void
-scsi_device_resume(struct scsi_device *sdev)
+void scsi_device_resume(struct scsi_device *sdev)
{
- if(scsi_device_set_state(sdev, SDEV_RUNNING))
+ /* check if the device was deleted during suspend */
+ if (sdev->sdev_state == SDEV_DEL ||
+ scsi_device_set_state(sdev, SDEV_RUNNING))
return;
scsi_run_queue(sdev->request_queue);
}
Unless someone can point out something I'm missing I'll go ahead and
roll this into it's own patch and rebase/drop the hack out of the
libsas resume code.
--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Linux Filesystems]
[Linux SCSI]
[Linux RAID]
[Git]
[Kernel Newbies]
[Linux Newbie]
[Share Photos]
[Security]
[Netfilter]
[Bugtraq]
[Photo]
[Yosemite]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Samba]
[Video 4 Linux]
[Device Mapper]
[Linux Resources]