在 2011年11月27日 上午10:38,James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> 写道:
> On Sun, 2011-11-27 at 10:20 +0800, Huajun Li wrote:
>> 在 2011年11月27日 上午9:28,Huajun Li <huajun.li.lee@xxxxxxxxx> 写道:
>> > 在 2011年11月26日 下午10:00,James Bottomley <jbottomley@xxxxxxxxxxxxx> 写道:
>> >> On Sat, 2011-11-26 at 10:51 +0100, Stefan Richter wrote:
>> >>> On Nov 24 Huajun Li wrote:
>> >>> > While unplugging usb disk, scsi_disk(disk)->device may be released
>> >>> > before sd_revalidate_disk() is called, then there will occur Oops as
>> >>> > shown below:
>> >>> [...]
>> >>> > --- a/drivers/scsi/sd.c
>> >>> > +++ b/drivers/scsi/sd.c
>> >>> > @@ -2354,10 +2354,15 @@ static int sd_try_extended_inquiry(struct
>> >>> > scsi_device *sdp)
>> >>> > static int sd_revalidate_disk(struct gendisk *disk)
>> >>> > {
>> >>> > struct scsi_disk *sdkp = scsi_disk(disk);
>> >>> > - struct scsi_device *sdp = sdkp->device;
>> >>> > + struct scsi_device *sdp;
>> >>> > unsigned char *buffer;
>> >>> > unsigned flush = 0;
>> >>> >
>> >>> > + if (sdkp)
>> >>> > + sdp = sdkp->device;
>> >>> > + else
>> >>> > + goto out;
>> >>> > +
>> >>> > SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
>> >>> > "sd_revalidate_disk\n"));
>> >>> >
>> >>>
>> >>> Shouldn't rather the [block -- command-set-driver -- scsi-core -- lld]
>> >>> stack be structured along the lines that lower-level device instances live
>> >>> as long as higher levels rely on them?
>> >>
>> >> Not really, no. The problem is the lifetime rules are complex. The
>> >> lowest level objects actually live the longest because they're first to
>> >> be discovered before we know what higher level functions to attach.
>> >>
>> >> That's why you see complex paired gets in things like scsi_disk_get
>> >> because we try to get references both to the sdkp and the underlying
>> >> sdp ... so the sdkp can be torn down by last reference release from
>> >> either above or below (this is the menace of hot unplug).
>> >>
>> >>> For about a year now or so, I am seeing patches floating by that add NULL
>> >>> pointer checks here and there (or patches that clear pointers), and every
>> >>> time I wonder
>> >>> - where else such NULL pointer checks might be needed,
>> >>> - in what way (if at all) it is ensured that a function which is made to
>> >>> check for a valid pointer at the top does not race with pointer
>> >>> invalidation further down the road.
>> >>
>> >> I agree. The patch is clearly wrong because sdkp is a refcounted object
>> >> that never actually sets sdkp->device to NULL. If we find a NULL in
>> >> there it must be because the sdkp object is wrong. The implication from
>> >> the trace seems to be that something allowed blkid to open a non
>> >> existent device.
>> >>
>> >> James
>> >>
>> >>
>> >
>> > Thanks for your response.
>> >
>> > Yes, in this case, actually sdkp is NULL rather than sdkp->device, and
>> > the patch indicates this.
>> > However, there is typo in my comments, maybe it misleads you, sorry!
>> > In fact, the comment should be:
>
> No, it's not that ... it's how did the open get far enough to do the
> revalidation?
>
USB disconnection can happen at any time, even after an open
successfully get called, then revalidation will be triggered ?
To reproduce the crash, running a script like below, at the same time,
plug/unplug USB repeatedly, then easily reproduce it.
------------------------------------------------
while [ 1 ]
do
fdisk -l /dev/sdb # <------ The USB will be shown as
/dev/sdb on my host
sleep 0.05
done
>> > " While unplugging usb disk, scsi_disk(disk) may be released
>> > before sd_revalidate_disk() is called, then there will occur Oops."
>> >
>>
>> BTW, after applied the patch and repeatedly plug/unplug the USB stick,
>> I did not see this crash.
>>
>> However, the other concern is, need we validate scsi_disk(disk) in
>> some other functions specified in sd_fops ? since they may be also
>> called from other layer after scsi_disk(disk) is released.
>
> No, it's nothing to do with that. To call any of the block operations,
> you need to call ->open successfully first. since ->open either gets a
> ref to the sdkp or returns an error, how did we happen to be in
> sd_revalidate with a NULL sdkp?
>
> The call trace says we came through rescan_partitions() which only
> happens if ->open returns zero (or -ERESTARTSYS).
>
> James
>
If a USB disk is removed, then scsi_disk_release() will be called,
and it will release sdkp, right?
and after this point, if sd_revalidate is triggered from other layer
then it will find sdkp is freed, does this scenario make sense ?
2695 static void scsi_disk_release(struct device *dev)
2696 {
2697 struct scsi_disk *sdkp = to_scsi_disk(dev);
2698 struct gendisk *disk = sdkp->disk;
2699
2700 spin_lock(&sd_index_lock);
2701 ida_remove(&sd_index_ida, sdkp->index);
2702 spin_unlock(&sd_index_lock);
2703
2704 disk->private_data = NULL;
2705 put_disk(disk);
2706 put_device(&sdkp->device->sdev_gendev);
2707
2708 kfree(sdkp); <==============
2709 }
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[SCSI Target Devel]
[Linux SCSI Target Infrastructure]
[Kernel Newbies]
[Share Photos]
[IDE]
[Security]
[Git]
[Netfilter]
[Bugtraq]
[Photos]
[Yosemite]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Linux ATA RAID]
[Linux IIO]
[Samba]
[Video 4 Linux]
[Device Mapper]
[Linux Resources]