Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot

On 2019/9/12 10:03 PM, David Newall wrote:
> Hello Qu,
>
> Thank you very much for helping me with this.
>
> On 12/9/19 4:35 pm, Qu Wenruo wrote:
>> Would you please check how fast (or how slow in this particular case)
>> the related disks are?
>> To me, it really looks like just too slow devices.
>
> I discover that you are correct about the underlying storage being
> slow.  Nikolay suggested that, too.
>
> Although I mentioned that the filesystem is encrypted with LUKS on the
> VM, I didn't say that the underlying storage is connected via multipath
> iSCSI (two paths) on the host server and provided to the VM via KVM as
> a virtio disk.  That should be fine, but using dd (bs=1024k count=15)
> on the VM, I'm seeing a woeful 255 KB/s read speed through the
> encryption layer, and 274 KB/s from the raw disk.  :-(
>
> On the host, I'm seeing 2MB/s via one path and 846KB/s via the other, so
> I think that's where I need to turn my attention.  (Time to benchmark,
> turn off one path, and speak to the DC management.)

Glad we found the root cause.
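As a side note, the dd comparison above is easy to repeat from a script when
checking both the encrypted and the raw path. A minimal sketch follows; the
scratch file is only a stand-in for the device node, and against a real device
you would run it as root and drop the page cache first (echo 3 >
/proc/sys/vm/drop_caches):

```python
# Rough stand-in for "dd bs=1024k count=15": read fixed-size blocks
# from a path and report throughput. The scratch file below is a
# placeholder; on the setup discussed you would point this at the
# dm-crypt device or the raw virtio disk instead.
import os
import time

def read_throughput(path, block_size=1024 * 1024, count=15):
    """Read up to `count` blocks of `block_size` bytes.

    Returns (bytes_read, KB_per_second)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        for _ in range(count):
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total, (total / elapsed / 1024) if elapsed > 0 else 0.0

if __name__ == "__main__":
    # Demo against a scratch file; a page-cached file will of course
    # report far higher numbers than the iSCSI-backed disk would.
    with open("scratch.bin", "wb") as f:
        f.write(os.urandom(4 * 1024 * 1024))
    n, kbs = read_throughput("scratch.bin", block_size=1024 * 1024, count=4)
    print(f"read {n} bytes at {kbs:.0f} KB/s")
    os.remove("scratch.bin")
```

Running it against both the encryption layer and the raw disk, as with the dd
runs, shows at which layer the slowdown enters.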

>
>> I see all dumps are waiting for write_all_supers.
>>
>> Would you please provide the code context of
>> write_all_supers.isra.43+0x977?
>>
>> I guess it's wait_dev_flush(), which is just really waiting for disk
>> writes.
>
> Sorry, I don't understand what you mean by "code context".  Maybe the
> question is now moot.
>
> Although it's now apparent that I've got a really slow disk, I still
> wonder whether btrfs is holding a lock for an unnecessarily long time
> (assuming that it is btrfs holding the lock).  I feel that having to
> wait tens of minutes to find the device names of mounted devices could
> never be intended, so there might be something that needs tweaking.

It's not completely unnecessary, but you're right that we can improve it.

It's the device mutex. In the context of committing a transaction, we
definitely don't want a new device joining in while we're iterating the
device list to flush each device.

However, you're still right: since the flush can be slow, we shouldn't
block other device-list read operations, so it may be a good idea to
make fs_devices->device_list_mutex an rw_semaphore.
That way we would only block device add/remove while still allowing
read-only device-list operations to proceed.
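As a userspace analogy of that idea (not actual btrfs code; Python has no
stdlib rwlock, so a minimal one is sketched here, and all names are
illustrative), read-only walks of the device list can then run concurrently
while only add/remove takes the lock exclusively:

```python
# Userspace analogy of converting device_list_mutex to an rw_semaphore:
# many concurrent readers OR one exclusive writer.
import threading

class RWLock:
    """Minimal readers-writer lock: many readers or a single writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

class DeviceList:
    """Hypothetical device list guarded by the rwlock above."""
    def __init__(self):
        self._lock = RWLock()
        self._devices = []

    def add(self, name):
        # Like device add/remove: takes the lock exclusively.
        self._lock.acquire_write()
        try:
            self._devices.append(name)
        finally:
            self._lock.release_write()

    def names(self):
        # Like a show-devname style walk: shared, read-only access.
        self._lock.acquire_read()
        try:
            return list(self._devices)
        finally:
            self._lock.release_read()
```

With a plain mutex, a reader such as the names() walk would also queue behind
a slow flush holding the lock; with the rwlock, only add/remove excludes it.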

We may need to look into it, but please also keep in mind that the
benefit may only be obvious on such slow devices, so it's up to the
developers to decide.

Thanks,
Qu

>
> On 12/9/19 3:58 pm, Nikolay Borisov wrote:
> Actually, when the issue occurs again, can you sample the output of
> echo w > /proc/sysrq-trigger?  Right now you have provided 3 samples
> over the course of I don't know how many minutes, so they just give a
> momentary glimpse into what's happening.  E.g. just because we saw
> btrfs transaction/btrfs_show_devname doesn't necessarily mean that's
> what's happening (though having the same consistent state in the 3 logs
> kind of suggests otherwise).
>
> Again, it's probably all moot, now, but I did take samples at about
> 20-second intervals during 20-minutes of the "hang" period while rsync
> was running.  See https://davidnewall.com/kern.5 through kern.62.
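For reference, that sampling amounts to a loop like the sketch below. The
trigger path and interval are parameters so the loop can be exercised against
an ordinary file; on a live system it needs root and kernel.sysrq enabled,
and the real path is /proc/sysrq-trigger:

```python
# Sketch of the blocked-task sampling loop: write "w" to the sysrq
# trigger at a fixed interval so each dump of blocked tasks lands in
# the kernel log. Defaults are parameters, not btrfs-specific values.
import time

def sample_blocked_tasks(trigger="/proc/sysrq-trigger",
                         interval=20, samples=60):
    """Write 'w' to `trigger` every `interval` seconds, `samples` times."""
    for _ in range(samples):
        with open(trigger, "w") as f:
            f.write("w")
        time.sleep(interval)
```

With interval=20 and samples=60, the loop covers roughly the 20-minute
window sampled above.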
>
> Thanks to all for your help.
>
> Regards,
>
> David
>



