Re: btrfs send/receive review by vfs folks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, October 08, 2012 at 13:38 (+0200), Alex Lyakas wrote:
>>> I realize this is a big change, and a new IOCTL has to be introduced
>>> in order not to break current user-kernel protocol.
>>> The pros as I see them:
>>> # One data-copy is avoided (no pipe). For WRITE commands two
>>> data-copies are avoided (no read_buf needed)
>>
>> I'm not sure I understand those correctly. If you're talking about the user mode
>> part, we could simply pass stdout to the kernel, saving the unnecessary pipe and
>> copy operations in between without introducing a new buffer.
> What I meant is the following:
> # For non-WRITE commands the flow is: put the command onto send_buf,
> copy to pipe, then user-space copies it out from the pipe. With my
> code: put command onto send_buf, then copy to user-space buffer
> (copy_to_user). So one data-copy is avoided (2 vs 3).
> # For WRITE commands: read data onto read_buf, then copy to send_buf,
> then copy to pipe, then user-mode copies to its buffer. With my code:
> read onto send_buf, then copy to user-space buffer. So 2 data-copies
> are avoided (2 vs 4).
> Does it make sense?

I'd rather just focus on the ERESTARTSYS issue for now.

>> On Wed, August 01, 2012 at 14:09 (+0200), Alexander Block wrote:
>>> I have two possible solutions in my mind.
>>> 1. Store some kind of state in the ioctl arguments so that we can
>>> continue where we stopped when the ioctl reenters. This would however
>>> complicate the code a lot.
>>> 2. Spawn a thread when the ioctl is called and leave the ioctl
>>> immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls
>>> if they happen from a non syscall thread.
>>
>> What do you think about those two?
> I am not familiar enough with Linux kernel - will the second one work?

I not an expert here, either. My uneducated guess is that even a spawned kernel
thread can have signals pending, in which case we would be in the same trouble.

>> I like the first suggestion. Combining single-ioctl with signal handling
>> capabilities feels like the right choice. When we get ERESTARTSYS, we know
>> exactly how many bytes made it to user mode. To reach a comfortable state for a
>> restart, we can store part of the stream together with the meta information in
>> our internal state before returning to user mode.
> I thought that ERESTARTSYS never returns to user mode (this is what I
> saw in my tests also).

That error isn't returned to a user process, that's correct. From a kernel
developer's perspective, we're returning to user space, though, waiting for the
pending signal to be processed.

>> The ioctl will be restarted sooner or later and our internal state tells us where to proceed.
> 
> Ok, so you say that we should maintain checkpoint-like information and
> use it if the ioctl is automatically restarted. This is quite close to
> what I have done, I believe; we still need all the capabilities of
> re-arming tree search, saving context, skipping commands etc, that I
> have written. Did I understand your proposal correctly?

In some way, yes. If possible, I'd like to (with decreasing priority)
- stick with the stream format
- maintain ioctl-compatibility
- have a much smaller patch
- store less state
- stick with the pipe

I haven't had time to completely understand your checkpoint concept, and
one-big-chunk patches are hard to read. But the size of the added data
structures scares me. Shouldn't we be able to do this with as little information
as a single btrfs key and a buffer of generated but not yet pushed stream data?
This would also save us skipping commands or output bytes.

> But tell me one thing: let's say we call vfs_write() and it returns
> -ERESTARTSYS. Can we be sure that it wrote 0 bytes to the pipe in this
> call?

>From the version of fs/pipe.c in my working directory: yes. If we rely on this,
we'd better check if this is something to rely on, but I hope it is.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux