On Mon, October 08, 2012 at 13:38 (+0200), Alex Lyakas wrote: >>> I realize this is a big change, and a new IOCTL has to be introduced >>> in order not to break current user-kernel protocol. >>> The pros as I see them: >>> # One data-copy is avoided (no pipe). For WRITE commands two >>> data-copies are avoided (no read_buf needed) >> >> I'm not sure I understand those correctly. If you're talking about the user mode >> part, we could simply pass stdout to the kernel, saving the unnecessary pipe and >> copy operations in between without introducing a new buffer. > What I meant is the following: > # For non-WRITE commands the flow is: put the command onto send_buf, > copy to pipe, then user-space copies it out from the pipe. With my > code: put command onto send_buf, then copy to user-space buffer > (copy_to_user). So one data-copy is avoided (2 vs 3). > # For WRITE commands: read data onto read_buf, then copy to send_buf, > then copy to pipe, then user-mode copies to its buffer. With my code: > read onto send_buf, then copy to user-space buffer. So 2 data-copies > are avoided (2 vs 4). > Does it make sense? I'd rather just focus on the ERESTARTSYS issue for now. >> On Wed, August 01, 2012 at 14:09 (+0200), Alexander Block wrote: >>> I have two possible solutions in my mind. >>> 1. Store some kind of state in the ioctl arguments so that we can >>> continue where we stopped when the ioctl reenters. This would however >>> complicate the code a lot. >>> 2. Spawn a thread when the ioctl is called and leave the ioctl >>> immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls >>> if they happen from a non syscall thread. >> >> What do you think about those two? > I am not familiar enough with Linux kernel - will the second one work? I not an expert here, either. My uneducated guess is that even a spawned kernel thread can have signals pending, in which case we would be in the same trouble. >> I like the first suggestion. Combining single-ioctl with signal handling >> capabilities feels like the right choice. When we get ERESTARTSYS, we know >> exactly how many bytes made it to user mode. To reach a comfortable state for a >> restart, we can store part of the stream together with the meta information in >> our internal state before returning to user mode. > I thought that ERESTARTSYS never returns to user mode (this is what I > saw in my tests also). That error isn't returned to a user process, that's correct. From a kernel developer's perspective, we're returning to user space, though, waiting for the pending signal to be processed. >> The ioctl will be restarted sooner or later and our internal state tells us where to proceed. > > Ok, so you say that we should maintain checkpoint-like information and > use it if the ioctl is automatically restarted. This is quite close to > what I have done, I believe; we still need all the capabilities of > re-arming tree search, saving context, skipping commands etc, that I > have written. Did I understand your proposal correctly? In some way, yes. If possible, I'd like to (with decreasing priority) - stick with the stream format - maintain ioctl-compatibility - have a much smaller patch - store less state - stick with the pipe I haven't had time to completely understand your checkpoint concept, and one-big-chunk patches are hard to read. But the size of the added data structures scares me. Shouldn't we be able to do this with as little information as a single btrfs key and a buffer of generated but not yet pushed stream data? This would also save us skipping commands or output bytes. > But tell me one thing: let's say we call vfs_write() and it returns > -ERESTARTSYS. Can we be sure that it wrote 0 bytes to the pipe in this > call? >From the version of fs/pipe.c in my working directory: yes. If we rely on this, we'd better check if this is something to rely on, but I hope it is. -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
