11.04.2012 21:33, Stanislav Kinsbursky wrote:
> 11.04.2012 21:20, J. Bruce Fields wrote:
>> On Wed, Apr 11, 2012 at 02:34:37PM +0400, Stanislav Kinsbursky wrote:
>>> 11.04.2012 00:22, J. Bruce Fields wrote:
>>>> On Tue, Apr 10, 2012 at 04:46:38PM +0400, Stanislav Kinsbursky wrote:
>>>>> 10.04.2012 16:16, Jeff Layton wrote:
>>>>>> On Tue, 10 Apr 2012 15:44:42 +0400
>>>>>>
>>>>>> (sorry about the earlier truncated reply, my MUA has a mind of its own this morning)
>>>>>
>>>>> OK then. Previous letter confused me a bit.
>>>>>
>>>>>> TBH, I haven't considered that in depth. That is a valid situation, but one that's discouraged. It's very difficult (and expensive) to sequester off portions of a filesystem for serving. A filehandle is somewhat analogous to a device/inode combination. When the server gets a filehandle, it has to determine "is this within a path that's exported to this host"? That process is called subtree checking. It's expensive and difficult to handle. It's always better to export along filesystem boundaries.
>>>>>>
>>>>>> My suggestion would be to simply not deal with those cases in this patch. Possibly we could force no_subtree_check when we export an fs with a locks_in_grace option defined.
>>>>>
>>>>> Sorry, but without dealing with those cases your patch looks a bit... Useless. I.e. it changes nothing if there is no support from the file systems that are going to be exported. But how are you going to push developers to implement these calls? Or, even if you try to implement them yourself, what will they look like? A simple check on the superblock alone looks bad to me, because every other NFSd start will lead to a grace period for all other containers (that use the same filesystem).
>>>>
>>>> That's the correct behavior, and it sounds simple to implement. Let's just do that.
>>>>
>>>> If somebody doesn't like the grace period from another container intruding on their use of the same filesystem, they should either arrange to export different filesystems (not just different subtrees) from their containers, or arrange to start all their containers at the same time so their grace periods overlap.
>>>
>>> Starting all at once is not a very good solution. When you start 100 containers simultaneously, you can't predict when the process as a whole will succeed (it will produce heavy load on all subsystems). Moreover, there is also server restart...
>>
>> So you really are exporting subtrees of the same filesystem from multiple containers? Why?
>
> Everything is very simple and obvious. We use a "chroot jail". This is the most common and simplest setup for containers. And, basically, a Virtuozzo container's file system consists of two parts: one is its private modified data, the other is a template used by all containers based on it (rhel6, for example; when its content is modified by some container, the modified file is copied to the private part of the container that modified it). Anyway, with a properly configured environment there can be as many containers on the same file system as possible. And making sure that no data is shared between them is root's responsibility.
>
> This approach gives us a journal bottleneck. That's why, in the future, we are going to use a "ploop" device (a kind of very smart loop device) per container. And thus this problem with the grace period for file systems will disappear.

One note: of course, root can configure a partition per container. But that looks like too much (especially when a container is very tiny). And people don't keep such non-obvious things as the NFSd grace period in mind while configuring the environment.
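
To make the concern about a superblock-only check concrete, here is a toy model in Python. It is only a sketch, not kernel code: the class and method names and the 90-second grace length are assumptions made for illustration, and only the idea of keying the grace period on the superblock comes from the discussion above.

```python
# Toy model (NOT kernel code): a grace period tracked per superblock.
# All names and the 90-second length are hypothetical, for illustration only.

GRACE_SECONDS = 90  # assumed grace length


class Superblock:
    """One filesystem instance, possibly shared by several containers."""

    def __init__(self, name):
        self.name = name
        self.grace_end = 0.0  # time at which the current grace period ends

    def start_grace(self, now, length=GRACE_SECONDS):
        # Overlapping grace periods merge: keep the later end time.
        self.grace_end = max(self.grace_end, now + length)

    def in_grace(self, now):
        return now < self.grace_end


class Container:
    """A container whose NFSd exports something living on `sb`."""

    def __init__(self, name, sb):
        self.name = name
        self.sb = sb

    def start_nfsd(self, now):
        # Starting NFSd puts the whole underlying filesystem into grace.
        self.sb.start_grace(now)

    def locks_in_grace(self, now):
        # Superblock-only check: it cannot tell which container began grace.
        return self.sb.in_grace(now)


if __name__ == "__main__":
    shared = Superblock("shared")
    ct1 = Container("ct1", shared)
    ct2 = Container("ct2", shared)
    ct3 = Container("ct3", Superblock("own-device"))  # one fs per container

    ct1.start_nfsd(now=0)
    ct3.start_nfsd(now=0)
    print("ct1 in grace at t=100:", ct1.locks_in_grace(100))
    ct2.start_nfsd(now=100)  # a later start on the same filesystem
    print("ct1 in grace at t=120:", ct1.locks_in_grace(120))
    print("ct3 in grace at t=120:", ct3.locks_in_grace(120))
```

In this model a late start of ct2 pushes ct1 back into grace because both sit on the same superblock, while ct3, which has its own filesystem (the ploop case), is not affected.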

--
Best regards,
Stanislav Kinsbursky