On Sun, May 27, 2012 at 11:02 PM, Ian Kent <raven@xxxxxxxxxx> wrote:
> On Wed, 2012-02-29 at 14:10 -0400, Leonardo Chiquitto wrote:
>> On Wed, Dec 14, 2011 at 2:05 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> > On Mon, 2011-12-12 at 19:08 -0300, Leonardo Chiquitto wrote:
>> >> On Mon, Dec 12, 2011 at 3:39 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> >> > On Mon, 2011-12-05 at 12:52 -0300, Leonardo Chiquitto wrote:
>> >> >> Hello,
>> >> >>
>> >> >> There is a problem in AutoFS that makes the daemon hang forever when
>> >> >> starting up. It can be reproduced reliably with the latest git checkout,
>> >> >> if you use the following configure options: --disable-mount-locking
>> >> >> --enable-forced-shutdown --enable-ignore-busy.
>> >> >>
>> >> >> You'll also need to setup a nested direct map:
>> >> >>
>> >> >> # cat /etc/auto.master
>> >> >> /- /etc/auto.direct
>> >> >>
>> >> >> # cat /etc/auto.direct
>> >> >> /test \
>> >> >> / -nfsvers=3 10.1.1.65:/nfs \
>> >> >> /var -nfsvers=3 10.1.1.65:/nfs/var
>> >> >>
>> >> >> And then run these steps:
>> >> >>
>> >> >> 1. Start AutoFS
>> >> >> 2. cd /test
>> >> >> 3. Stop AutoFS
>> >> >> 4. Start AutoFS
>> >> >>
>> >> >> The automounter will deadlock when starting up because it calls
>> >> >> cache_writelock() in a lock that is already "readlocked" by the
>> >> >> same thread. Here's the call traces (Thread 3 is the one that
>> >> >> deadlocks):
>> >> >>
>> >> >> Thread 4 (Thread 0x7f63a0b18700 (LWP 6923)):
>> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 ()
>> >> >> from /lib64/libpthread.so.0
>> >> >> #1 0x00007f63a0b665c8 in master_do_mount (entry=0x7f63a0d9dd30)
>> >> >> at master.c:1065
>> >> >> #2 0x00007f63a0b66a38 in master_mount_mounts (master=0x7f63a0d82310,
>> >> >> age=1323091131, readall=0) at master.c:1206
>> >> >> #3 0x00007f63a0b65c03 in master_read_master (master=0x7f63a0d82310,
>> >> >> age=1323091131, readall=0) at master.c:860
>> >> >> #4 0x00007f63a0b45d45 in main (argc=0, argv=0x7fff20d087a0)
>> >> >> at automount.c:2202
>> >> >>
>> >> >> Thread 3 (Thread 0x7f639f564700 (LWP 6928)):
>> >> >> #0 0x00007f63a0702610 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> >> >> #1 0x00007f63a0b56d48 in cache_writelock (mc=0x7f63a0d9e010) at cache.c:72
>> >> >> #2 0x00007f639e706c23 in parse_mount (ap=0x7f63a0d9de50,
>> >> >> name=0x7f639f563400 "/nfs", name_len=4,
>> >> >> mapent=0x7f639f562400 "/ -nfsvers=3 10.1.1.65:/nfs \t/var -nfsvers=3 10.1.1.65:/nfs/var", context=0x7f63a0d83c90) at parse_sun.c:1550
>> >> >> #3 0x00007f639eb498ea in lookup_mount (ap=0x7f63a0d9de50,
>> >> >> name=0x7f63a0da4460 "/nfs", name_len=4, context=0x7f63a0d899a0)
>> >> >> at lookup_file.c:1074
>> >> >> #4 0x00007f63a0b523d5 in do_lookup_mount (ap=0x7f63a0d9de50,
>> >> >> map=0x7f63a0da40d0, name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:678
>> >> >> #5 0x00007f63a0b5258c in lookup_name_file_source_instance (ap=0x7f63a0d9de50,
>> >> >> map=0x7f63a0d9df60, name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:718
>> >> >> #6 0x00007f63a0b52cc3 in lookup_nss_mount (ap=0x7f63a0d9de50, source=0x0,
>> >> >> name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:909
>> >> >> #7 0x00007f63a0b5c678 in do_remount_direct (ap=0x7f63a0d9de50, fd=11,
>> >> >> path=0x7f63a0da4460 "/nfs") at mounts.c:1159
>> >> >> #8 0x00007f63a0b5d003 in remount_active_mount (ap=0x7f63a0d9de50,
>> >> >> mc=0x7f63a0d9e010, path=0x7f63a0da4460 "/nfs", devid=31, type=2,
>> >> >> ioctlfd=0x7f639f563a30) at mounts.c:1329
>> >> >> #9 0x00007f63a0b5d0f3 in try_remount (ap=0x7f63a0d9de50, me=0x7f63a0da4390,
>> >> >> type=2) at mounts.c:1357
>> >> >> #10 0x00007f63a0b49dee in do_mount_autofs_direct (ap=0x7f63a0d9de50,
>> >> >> mnts=0x7f63a0da44d0, me=0x7f63a0da4390) at direct.c:339
>> >> >> #11 0x00007f63a0b4a811 in mount_autofs_direct (ap=0x7f63a0d9de50)
>> >> >> at direct.c:511
>> >> >> #12 0x00007f63a0b42f2a in mount_autofs (ap=0x7f63a0d9de50,
>> >> >> root=0x7f63a0d83500 "/-") at automount.c:1003
>> >> >> #13 0x00007f63a0b44a80 in handle_mounts (arg=0x7fff20d08440)
>> >> >> at automount.c:1561
>> >> >> #14 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0
>> >> >> #15 0x00007f639f84853d in clone () from /lib64/libc.so.6
>> >> >>
>> >> >> Thread 2 (Thread 0x7f63a0b16700 (LWP 6925)):
>> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 ()
>> >> >> from /lib64/libpthread.so.0
>> >> >> #1 0x00007f63a0b5625a in st_queue_handler (arg=0x0) at state.c:1073
>> >> >> #2 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0
>> >> >> #3 0x00007f639f84853d in clone () from /lib64/libc.so.6
>> >> >>
>> >> >> Thread 1 (Thread 0x7f63a0b32700 (LWP 6924)):
>> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 ()
>> >> >> from /lib64/libpthread.so.0
>> >> >> #1 0x00007f63a0b629f8 in alarm_handler (arg=0x0) at alarm.c:186
>> >> >> #2 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0
>> >> >> #3 0x00007f639f84853d in clone () from /lib64/libc.so.6
>> >> >>
>> >> >> And the debug logs:
>> >> >>
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: Starting automounter version 5.0.6, master map auto.master
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: using kernel protocol version 5.02
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_nss_read_master: reading master files auto.master
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_init: parse(sun): init gathered global options: (null)
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_read_master: lookup(file): read entry /-
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: master_do_mount: mounting /-
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: automount_path_to_fifo: fifo name /run/autofs.fifo--
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_nss_read_map: reading map file /etc/auto.direct
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_init: parse(sun): init gathered global options: (null)
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: remount_active_mount: trying to re-connect to mount /nfs
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: mounted direct on /nfs with timeout 300, freq 75 seconds
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_mount: lookup(file): looking up /nfs
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_mount: lookup(file): /nfs -> / -nfsvers=3 10.1.1.65:/nfs #011/var -nfsvers=3 10.1.1.65:/nfs/var
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_mount: parse(sun): expanded entry: / -nfsvers=3 10.1.1.65:/nfs #011/var -nfsvers=3 10.1.1.65:/nfs/var
>> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_mount: parse(sun): gathered options:
>> >> >>
>> >> >> Here's an initial analysis of the Thread 3 behavior:
>> >> >>
>> >> >> #15 clone ()
>> >> >> #14 start_thread ()
>> >> >> #13 handle_mounts at automount.c:1561
>> >> >> #12 mount_autofs at automount.c:1003
>> >> >> #11 mount_autofs_direct at direct.c:511
>> >> >>
>> >> >> In mount_autofs_direct() (daemon/direct.c:486), cache_readlock()
>> >> >> acquires the lock that will be unlocked by pthread_cleanup_pop(1)
>> >> >> (daemon/direct.c:515). However, before the unlock, the remount
>> >> >> code path goes on like this:
>> >> >>
>> >> >> #10 do_mount_autofs_direct at direct.c:339
>> >> >> #9 try_remount at mounts.c:1357
>> >> >> #8 remount_active_mount at mounts.c:1329
>> >> >> #7 do_remount_direct at mounts.c:1159
>> >> >> #6 lookup_nss_mount at lookup.c:909
>> >> >> #5 lookup_name_file_source_instance at lookup.c:718
>> >> >> #4 do_lookup_mount at lookup.c:678
>> >> >> #3 lookup_mount at lookup_file.c:1074
>> >> >> #2 parse_mount at parse_sun.c:1550
>> >> >>
>> >> >> ... calling cache_writelock() in the same mapent_cache that was
>> >> >> locked above.
>> >> >>
>> >> >> #1 cache_writelock at cache.c:72
>> >> >>
>> >> >> Any ideas on how this could be fixed?
>> >> >
>> >> > Give this patch a try please.
>> >> > I still need to test using the submount test since there may be a
>> >> > problem with them, based on the comment in the code that's been
>> >> > removed. I'll report back on how that goes.
>> >> >
>> >> > autofs-5.0.6 - fix remount deadlock
>> >> >
>> >> > From: Ian Kent <raven@xxxxxxxxxx>
>> >> >
>> >> > When reconstructing the mount tree upon restart a writelock to the map
>> >> > entry cache cannot be taken when parsing an entry because a readlock
>> >> > already is held higher up in the call tree.
>> >> >
>> >> > In the place this is done it shouldn't be necessary to alter the map
>> >> > entries in the cache and it shouldn't be necessary to delete an existing
>> >> > multi-mount cache entries to avoid a duplicate multi-mount entry error
>> >> > return. The check for a duplicate should be able to be done in the cache
>> >> > handling functions.
>> >> >
>> >> > I don't remember how this code came to be in the parsing function but
>> >> > it is likely it was due to a problem observed when testing using the
>> >> > autofs connectathon tests. With the change here the tests still pass
>> >> > and attempts to add a duplicate multi-mount entry are still detected.
>> >> >
>> >> > But the deadlock reported by Leonardo Chiquitto also appears to be
>> >> > resolved.
>> >> > ---
>> >> >
>> >> > lib/cache.c | 8 +++++---
>> >> > modules/parse_sun.c | 38 ++++++--------------------------------
>> >> > 2 files changed, 11 insertions(+), 35 deletions(-)
>> >> >
>> >> >
>> >> > diff --git a/lib/cache.c b/lib/cache.c
>> >> > index 9acf736..3464e7d 100644
>> >> > --- a/lib/cache.c
>> >> > +++ b/lib/cache.c
>> >> > @@ -658,10 +658,12 @@ int cache_add_offset(struct mapent_cache *mc, const char *mkey, const char *key,
>> >> > return CHE_FAIL;
>> >> >
>> >> > me = cache_lookup_distinct(mc, key);
>> >> > - if (me && me != owner)
>> >> > - return CHE_DUPLICATE;
>> >> > + if (me && me->age == age) {
>> >> > + if (me != owner)
>> >> > + return CHE_DUPLICATE;
>> >> > + }
>> >> >
>> >> > - ret = cache_add(mc, owner->source, key, mapent, age);
>> >> > + ret = cache_update(mc, owner->source, key, mapent, age);
>> >> > if (ret == CHE_FAIL) {
>> >> > warn(logopt, "failed to add key %s to cache", key);
>> >> > return CHE_FAIL;
>> >> > diff --git a/modules/parse_sun.c b/modules/parse_sun.c
>> >> > index 13b8af8..0da1802 100644
>> >> > --- a/modules/parse_sun.c
>> >> > +++ b/modules/parse_sun.c
>> >> > @@ -843,12 +843,14 @@ add_offset_entry(struct autofs_point *ap, const char *name,
>> >> > strcpy(m_mapent, loc);
>> >> >
>> >> > ret = cache_add_offset(mc, name, m_key, m_mapent, age);
>> >> > - if (ret == CHE_OK)
>> >> > - debug(ap->logopt, MODPREFIX
>> >> > - "added multi-mount offset %s -> %s", path, m_mapent);
>> >> > - else
>> >> > + if (ret != CHE_UPDATED)
>> >> > warn(ap->logopt, MODPREFIX
>> >> > "syntax error or duplicate offset %s -> %s", path, loc);
>> >> > + else {
>> >> > + debug(ap->logopt, MODPREFIX
>> >> > + "added multi-mount offset %s -> %s", path, m_mapent);
>> >> > + ret = CHE_OK;
>> >> > + }
>> >> >
>> >> > return ret;
>> >> > }
>> >> > @@ -1547,34 +1549,6 @@ int parse_mount(struct autofs_point *ap, const char *name,
>> >> > strcat(m_root, name);
>> >> > }
>> >> >
>> >> > - cache_writelock(mc);
>> >> > - me = cache_lookup_distinct(mc, name);
>> >> > - if (!me) {
>> >> > - int ret;
>> >> > - /*
>> >> > - * Not in the cache, perhaps it's a program map
>> >> > - * or one that doesn't support enumeration
>> >> > - */
>> >> > - ret = cache_add(mc, source, name, mapent, time(NULL));
>> >> > - if (ret == CHE_FAIL) {
>> >> > - cache_unlock(mc);
>> >> > - free(options);
>> >> > - return 1;
>> >> > - }
>> >> > - } else {
>> >> > - /*
>> >> > - * If the entry exists it must not have any existing
>> >> > - * multi-mount subordinate entries since we are
>> >> > - * mounting this afresh. We need to do this to allow
>> >> > - * us to fail on the check for duplicate offsets in
>> >> > - * we don't know when submounts go away.
>> >> > - */
>> >> > - cache_multi_writelock(me);
>> >> > - cache_delete_offset_list(mc, name);
>> >> > - cache_multi_unlock(me);
>> >> > - }
>> >> > - cache_unlock(mc);
>> >> > -
>> >> > cache_readlock(mc);
>> >> > me = cache_lookup_distinct(mc, name);
>> >> > if (me) {
>> >>
>> >> Hello Ian,
>> >>
>> >> Thanks for the patch! I confirm the deadlock is gone. I'm investigating a
>> >> new problem in the same test case (restarting AutoFS when a mount
>> >> with nested mounts is busy), but it's probably unrelated to the patch.
>> >> I'll gather more information and report that in a new thread to avoid
>> >> confusion.
>> >
>> > But testing using the submount test indicates there may be a problem
>> > with submounts using this patch or a new problem introduced with recent
>> > changes. I'm still working on it.
>> >
>> > The handling of nested mounts is difficult, especially upon restart in
>> > the presence of in use mounts. I look forward to your feedback on the
>> > problem.
>>
>> Hello Ian,
>>
>> Did you have time to revisit this problem? The "submounts problem" you
>> mentioned above was really introduced by the patch (fix remount deadlock)
>> you posted here?
>
> The problem I saw might not have been introduced by the patch above but
> that code most likely there for a reason that's still relevant.
>
> This patch I think is the way to do it, can you try it please.
>
> autofs-5.0.6 - fix remount deadlock
>
> From: Ian Kent <raven@xxxxxxxxxx>
>
> When reconstructing the mount tree upon restart a writelock to the map
> entry cache cannot be taken when parsing a direct map entry because a
> readlock is already held higher up in the call tree.
>
> In the place this is done it isn't be necessary to alter the direct map
> entries in the cache. Also, it shouldn't be necessary to delete existing
> multi-mount cache entries to avoid a duplicate multi-mount entry error
> return. The check for a duplicate can be done in the cache handling
> functions.
> ---
>
> lib/cache.c | 8 ++++---
> modules/parse_sun.c | 60 ++++++++++++++++++++++++++-------------------------
> 2 files changed, 35 insertions(+), 33 deletions(-)
>
>
> diff --git a/lib/cache.c b/lib/cache.c
> index 9acf736..3464e7d 100644
> --- a/lib/cache.c
> +++ b/lib/cache.c
> @@ -658,10 +658,12 @@ int cache_add_offset(struct mapent_cache *mc, const char *mkey, const char *key,
> return CHE_FAIL;
>
> me = cache_lookup_distinct(mc, key);
> - if (me && me != owner)
> - return CHE_DUPLICATE;
> + if (me && me->age == age) {
> + if (me != owner)
> + return CHE_DUPLICATE;
> + }
>
> - ret = cache_add(mc, owner->source, key, mapent, age);
> + ret = cache_update(mc, owner->source, key, mapent, age);
> if (ret == CHE_FAIL) {
> warn(logopt, "failed to add key %s to cache", key);
> return CHE_FAIL;
> diff --git a/modules/parse_sun.c b/modules/parse_sun.c
> index d7e74e2..8b0f8b6 100644
> --- a/modules/parse_sun.c
> +++ b/modules/parse_sun.c
> @@ -843,12 +843,17 @@ add_offset_entry(struct autofs_point *ap, const char *name,
> strcpy(m_mapent, loc);
>
> ret = cache_add_offset(mc, name, m_key, m_mapent, age);
> - if (ret == CHE_OK)
> + if (ret == CHE_DUPLICATE)
> + warn(ap->logopt, MODPREFIX
> + "syntax error or duplicate offset %s -> %s", path, loc);
> + else if (ret == CHE_FAIL)
> + debug(ap->logopt, MODPREFIX
> + "failed to add multi-mount offset %s -> %s", path, m_mapent);
> + else {
> + ret = CHE_OK;
> debug(ap->logopt, MODPREFIX
> "added multi-mount offset %s -> %s", path, m_mapent);
> - else
> - warn(ap->logopt, MODPREFIX
> - "syntax error or duplicate offset %s -> %s", path, loc);
> + }
>
> return ret;
> }
> @@ -1410,7 +1415,7 @@ int parse_mount(struct autofs_point *ap, const char *name,
> char buf[MAX_ERR_BUF];
> struct map_source *source;
> struct mapent_cache *mc;
> - struct mapent *me = NULL;
> + struct mapent *me;
> char *pmapent, *options;
> const char *p;
> int mapent_len, rv = 0;
> @@ -1561,33 +1566,28 @@ int parse_mount(struct autofs_point *ap, const char *name,
> strcat(m_root, name);
> }
>
> - cache_writelock(mc);
> - me = cache_lookup_distinct(mc, name);
> - if (!me) {
> - int ret;
> - /*
> - * Not in the cache, perhaps it's a program map
> - * or one that doesn't support enumeration
> - */
> - ret = cache_add(mc, source, name, mapent, time(NULL));
> - if (ret == CHE_FAIL) {
> - cache_unlock(mc);
> - free(options);
> - return 1;
> + /*
> + * Can't take the write lock for direct mount entries here
> + * but they should always be present in the map entry cache.
> + */
> + if (ap->type == LKP_INDIRECT) {
> + cache_writelock(mc);
> + me = cache_lookup_distinct(mc, name);
> + if (!me) {
> + int ret;
> + /*
> + * Not in the cache, perhaps it's a program map
> + * or one that doesn't support enumeration.
> + */
> + ret = cache_add(mc, source, name, mapent, age);
> + if (ret == CHE_FAIL) {
> + cache_unlock(mc);
> + free(options);
> + return 1;
> + }
> }
> - } else {
> - /*
> - * If the entry exists it must not have any existing
> - * multi-mount subordinate entries since we are
> - * mounting this afresh. We need to do this to allow
> - * us to fail on the check for duplicate offsets in
> - * we don't know when submounts go away.
> - */
> - cache_multi_writelock(me);
> - cache_delete_offset_list(mc, name);
> - cache_multi_unlock(me);
> + cache_unlock(mc);
> }
> - cache_unlock(mc);
>
> cache_readlock(mc);
> me = cache_lookup_distinct(mc, name);
Hello Ian,
Thanks for getting back to this issue. I've tested the patch now using the test
case reported earlier on this thread and I confirm that it resolves the deadlock
on startup.
The problem of not reconnecting to the nested mounts remains, but I don't
want to mix up the threads.
Thanks again,
Leonardo
--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Linux ARM Kernel]
[Linux ARM]
[Linux Omap]
[Fedora ARM]
[IETF Annouce]
[Security]
[Bugtraq]
[Linux]
[Linux OMAP]
[Linux MIPS]
[ECOS]
[Tools]
[DDR & Rambus]
[Asterisk Internet PBX]
[Linux API]
[Monitors]