- Subject: Re: Linux NFSv4 client uses returned delegation in subsequent READ resulting in hang (BAD_STATEID)
- From: Chuck Lever <chuck.lever@xxxxxxxxxx>
- Date: Mon, 2 Jul 2012 10:09:50 -0400
- Cc: linux-nfs@xxxxxxxxxxxxxxx
- In-reply-to: <CAPD_G3L0wwzCWyoPJp_sKpuQ0J7+avn-ejEn-WnXf81zr=-LqA@mail.gmail.com>
- References: <CAPD_G3L0wwzCWyoPJp_sKpuQ0J7+avn-ejEn-WnXf81zr=-LqA@mail.gmail.com>
On Jun 30, 2012, at 9:53 PM, Charles 'Boyo wrote:
> Hello.
>
> I have repeatedly had Linux NFS clients hang while trying to access
> files on a NFSv4 mount (Solaris).
> Investigations revealed that this client is using a delegation that it
> has already returned, resulting in the BAD_STATEID error.
> Unfortunately, it then proceeds to hammer the server with these
> "doomed" requests, resulting in the client-side unresponsiveness and
> constant network traffic.
>
> A sample trace can be found at http://pastebin.centos.org/39046
> As shown, the READ in frame 10 (line 112) follows the DELEGRETURN in
> frame 9 which results in the error. This READ was then repeated
> infinitely until either the server or client was restarted.
> Disabling delegations on the server-side caused the problem to cease.
> So what is wrong with delegations on the client-side?
Usually we see this behavior because of a race between an OPEN with delegation and a delegation recall. In this case, however, the client is actively returning a READ delegation, then proceeding to use it anyway. I don't see the server's recall callback, though, and there are other indications that this trace is not complete. So it's hard to be 100% confident.

As far as I know, the EL6.2 client does not have support for recovering a single bad STATEID, which is why it is looping. That support is available in mainline kernels 3.0 and later.

However, it seems to me that it is a bug for the client to continue using a delegation that it has returned.

You have already found one work-around: disable delegations on the NFS server. Or you could mount with NFSv3. Or, if feasible, your application could be modified to use fcntl() locking.
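
For the fcntl() locking option, the application change could look roughly like the following. This is only a minimal sketch in Python, whose fcntl.lockf() wraps the fcntl() byte-range locking calls. The function name read_with_lock and any path passed to it are hypothetical, and whether holding a lock actually avoids the stale delegation stateid on a given client has not been verified here.

```python
import fcntl


def read_with_lock(path, size=-1):
    """Read from a file while holding a shared POSIX (fcntl) lock.

    `path` is whatever file the application reads on the NFS mount
    (hypothetical here); `size` is the number of bytes to read, or -1
    for the whole file.
    """
    with open(path, "rb") as f:
        # Take a shared (read) lock on the whole file. This call blocks
        # until the lock is granted.
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read(size)
        finally:
            # Release the lock explicitly before the file is closed.
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A writer would take fcntl.LOCK_EX instead of fcntl.LOCK_SH on a file opened for writing. The lock is advisory, so it only coordinates processes that also take locks.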
> I am using the latest nfs-utils packages and my mount options are as
> shown below:
>
> # cat /etc/redhat-release
> CentOS release 6.2 (Final)
>
> # uname -r
> 2.6.32-220.4.1.el6.x86_64
>
> # rpm -qa '*nfs*'
> nfs-utils-lib-1.1.5-4.el6.x86_64
> nfs-utils-1.2.3-15.el6_2.1.x86_64
>
> # grep nfs4 /proc/mounts
> 10.51.1.6:/SharedFolder/ /var/LocalMountPoint nfs4
> rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.51.1.34,minorversion=0,local_lock=none,addr=10.51.1.6
> 0 0
>
> Regards,
>
> Charles
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html