* Daniel J Blueman (daniel.blueman@xxxxxxxxx) wrote:
> I was experiencing a similar pattern of ESTALE issues with NFS with
> 2.6.33 (IIRC) and cached data on ext4, and could reproduce it from
> time to time performing kernel rebuilds over NFS.
>
> I've CC'd Trond on the full email to see if it rings a bell. The best
> outcome may be if we write a micro-reproducer which exploits this race
> using cached data.

I've recently seen quite a concrete case, which may be interesting:

NB, this is not an exact transcript

## step1: build a binary to use (out of tree build, touches: depends
## files, object files, binaries)
vcfe:some/dir/bin$ make

## step2: launch a job on the cluster that uses the binaries in dir/bin
## but does not touch any other files in dir/bin
vcfe:some/dir$ sbatch -N4 my_job.sh

## step3: let time pass (job completed, came back next day)
##
vcfe:some/dir$ ls -l bin
< many stale filehandle errors >

In actual fact, steps 1 and 2 were repeated several times (I happened to
be bisecting something) without issue, then the following day step 3
revealed a problem.

All writes to dir/bin occurred on vcfe; other computers only accessed
it to read the binary. Other computers will have created extra
directories in "some/dir/".

The stale filehandle errors were resolved by:

  echo 2 > /proc/sys/vm/drop_caches

A quick summary of the setup:
 - nfs client was 2.6.35, mounting with nfsv3
 - nfs server was 2.6.33, exporting a btrfs filesystem (noatime,nodiratime)

I'd be very interested if anyone has any further thoughts on the issue.

Kind regards,
..david
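
As a possible starting point for the micro-reproducer mentioned in the quoted text, here is a minimal sketch that only detects the symptom from step 3 and does not trigger the race itself. It assumes Python is available on the NFS client. The function name `find_stale_entries` and its `stat` parameter are hypothetical, introduced here for illustration, and are not part of the original report.

```python
import errno
import os


def find_stale_entries(directory, stat=os.lstat):
    """Return the sorted names in `directory` whose stat call fails with ESTALE.

    `stat` is injectable (an assumption made for testability); by default
    os.lstat is used, which is roughly what `ls -l` does for each entry.
    """
    stale = []
    for name in sorted(os.listdir(directory)):
        try:
            stat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(name)
            # Any other error (ENOENT, EACCES, ...) is not the symptom
            # described here, so it is skipped rather than reported.
    return stale
```

Calling `find_stale_entries("some/dir/bin")` on the client before and after the `drop_caches` write would show whether the set of stale entries really empties. Writing 2 to `/proc/sys/vm/drop_caches` asks the kernel to free reclaimable slab objects such as dentries and inodes, which would be consistent with the client holding on to outdated file handles, though that remains a guess at this point.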
