|
|
|
Re: Kernel 3.4.X NFS server regression | |
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
|
|
On 06/11/2012 03:39 PM, Jeff Layton wrote: > On Mon, 11 Jun 2012 08:16:34 -0400 > bfields <bfields@xxxxxxxxxxxx> wrote: > >> On Sun, Jun 10, 2012 at 03:00:42PM +0000, Myklebust, Trond wrote: >>> Cc: linux-nfs@xxxxxxxxxxxxxxx + bfields and changing title to label it >>> as a server regression since that is what the trace appears to imply. >>> >>> On Sun, 2012-06-10 at 12:56 +0200, Joerg Platte wrote: >>>> All 3.4 kernels I tried so far (3.4, 3.4.1 and 3.4.2) suffer from the >>>> same NFS related problem: >>>> >>>> Jun 10 09:23:36 coco kernel: INFO: task kworker/u:1:8 blocked for more >>>> than 120 seconds. >>>> Jun 10 09:23:36 coco kernel: "echo 0 > >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>>> Jun 10 09:23:36 coco kernel: kworker/u:1 D 002ba28c 0 8 >>>> 2 0x00000000 >>>> Jun 10 09:23:36 coco kernel: df465ec0 00000046 00000005 002ba28c >>>> 00000000 0000000a 00000282 df465e70 >>>> Jun 10 09:23:36 coco kernel: df465ec0 df44d2b0 ffff6b60 df465e84 >>>> df44d2b0 e33fa6b3 00000282 de764ae0 >>>> Jun 10 09:23:36 coco kernel: ffffffff d78bcfb8 df465e8c c012e0f6 >>>> df465ea4 c013610c 00000000 d78bcf80 >>>> Jun 10 09:23:36 coco kernel: Call Trace: >>>> Jun 10 09:23:36 coco kernel: [<c012e0f6>] ? add_timer+0x11/0x17 >>>> Jun 10 09:23:36 coco kernel: [<c013610c>] ? queue_delayed_work_on+0x74/0xf0 >>>> Jun 10 09:23:36 coco kernel: [<c0136ba4>] ? queue_delayed_work+0x1b/0x28 >>>> Jun 10 09:23:36 coco kernel: [<c0350f5b>] schedule+0x1d/0x4c >>>> Jun 10 09:23:36 coco kernel: [<e0cda5f1>] cld_pipe_upcall+0x4e/0x75 [nfsd] >>>> Jun 10 09:23:36 coco kernel: [<e0cda678>] >>>> nfsd4_cld_grace_done+0x60/0x99 [nfsd] >>>> Jun 10 09:23:36 coco kernel: [<e0cd9cb5>] >>>> nfsd4_record_grace_done+0x10/0x12 [nfsd] >>>> Jun 10 09:23:36 coco kernel: [<e0cd6696>] laundromat_main+0x291/0x2d8 >>>> [nfsd] >>>> Jun 10 09:23:36 coco kernel: [<c0136d2f>] process_one_work+0xff/0x325 >>>> Jun 10 09:23:36 coco kernel: [<c0134bec>] ? start_worker+0x20/0x23 >>>> Jun 10 09:23:36 coco kernel: [<e0cd6405>] ? >>>> nfsd4_process_open1+0x32b/0x32b [nfsd] >>>> Jun 10 09:23:36 coco kernel: [<c013727a>] worker_thread+0xf4/0x39a >>>> Jun 10 09:23:36 coco kernel: [<c0137186>] ? rescuer_thread+0x231/0x231 >>>> Jun 10 09:23:36 coco kernel: [<c013a556>] kthread+0x6c/0x6e >>>> Jun 10 09:23:36 coco kernel: [<c013a4ea>] ? kthreadd+0xe8/0xe8 >>>> Jun 10 09:23:36 coco kernel: [<c035263e>] kernel_thread_helper+0x6/0xd >>>> >>>> A kworker task is stuck in D state and nfs mounts from other clients do >>>> not work at all. This happens only on one machine, another one with the >>>> same kernel (same self compiled Debian package) works. All previous 3.3 >>>> kernels work as well. >>>> >>>> Since this machine is remote it is not that easy to bisect to find the >>>> bad commit. Are there any other things I can try? >> >> If you create a directory named /var/lib/nfs/v4recovery/, does the >> problem go away? >> >> My guess would be that it's trying to upcall to the new reboot-recovery >> state daemon, and you don't have that running. >> >> Before the addition of that upcall state was kept in >> /var/lib/nfs/v4recovery. So we decide whether to use the old method or >> the new one by checking for the existance of that path. >> >> But I'm guessing we were wrong to assume that existing setups that >> people perceived as working would have that path, because the failures >> in the absence of that path were probably less obvious. >> >> --b. > > This sounds like the same problem that Hans reported as well. I've not > been able to reproduce that so far. Here's what I get when I start nfsd > with no v4recoverdir and nfsdcld isn't running: > > [ 109.715080] NFSD: starting 90-second grace period > [ 229.984220] NFSD: Unable to end grace period: -110 > > What I don't quite understand is why the queue_timeout job isn't > getting run here. What should happen is that 30s after upcall, > rpc_timeout_upcall_queue should run. The message will be found to be > still sitting on the , so it should set its status to -ETIMEDOUT > and wake up the caller. > > I can only assume that the queue_timeout job isn't getting run for some > reason, but I'm unclear on why that would be. > Regression fixing aside. I would consider changing the all mechanism to a call_usermodehelper mechanism. Not only it cuts the in-kernel code to 1/3, it also cuts user-mode code to 1/3. And specially it relives you of any special daemon setup dependency. All you do is run an app/script that does what it does when it does it, directly without anyone waiting and/or any kind of handshake. It is easy to pass any kind of parameters from Kernel to user-mode. Passing info from user-mode to Kernel is also easy by setting up a sysfs connection point. And most important there are no timeouts in the new-kernel vs old user-mode. If the script/app does not exists the call_usermodehelper returns immediately and the old behavior can be used. And lastly if persistent performance is an issue in the steady state. (since calling call_usermodehelper in the hot path can be slow at times) Then I would consider that the init script ran at startup via call_usermodehelper then sets up a faster communication channel like a udev even and/or some other event mechanism. In any way the old dual local-RPC channel has proved to be a pain in the ass. (BTW: if you attempt it you will see that so many lines of code where eliminated you might consider it for a Regression fixing to @stable) Thank to Steve Dickson who suggested this wonderful idea when I had the same exact problem. I'm just repeating his suggestion, and in light of the experience of implementing both methods. Just my $0.017 Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Other Archives] [Linux Kernel Newbies] [Linux Driver Development] [Fedora Kernel] [Linux Kernel Testers] [Linux SH] [Linux Omap] [Linux Kbuild] [Linux Tape] [Linux Input] [Linux Kernel Janitors] [Linux Kernel Packagers] [Linux Doc] [Linux Man Pages] [Linux API] [Linux Memory Management] [Linux Modules] [Linux Standards] [Kernel Announce] [Netdev] [Git] [Linux PCI] Linux CAN Development [Linux I2C] [Linux RDMA] [Linux NUMA] [Netfilter] [Netfilter Devel] [SELinux] [Bugtraq] [FIO] [Linux Perf Users] [Linux Serial] [Linux PPP] [Linux ISDN] [Linux Next] [Kernel Stable Commits] [Linux Tip Commits] [Kernel MM Commits] [Linux Security Module] [AutoFS] [Filesystem Development] [Ext3 Filesystem] [Linux bcache] [Ext4 Filesystem] [Linux BTRFS] [Linux CEPH Filesystem] [Linux XFS] [XFS] [Linux NFS] [Linux CIFS] [Ecryptfs] [Linux NILFS] [Linux Cachefs] [Reiser FS] [Initramfs] [Linux FB Devel] [Linux OpenGL] [DRI Devel] [Fastboot] [Linux RT Users] [Linux RT Stable] [eCos] [Corosync] [Linux Clusters] [LVS Devel] [Hot Plug] [Linux Virtualization] [KVM] [KVM PPC] [KVM ia64] [Linux Containers] [Linux Hexagon] [Linux Cgroups] [Util Linux] [Wireless] [Linux Bluetooth] [Bluez Devel] [Ethernet Bridging] [Embedded Linux] [Barebox] [Linux MMC] [Linux IIO] [Sparse] [Smatch] [Linux Arch] [x86 Platform Driver] [Linux ACPI] [Linux IBM ACPI] [LM Sensors] [CPU Freq] [Linux Power Management] [Linmodems] [Linux DCCP] [Linux SCTP] [ALSA Devel] [Linux USB] [Linux PA RISC] [Linux Samsung SOC] [MIPS Linux] [IBM S/390 Linux] [ARM Linux] [ARM Kernel] [ARM MSM] [Tegra Devel] [Sparc Linux] [Linux Security] [Linux Sound] [Linux Media] [Video 4 Linux] [Linux IRDA Users] [Linux for the blind] [Linux RAID] [Linux ATA RAID] [Device Mapper] [Linux SCSI] [SCSI Target Devel] [Linux SCSI Target Infrastructure] [Linux IDE] [Linux SMP] [Linux AXP] [Linux Alpha] [Linux M68K] [Linux ia64] [Linux 8086] [Linux x86_64] [Linux Config] [Linux Apps] [Linux MSDOS] [Linux X.25] [Linux Crypto] [DM Crypt] [Linux Trace Users] [Linux Btrace] [Linux Watchdog] [Utrace Devel] [Linux C Programming] [Linux Assembly] [Dash] [DWARVES] [Hail Devel] [Linux Kernel Debugger] [Linux gcc] [Gcc Help] [X.Org] [Wine]
![]() |
![]() |
[Older Kernel Discussion] [Yosemite National Park Forum] [Large Format Photos] [Gimp] [Yosemite Photos] [Stuff]