Hi,

In the meantime I tested with 2.6.38-rc6. There were no hangs there, but there was extreme slowness (copying at ~2 MB/s) and periodic stalls with zero activity (up to 3 minutes) for programs trying to write to the btrfs. Since I saw very high CPU utilization in the raid6 (md) code, I suspect a problem there.

However, because that behavior didn't seem acceptable either, I patched a vanilla 2.6.37.3 kernel with the latest btrfs-unstable. The performance was back, but after ~16 hours the lockup occurred and the btrfs is inaccessible again. The usage scenario at that point was 4 threads writing to the btrfs via NFS at ~2 MB/s each. This time btrfs-transac itself went into D state, as did all the nfsd threads and a "touch" I ran to verify the btrfs lockup.

Attached is a dmesg with the sysrq-t output.

Does anyone have any ideas how to debug this (timeout detection, in-memory data structure dumps, etc.)?

Regards,
Christian

On 02/23/2011 11:40 AM, Christian Schmidt wrote:
> Hi,
>
> After a few weeks of testing and preparation I commissioned a new NFS
> server with btrfs for the main storage. I ran into two situations where
> the btrfs locked up and I had to hard reboot the machine (sysrq-b).
> I end up with btrfs-transac in state D, waiting for the pending
> transaction to be completed if I interpret the code right. On top of
> that all eight nfsds are in state D waiting to start several different
> transactions.
> I have attached the sysrq-t output after I killed all processes I could
> before rebooting.
>
> It only seems to happen with somewhat heavier IO load, in this case one
> process md5summing large files (a few TB in total) while another process
> tries to write to the NFS share. I never saw it e.g. while copying
> single files onto the file system or reading multiple files.
>
> I'll be glad for any hints and recommendations.
>
> Christian
>
Attachment:
dmesg.201103131826.bz2
Description: BZip2 compressed data
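On the timeout-detection question: one low-tech approach is to poll /proc and flag tasks that stay in uninterruptible sleep (state D) longer than some threshold, so a sysrq-t dump can be taken while the hang is fresh. The Python below is a minimal sketch of that idea, not an existing tool. The names `parse_stat`, `d_state_tasks`, `StuckTracker` and `watch`, and the 120 s / 10 s defaults, are hypothetical. It assumes the standard /proc/<pid>/stat layout (pid, then the command name in parentheses, then a one-letter state). The kernel's own hung-task detector (the `kernel.hung_task_timeout_secs` sysctl, available when built with CONFIG_DETECT_HUNG_TASK) and `echo w > /proc/sysrq-trigger` (dump blocked tasks) cover similar ground and may be the better first step.

```python
import os
import time


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def d_state_tasks(proc_root="/proc"):
    """Map pid -> comm for every process currently in state D.

    Only thread-group leaders are scanned; per-thread state of
    multi-threaded user processes lives under /proc/<pid>/task/.
    """
    tasks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == "D":
            tasks[pid] = comm
    return tasks


class StuckTracker:
    """Hypothetical helper: flag pids that stay in D state >= timeout seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.first_seen = {}  # pid -> time it was first observed in D state

    def update(self, tasks, now):
        # Forget pids that left D state (or exited); their clock restarts.
        self.first_seen = {p: t for p, t in self.first_seen.items() if p in tasks}
        for pid in tasks:
            self.first_seen.setdefault(pid, now)
        return sorted(p for p, t in self.first_seen.items() if now - t >= self.timeout)


def watch(timeout=120, interval=10):
    """Poll forever and report tasks stuck in D state (run as root)."""
    tracker = StuckTracker(timeout)
    while True:
        tasks = d_state_tasks()
        for pid in tracker.update(tasks, time.monotonic()):
            # A sysrq-t / sysrq-w dump could be captured at this point.
            print(f"pid {pid} ({tasks[pid]}) in D state for >= {timeout}s")
        time.sleep(interval)
```

Calling `watch()` on the server prints a line for each task that stays blocked past the threshold. Tracking per-pid first-seen times, rather than raising an alarm on a single D-state sample, avoids false alarms from tasks that are only briefly blocked on normal disk I/O.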
