Cluster failure, dlm overload



First of all, thanks for your time.

A five-node cluster sharing several GFS filesystems is suffering complete hangs of filesystem activity, roughly once a week. These hangs started several weeks ago, after more than three years in service. The cluster is restored after restarting all cluster nodes ;-)

When these hangs occur, we can see the dlm send and receive processes consuming a high level of CPU, and network traffic is also ten times the normal level.

A capture (Wireshark) of network traffic on the DLM port shows thousands of messages per second. In particular, every "request message" is answered with a "request reply" where errno=EBADR. Lookup messages seem OK.
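In case it helps anyone compare captures, here is a minimal Python sketch of how such a capture could be summarised. It assumes the packets have already been exported into (timestamp, message kind, errno) tuples. That record layout, the kind labels and the function name summarize are hypothetical and are not the output of any particular tool.

```python
import errno
from collections import Counter


def summarize(records):
    """Summarise DLM capture records.

    records: iterable of (timestamp_seconds, message_kind, errno_value).
    This tuple layout is an assumption made for illustration only.
    Returns (peak messages in any one second, reply counts per errno name).
    """
    per_second = Counter()    # messages seen in each whole second
    reply_errors = Counter()  # "request_reply" messages grouped by errno

    for ts, kind, err in records:
        per_second[int(ts)] += 1
        if kind == "request_reply":
            # Accept either sign convention for the errno value and map it
            # to its symbolic name on this platform (EBADR is 53 on Linux).
            name = errno.errorcode.get(abs(err), str(err)) if err else "OK"
            reply_errors[name] += 1

    peak = max(per_second.values(), default=0)
    return peak, dict(reply_errors)
```

A peak rate in the thousands together with nearly all replies counted under EBADR would match what is described above.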

The cluster runs a somewhat outdated software version, the one shipped with the Red Hat 2.6.18 kernel, and it cannot easily be upgraded.

Any suggestion is welcome.

Kind regards,
Linux-cluster mailing list
