Cluster failure, dlm overload
First of all, thanks for your time.
A five-node
cluster sharing several GFS filesystems is suffering complete blocks
(hangs) of filesystem activity, about one per week. These blocks first
appeared several weeks ago, after more than three years in service.
The cluster is restored after restarting all cluster nodes ;-)
When this happens, we see the dlm send and receive processes
consuming a lot of CPU, and network traffic is also about ten
times its normal level.
A Wireshark capture of the network traffic on
the DLM port shows thousands of messages per second. In particular, every
"request message" is answered with a "request reply" where errno=EBADR.
Lookup messages seem OK.
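For anyone who wants to produce the same kind of summary from their own capture, a small script along these lines could tally the message rate and the reply codes. This is only a rough sketch: it assumes the capture has first been exported to text with one message per line in a hypothetical "timestamp kind result" layout (not a format Wireshark writes by default), the name summarize is made up for the example, and the sample lines are invented for illustration, not taken from our capture.

```python
from collections import Counter


def summarize(lines):
    """Tally messages per whole second and per (kind, result) pair.

    Each line is assumed to look like "<epoch seconds> <kind> <result>",
    a hypothetical export layout chosen for this sketch.
    """
    per_second = Counter()
    by_kind = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        timestamp, kind, result = parts
        per_second[int(float(timestamp))] += 1
        by_kind[(kind, result)] += 1
    return per_second, by_kind


# Invented sample data, for illustration only.
sample = [
    "1000.10 request -",
    "1000.20 request_reply EBADR",
    "1000.30 lookup -",
    "1000.40 lookup_reply OK",
    "1001.05 request -",
    "1001.15 request_reply EBADR",
]

per_second, by_kind = summarize(sample)
print("peak messages per second:", max(per_second.values()))
print("request replies with EBADR:", by_kind[("request_reply", "EBADR")])
```

Comparing the peak rate during a block with the rate in normal operation would show whether the flood consists almost entirely of request/reply pairs failing with EBADR.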
The cluster runs a somewhat outdated software version, the one shipped with Red Hat's 2.6.18, and it cannot easily be upgraded.
Any suggestion is welcome.
Linux-cluster mailing list