[Ocfs2-users] dlm_get_lock_resource/dlm_query_join_handler errors
Mark Rife
markrife at hotmail.com
Wed Jul 19 11:49:45 CDT 2006
Hello,
I am running a 4-node ocfs2 cluster.
Our servers are running Red Hat AS4, kernel 2.6.9-34.0.1.ELhugemem.
Our ocfs2 package versions are:
# rpm -qa | grep ocfs2
ocfs2-tools-debuginfo-1.2.1-1
ocfs2-tools-1.2.1-1
ocfs2-2.6.9-34.0.1.ELhugemem-1.2.2-1
ocfs2console-1.2.1-1
One of the nodes (#3) crashed. We rebooted node 3, but now it hangs as it
tries to rejoin the cluster.
On two of the nodes that are up (0 and 1), I am getting repeated messages in
/var/log/messages that look like this:
Jul 19 09:39:40 radon6 kernel: (3994,2):dlm_query_join_handler:614 node 3
trying to join, but recovery is ongoing.
Jul 19 09:39:50 radon6 last message repeated 25 times
Jul 19 09:39:51 radon6 kernel: (27704,1):dlm_get_lock_resource:895
46A341FD43114DE4A10E7D63C5099461:M0000000000000000667f6c991b8fc9: at least
one node (3) torecover before lock mastery can begin
Jul 19 09:39:51 radon6 kernel: (3994,2):dlm_query_join_handler:614 node 3
trying to join, but recovery is ongoing.
Jul 19 09:39:51 radon6 kernel: (10183,1):dlm_get_lock_resource:895
46A341FD43114DE4A10E7D63C5099461:M00000000000000000081e17e89ae74: at least
one node (3) torecover before lock mastery can begin
Jul 19 09:39:51 radon6 kernel: (3994,2):dlm_query_join_handler:614 node 3
trying to join, but recovery is ongoing.
This appears to be in an infinite loop and node 3 never starts.
I'm not seeing the messages on node 2.
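To gauge how long this has been looping on a given node, the two message types quoted above can be tallied with a short script. This is only a rough sketch: the regular expressions are derived solely from the lines shown in this post, the function name summarize is made up for illustration, and it assumes each syslog entry sits on a single line in /var/log/messages (the wrapping above comes from the mail client).

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this post (an assumption,
# not an official description of the OCFS2 log format).
JOIN_RE = re.compile(
    r"dlm_query_join_handler:\d+ node (\d+) trying to join, but recovery is ongoing")
MASTERY_RE = re.compile(
    r"dlm_get_lock_resource:\d+ (\w+):(\w+): at least one node \((\d+)\) to ?recover")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def summarize(lines):
    """Return (join_attempts, blocked_locks) keyed by node number."""
    join_attempts = Counter()  # node -> number of refused join attempts
    blocked_locks = {}         # node -> set of lock resource names waiting on it
    last = None                # (kind, node) of the immediately preceding match

    for line in lines:
        m = JOIN_RE.search(line)
        if m:
            node = int(m.group(1))
            join_attempts[node] += 1
            last = ("join", node)
            continue

        m = MASTERY_RE.search(line)
        if m:
            node = int(m.group(3))
            blocked_locks.setdefault(node, set()).add(m.group(2))
            last = ("mastery", node)
            continue

        m = REPEAT_RE.search(line)
        if m and last is not None and last[0] == "join":
            # syslog collapses identical consecutive messages into this line
            join_attempts[last[1]] += int(m.group(1))
        else:
            last = None

    return join_attempts, blocked_locks
```

Calling summarize(open("/var/log/messages")) on nodes 0 and 1 would show how many join attempts from node 3 have been refused and how many distinct lock resources are waiting on its recovery.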
The cluster is up and running on 3 of the 4 servers, but I need to get all 4
nodes running again.
Can anyone provide any insight on what is going on or how this should be
handled?
Thanks!
Mark Rife
Oracle Applications DBA
markrife at hotmail.com