[Ocfs2-users] Node crash, recovery problem

Jeff Bachtel jeff at cepheid.org
Thu Aug 2 07:46:31 PDT 2007


While rebooting a node last night, we had a umount failure on an ocfs2
filesystem (it's the filesystem that is used by Xen, and xend
frequently ends up holding the fs open in a kernel thread). We
power-cycled the machine, and the ocfs2 mount failed on the subsequent
reboot because another node was blocking it:

Aug  1 18:16:36 vpr-app-01 kernel: (7111,0):dlm_query_join_handler:633 node 11 trying to join, but it still needs recovery.
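
For spotting this condition in the logs, here is a minimal sketch that
scans kernel log lines for the message quoted above and pulls out the
node number. The pattern is modeled only on that single line, so the
exact wording in other ocfs2 versions is an assumption, and the name
blocked_nodes is hypothetical. It only detects the refusal; it does not
force or clear recovery.

```python
import re

# Pattern modeled on the one log line quoted above; other ocfs2
# versions may word the message differently (assumption).
JOIN_BLOCKED = re.compile(
    r"dlm_query_join_handler:\d+ node (\d+) trying to join, "
    r"but it still needs recovery"
)


def blocked_nodes(lines):
    """Return the sorted node numbers whose join was refused pending recovery."""
    found = set()
    for line in lines:
        match = JOIN_BLOCKED.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# The sample is the log line from this report, rejoined onto one line.
sample = [
    "Aug  1 18:16:36 vpr-app-01 kernel: (7111,0):dlm_query_join_handler:633 "
    "node 11 trying to join, but it still needs recovery.",
]
print(blocked_nodes(sample))
```

Feeding it the lines of the kernel log from the blocking node would
list which node numbers are still waiting on recovery there.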

The annoying solution for us was eventually to reboot the blocking
node as well. Since this has happened to us before, is there a utility
to force the needed recovery on either the blocking or the blocked
node, or to remove the node from the recovery list?

This is ocfs2 1.2.2 (patch 11) running on OpenSUSE 10.2. I would be
ecstatic to move to a higher revision, but unfortunately I don't know
when the SUSE package maintainer will do so.

thanks,

Jeff


