[Ocfs2-users] o2cb stack and kernel >= 2.6.37

Sunil Mushran sunil.mushran at oracle.com
Fri Apr 8 09:24:05 PDT 2011


On 04/08/2011 08:01 AM, Werner Flamme wrote:
> Joel, I think I encountered otherwise :-(
>
> In our OCFS2 cluster there are up to 15 active nodes. 7 of them were
> running yesterday (2 with Oracle Linux, 5 with SLES 11 SP1 + SLE HAE
> SP1). When applying the latest patches of the SLE HAE, the patched nodes
> talked dlm 1.1 and did not fall back. They silently unmounted three
> volumes, the fourth volume stayed connected (there was no data on it).
> In the logs, I found entries like
>
> dlm_query_join_proto_check:734 Node 5 wanted to join with DLM locking
> protocol 1.0, but we have 1.1, disallowing
> o2net: connection to node vocnod9 (num 5) at 141.65.171.51:7777
> shutdown, state 8
> o2net: no longer connected to node vocnod9 (num 5) at 141.65.171.51:7777
>
> and node 5 lost access to at least one volume straight after this.
>
> Maybe a specialty of SUSE, I don't know. I could not get the nodes to
> communicate with the rest of the cluster, I had to undo all of the
> patches provided by the HAE SP1-update repo and reboot before it worked
> again. Maybe the libdlm3 package was the culprit. I opened a support
> case at Novell and will report back what they say...

Yes, this is a bug and we have a fix for this headed to mainline.
http://oss.oracle.com/pipermail/ocfs2-devel/2011-April/007996.html
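
Until the fix lands, it may help to find out which nodes were turned away.
The following is only a rough sketch, not an official tool: it scans log
text for the "disallowing" message quoted above and reports the node number
and the two protocol versions. The function name find_rejected_joins and
the regular expression are my own assumptions, modeled on the single
message shown in this thread; other kernels may word it differently.

```python
import re

# Pattern modeled on the one message quoted in this thread (an assumption,
# not a guaranteed format for every kernel version).
JOIN_REJECT = re.compile(
    r"Node (?P<node>\d+) wanted to join with (?P<kind>\w+) locking protocol "
    r"(?P<theirs>\d+\.\d+), but we have (?P<ours>\d+\.\d+), disallowing"
)


def find_rejected_joins(log_text):
    """Return (node, kind, requested_version, local_version) tuples."""
    # Collapse all whitespace so entries wrapped across lines still match.
    normalized = " ".join(log_text.split())
    return [
        (int(m["node"]), m["kind"], m["theirs"], m["ours"])
        for m in JOIN_REJECT.finditer(normalized)
    ]


if __name__ == "__main__":
    sample = (
        "dlm_query_join_proto_check:734 Node 5 wanted to join with DLM locking\n"
        "protocol 1.0, but we have 1.1, disallowing\n"
    )
    for node, kind, theirs, ours in find_rejected_joins(sample):
        print(f"node {node}: requested {kind} {theirs}, local side has {ours}")
```

Feed it the unquoted log text (for example the contents of the messages
file); lines carrying a "> " mail-quote prefix in the middle of an entry
would not match.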
