[Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)
Mike Reid
mbreid at thepei.com
Thu Sep 15 13:59:35 PDT 2011
Interesting observation. Thank you, Sunil.
I should note that I could not figure out how to perform a stack trace from
within Pacemaker directly, so I waited for Pacemaker to start
O2CB/OCFS2/DLM and then tried to mount manually to get the trace.
I've noticed that as soon as it fails (via Pacemaker), the DRBD Primary
device gets demoted to Secondary... I wonder if the manual attempt was
simply too late and /dev/drbd0 was already in the Secondary state? That
seems likely, since it would satisfy the first condition: if (mdev->state.role !=
R_PRIMARY) { ...
What else could I try (manually or via Pacemaker) to help determine
what may be at fault here?
Normally I can set a node to standby and then bring it back online with no
issues, but somehow this node will no longer join, even after rebooting
both nodes in the cluster.
From: Sunil Mushran <sunil.mushran at oracle.com>
Date: Thu, 15 Sep 2011 13:42:54 -0700
To: Mike Reid <mbreid at thepei.com>
Cc: <ocfs2-users at oss.oracle.com>
Subject: Re: [Ocfs2-users] Trouble getting node to re-join two node cluster
(OCFS2/DRBD Primary/Primary)
open("/dev/drbd0", O_RDONLY|O_DIRECT) = -1 EMEDIUMTYPE (Wrong medium type)

drbd_open()
...
    if (mdev->state.role != R_PRIMARY) {
        if (mode & FMODE_WRITE)
            rv = -EROFS;
        else if (!allow_oos)
            rv = -EMEDIUMTYPE;
    }
...
...
So the failure appears to be emanating from drbd. There seems
to be an allow_oos module param, and the -EMEDIUMTYPE path is only
taken when it is 0. I have no idea what this param does. Also, I am
reading current mainline; 2.6.35 may be different.
On 09/15/2011 01:26 PM, Mike Reid wrote:
>
> Hello all,
>
> ** I have also posted this in the pacemaker list, but I have a feeling it's
> more OCFS2 specific **
>
> We have a two-node cluster still in development that has been running fine
> for weeks (little to no traffic). I made some updates to our CIB recently,
> and everything seemed just fine.
>
> Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it
> was complete, one of the nodes had become completely disconnected and I
> haven't been able to reconnect it since.
>
> DRBD is working fine, everything is UpToDate and I can get both nodes in
> Primary/Primary, but when it comes down to starting OCFS2 and mounting the
> volume, I'm left with:
>
>
>>
>> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error
>>
>
>
> I am using "pcmk" as the cluster_stack, and letting Pacemaker control
> everything...
>
> The last time this happened the only way I was able to resolve it was to
> reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
> this; the underlying blocks seem fine, and one of the nodes is running just
> fine. The (currently) unmounted node is staying in sync as far as DRBD is
> concerned.
>
> Here's some detail that hopefully will help. Please let me know if there's
> anything else I can provide to help determine the best way to get this node
> back "online":
>
>
> Ubuntu 10.10 / Kernel 2.6.35
>
> Pacemaker 1.0.9.1
> Corosync 1.2.1
> Cluster Agents 1.0.3 (Heartbeat)
> Cluster Glue 1.0.6
> OpenAIS 1.1.2
>
> DRBD 8.3.10
> OCFS2 1.5.0
>
> cat /sys/fs/ocfs2/cluster_stack = pcmk
>
> node1: mounted.ocfs2 -d
>
> Device FS UUID Label
> /dev/sda3 ocfs2 fe4273e1-f866-4541-bbcf-66c5dfd496d6
>
> node2: mounted.ocfs2 -d
>
> Device FS UUID Label
> /dev/sda3 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef
> /dev/drbd0 ocfs2 d6f7cc6d-21d1-46d3-9792-bc650736a5ef
>
> * NOTES:
> - Both nodes are identical, in fact one node is a direct mirror (hdd clone)
> - I have attached the CIB (crm configure edit contents) and mount trace
>
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>