[Ocfs2-users] OCFS2 Fencing, then panic

Eli Criffield elicriffield at gmail.com
Thu Apr 5 13:22:15 PDT 2007


Yep, they have a new kernel that works.

Thanks

Eli

On 4/3/07, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> This is a known issue on SLES10. Ping Novell for the update.
>
> Eli Criffield wrote:
> > Whenever I mount my shared ocfs2 volume on the second node, the
> > primary kernel panics.
> > Both nodes are SLES10 Xen guests with access to the same /dev/sdc1.
> >
> > My /etc/ocfs2/cluster.conf --------
> >
> > cluster:
> >        node_count = 2
> >        name = ocfs2
> > node:
> >        ip_port = 7777
> >        ip_address = 10.24.1.65
> >        number = 1
> >        name = testnode1
> >        cluster = ocfs2
> > node:
> >        ip_port = 7777
> >        ip_address = 10.24.1.63
> >        number = 0
> >        name = testnode0
> >        cluster = ocfs2
> >
> > testnode0:~ # /etc/init.d/o2cb status
> > Module "configfs": Loaded
> > Filesystem "configfs": Mounted
> > Module "ocfs2_nodemanager": Loaded
> > Module "ocfs2_dlm": Loaded
> > Module "ocfs2_dlmfs": Loaded
> > Filesystem "ocfs2_dlmfs": Mounted
> > Checking O2CB cluster ocfs2: Online
> > Checking O2CB heartbeat: Active
> >
> > testnode0:~ # df |grep sdc1
> > /dev/sdc1              2097152    137108   1960044   7% /mnt/ocfs2
> >
> > testnode1:~ # /etc/init.d/o2cb status
> > Module "configfs": Loaded
> > Filesystem "configfs": Mounted
> > Module "ocfs2_nodemanager": Loaded
> > Module "ocfs2_dlm": Loaded
> > Module "ocfs2_dlmfs": Loaded
> > Filesystem "ocfs2_dlmfs": Mounted
> > Checking O2CB cluster ocfs2: Online
> > Checking O2CB heartbeat: Not active
> >
> > testnode1:~ # df |grep sdc1
> > (Not mounted)
> >
> >
> > Then I try to mount the device on testnode1:
> >
> > testnode1:~ # mount -tocfs2 /dev/sdc1 /mnt/ocfs2/
> >
> > The mount comes back OK, but after about a minute the kernel panics,
> > saying it is very sorry to be fencing this system.
> >
> > This is what the logs show:
> > --- testnode1 /var/log/messages
> > Apr  3 11:07:41 testnode1 kernel: o2net: connected to node testnode0
> > (num 0) at 10.24.1.63:7777
> > Apr  3 11:07:41 testnode1 kernel: klogd 1.4.1, ---------- state change
> > ----------
> > Apr  3 11:07:45 testnode1 kernel: OCFS2 1.2.3-SLES Thu Aug 17 11:38:33
> > PDT 2006 (build sles)
> > Apr  3 11:07:45 testnode1 kernel: ocfs2_dlm: Nodes in domain
> > ("F59ECDE2D42642F18D728F2AB96C3291"): 0 1
> > Apr  3 11:07:45 testnode1 kernel: (13756,0):ocfs2_find_slot:261 slot 0
> > is already allocated to this node!
> > Apr  3 11:07:45 testnode1 kernel: (13756,0):ocfs2_check_volume:1651
> > File system was not unmounted cleanly, recovering volume.
> > Apr  3 11:07:45 testnode1 kernel: (fs/jbd/recovery.c, 255):
> > journal_recover: JBD: recovery, exit status 0, recovered transactions
> > 3 to 4
> > Apr  3 11:07:45 testnode1 kernel: (fs/jbd/recovery.c, 257):
> > journal_recover: JBD: Replayed 0 and revoked 0/0 blocks
> > Apr  3 11:07:45 testnode1 kernel: kjournald starting.  Commit interval
> > 5 seconds
> > Apr  3 11:07:45 testnode1 kernel: ocfs2: Mounting device (8,33) on
> > (node 1, slot 0)
> > Apr  3 11:07:51 testnode1 kernel: o2net: no longer connected to node
> > testnode0 (num 0) at 10.24.1.63:7777
> > Apr  3 11:08:23 testnode1 syslog-ng[1614]: Changing permissions on
> > special file /dev/xconsole
> > Apr  3 11:08:23 testnode1 syslog-ng[1614]: Changing permissions on
> > special file /dev/tty10
> > Apr  3 11:08:23 testnode1 kernel: (13773,0):dlm_do_master_request:1330
> > ERROR: link to 0 went down!
> > Apr  3 11:08:23 testnode1 kernel: (13773,0):dlm_get_lock_resource:914
> > ERROR: status = -107
> > Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
> > system by panicing
> >
> >
> >
> > ---testnode0 /var/log/messages
> > Apr  3 11:07:41 testnode0 kernel: o2net: accepted connection from node
> > testnode1 (num 1) at 10.24.1.65:7777
> > Apr  3 11:07:41 testnode0 kernel: klogd 1.4.1, ---------- state change
> > ----------
> > Apr  3 11:07:45 testnode0 kernel: ocfs2_dlm: Node 1 joins domain
> > F59ECDE2D42642F18D728F2AB96C3291
> > Apr  3 11:07:45 testnode0 kernel: ocfs2_dlm: Nodes in domain
> > ("F59ECDE2D42642F18D728F2AB96C3291"): 0 1
> > Apr  3 11:07:51 testnode0 kernel: o2net: connection to node testnode1
> > (num 1) at 10.24.1.65:7777 has been idle for 10 seconds, shutting it
> > down.
> > Apr  3 11:07:51 testnode0 kernel: (0,0):o2net_idle_timer:1314 here are
> > some times that might help debug the situation: (tmr 1175616461.855528
> > now 1175616471.854226 dr 1175616466.855354 adv
> > 1175616466.855376:1175616466.855377 func (9fb0e5b8:502)
> > 1175616466.35269:1175616466.35272)
> > Apr  3 11:07:51 testnode0 kernel: o2net: no longer connected to node
> > testnode1 (num 1) at 10.24.1.65:7777
> >
> >
> >
> > Everything appears to be configured correctly as far as I can tell,
> > so why does it connect and then just disconnect?
> >
> > Eli
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
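
The timestamps in the o2net_idle_timer line on testnode0 can be checked by hand. Below is a minimal Python sketch that pulls the "tmr" and "now" values out of that log line and computes the gap between them. It assumes each value is a seconds.microseconds pair (so the part after the dot is a microsecond count rather than a decimal fraction), and the helper names are made up for illustration. It only confirms the arithmetic behind the "idle for 10 seconds" message; it says nothing about why the link went quiet.

```python
import re

# The o2net_idle_timer line from testnode0's log, with the mail wrapping undone.
LOG_LINE = (
    "(0,0):o2net_idle_timer:1314 here are some times that might help debug "
    "the situation: (tmr 1175616461.855528 now 1175616471.854226 "
    "dr 1175616466.855354 adv 1175616466.855376:1175616466.855377 "
    "func (9fb0e5b8:502) 1175616466.35269:1175616466.35272)"
)


def to_microseconds(stamp):
    """Convert a 'sec.usec' string to integer microseconds.

    Assumption: the digits after the dot are a microsecond count, so
    '1175616466.35269' would mean 35269 microseconds, not 0.35269 s.
    """
    sec, usec = stamp.split(".")
    return int(sec) * 1_000_000 + int(usec)


def field(name, line=LOG_LINE):
    """Return the named timestamp (e.g. 'tmr', 'now') in microseconds."""
    match = re.search(rf"\b{name} (\d+\.\d+)", line)
    if match is None:
        raise ValueError(f"field {name!r} not found in log line")
    return to_microseconds(match.group(1))


def idle_seconds(line=LOG_LINE):
    """Gap between the 'tmr' and 'now' timestamps, in seconds."""
    return (field("now", line) - field("tmr", line)) / 1_000_000


if __name__ == "__main__":
    print(f"now - tmr = {idle_seconds():.6f} s")
```

The gap works out to 9.998698 seconds, which matches the 10 seconds between "accepted connection" at 11:07:41 and "no longer connected" at 11:07:51 in the same log.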