[Ocfs2-users] Unable to stop cluster as heartbeat region still active

quanta quanta.linux at gmail.com
Mon Dec 24 09:35:39 PST 2012


I accidentally reformatted the volume.
Is there any way to clear the stale heartbeat region without rebooting? Current state:

# mounted.ocfs2 -d
Device                FS     Stack  UUID                              Label
/dev/sdb              ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1
/dev/drbd1            ocfs2  o2cb   12963EAF4E16484DB81ECB0251177C26  ocfs2_drbd1

# ls -l /sys/kernel/config/cluster/cpc/heartbeat/
drwxr-xr-x 2 root root    0 Dec 24 22:53 72EF09EA3D0D4F51BDC00B47432B1EB2

# ocfs2_hb_ctl -I -u 72EF09EA3D0D4F51BDC00B47432B1EB2
72EF09EA3D0D4F51BDC00B47432B1EB2: 7 refs

# ocfs2_hb_ctl -K -u 72EF09EA3D0D4F51BDC00B47432B1EB2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
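
Comparing the listings above: the region in configfs (72EF09EA...) no longer matches the UUID on disk (12963EAF...), which is the mismatch Sunil describes below. A minimal sketch of a check for that, assuming a POSIX shell and the column layout that mounted.ocfs2 -d prints above. The function name orphan_regions is made up for illustration, and it only reads state; it does not stop or remove anything.

```shell
#!/bin/sh
# Sketch only: list heartbeat regions in configfs whose UUID matches no
# OCFS2 device. "orphan_regions" is a hypothetical helper name.
#   $1    heartbeat directory, e.g. /sys/kernel/config/cluster/cpc/heartbeat
#   stdin output of `mounted.ocfs2 -d`
orphan_regions() {
    hb_dir=$1
    # On-disk UUIDs: 4th column, exactly 32 hex characters (skips the header).
    disk_uuids=$(awk 'length($4) == 32 && $4 !~ /[^0-9A-Fa-f]/ { print $4 }')
    # Each region is a directory named after its UUID; plain files such as
    # dead_threshold are skipped by the trailing slash in the glob.
    for region in "$hb_dir"/*/; do
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        if ! printf '%s\n' "$disk_uuids" | grep -qxF "$uuid"; then
            echo "$uuid"
        fi
    done
}
```

Usage would be something like: mounted.ocfs2 -d | orphan_regions /sys/kernel/config/cluster/cpc/heartbeat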


On 10/19/2011 01:33, Sunil Mushran wrote:
>/  One way this can happen is if one starts the hb manually and then force
/>/  formats that volume. The format generates a new uuid. Once that
/>/  happens, the hb tool cannot map the region to the device and thus fails
/>/  to stop it. Right now the easiest option on this box is resetting it.
/>/
/>/  On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
/>>/  Yes, I did reformat it (even more than once, I think, last week). This
/>>/  is a pre-production system and I'm trying various options before
/>>/  moving into real life.
/>>/
/>>/
/>>/  On 10/19/2011 01:19, Sunil Mushran wrote:
/>>>/  Did you reformat the volume recently? or, when did you format last?
/>>>/
/>>>/  On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
/>>>>/  well..this is weird
/>>>>/  ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
/>>>>/  918673F06F8F4ED188DDCE14F39945F6  dead_threshold
/>>>>/
/>>>>/  looks like we have different UUIDs. Where is this coming from??
/>>>>/
/>>>>/  ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
/>>>>/  918673F06F8F4ED188DDCE14F39945F6: 1 refs
/>>>>/
/>>>>/
/>>>>/  On 10/19/2011 01:04, Sunil Mushran wrote:
/>>>>>/  Let's do it by hand.
/>>>>>/  rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>/
/>>>>>/  On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
/>>>>>>/  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>/  ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
/>>>>>>/
/>>>>>>/  No improvement :(
/>>>>>>/
/>>>>>>/
/>>>>>>/  On 10/19/2011 00:50, Sunil Mushran wrote:
/>>>>>>>/  See if this cleans it up.
/>>>>>>>/  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>/
/>>>>>>>/  On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
/>>>>>>>>/  ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>>/  0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
/>>>>>>>>/
/>>>>>>>>/
/>>>>>>>>/  On 10/19/2011 00:43, Sunil Mushran wrote:
/>>>>>>>>>/  ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>>>/
/>>>>>>>>>/  On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>/  mounted.ocfs2 -d
/>>>>>>>>>>/  Device                    FS     Stack  UUID                              Label
/>>>>>>>>>>/  /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
/>>>>>>>>>>/
/>>>>>>>>>>/  mounted.ocfs2 -f
/>>>>>>>>>>/  Device                FS     Nodes
/>>>>>>>>>>/  /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
/>>>>>>>>>>/
/>>>>>>>>>>/  ro02xsrv001 = the other node in the cluster.
/>>>>>>>>>>/
/>>>>>>>>>>/  By the way, there is no /dev/dm-2:
/>>>>>>>>>>/  ls /dev/dm-*
/>>>>>>>>>>/  /dev/dm-0  /dev/dm-1
/>>>>>>>>>>/
/>>>>>>>>>>/
/>>>>>>>>>>/  On 10/19/2011 00:37, Sunil Mushran wrote:
/>>>>>>>>>>>/  So it is not mounted. But we still have a hb thread because
/>>>>>>>>>>>/  hb could not be stopped during umount. The reason for that
/>>>>>>>>>>>/  could be the same that causes ocfs2_hb_ctl to fail.
/>>>>>>>>>>>/
/>>>>>>>>>>>/  Do:
/>>>>>>>>>>>/  mounted.ocfs2 -d
/>>>>>>>>>>>/
/>>>>>>>>>>>/  On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>/  /sys/kernel/debug/ocfs2:
/>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>/  /sys/kernel/debug/o2dlm:
/>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/  ocfs2_hb_ctl -I -d /dev/dm-2
/>>>>>>>>>>>>/  ocfs2_hb_ctl: Device name specified was not found while reading uuid
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/  There is no /dev/dm-2 mounted.
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/  On 10/19/2011 00:27, Sunil Mushran wrote:
/>>>>>>>>>>>>>/  mount -t debugfs debugfs /sys/kernel/debug
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/  Then list that dir.
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/  Also, do:
/>>>>>>>>>>>>>/  ocfs2_hb_ctl -l -d /dev/dm-2
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/  Be careful before killing. We want to be sure that dev is
/>>>>>>>>>>>>>/  not mounted.
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/  On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>/  Again, the outputs:
/>>>>>>>>>>>>>>/  cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
/>>>>>>>>>>>>>>/  dm-2
/>>>>>>>>>>>>>>/  ---> this should be volgr1-lvol0, I guess?
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>>>/  ls: /sys/kernel/debug/ocfs2: No such file or directory
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>>>/  ls: /sys/kernel/debug/o2dlm: No such file or directory
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/  I think i have to enable debug first somehow..?
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/  Laurentiu.
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/  On 10/19/2011 00:17, Sunil Mushran wrote:
/>>>>>>>>>>>>>>>/  What does this return?
/>>>>>>>>>>>>>>>/  cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
/>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>/  Also, do:
/>>>>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>>>>/  ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>/  On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>>>/  Here is the output:
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  ls -lR /sys/kernel/config/cluster
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 4 root root    0 Oct 11 20:23 node
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER/heartbeat:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
/>>>>>>>>>>>>>>>>/  -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER/node:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
/>>>>>>>>>>>>>>>>/  drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
/>>>>>>>>>>>>>>>>/  total 0
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
/>>>>>>>>>>>>>>>>/  -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/  On 10/19/2011 00:12, Sunil Mushran wrote:
/>>>>>>>>>>>>>>>>>/  ls -lR /sys/kernel/config/cluster
/>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>/  What does this return?
/>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>/  On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>>>>>/  Hi,
/>>>>>>>>>>>>>>>>>>/  I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
/>>>>>>>>>>>>>>>>>>/  ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
/>>>>>>>>>>>>>>>>>>/  My problem is that every time I try to run
/>>>>>>>>>>>>>>>>>>/  /etc/init.d/o2cb stop
/>>>>>>>>>>>>>>>>>>/  it fails with this error:
/>>>>>>>>>>>>>>>>>>/        Stopping O2CB cluster CLUSTER: Failed
/>>>>>>>>>>>>>>>>>>/        Unable to stop cluster as heartbeat region still active
/>>>>>>>>>>>>>>>>>>/  There is no active mount point. I tried to manually stop the heartbeat
/>>>>>>>>>>>>>>>>>>/  with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
/>>>>>>>>>>>>>>>>>>/  the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
/>>>>>>>>>>>>>>>>>>/  But even when the refs number is zero, the "heartbeat region still
/>>>>>>>>>>>>>>>>>>/  active" error occurs.
/>>>>>>>>>>>>>>>>>>/  How can I fix this?
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/  Thank you in advance.
/>>>>>>>>>>>>>>>>>>/  Laurentiu.
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/  _______________________________________________
/>>>>>>>>>>>>>>>>>>/  Ocfs2-users mailing list
/>>>>>>>>>>>>>>>>>>/  Ocfs2-users at oss.oracle.com  <http://oss.oracle.com/mailman/listinfo/ocfs2-users>
/>>>>>>>>>>>>>>>>>>/  http://oss.oracle.com/mailman/listinfo/ocfs2-users
