[Ocfs2-users] Unable to stop cluster as heartbeat region still active
quanta
quanta.linux at gmail.com
Mon Dec 24 09:35:39 PST 2012
I accidentally reformatted the volume.
Is there any way to get rid of this problem without rebooting? The current state is:
# mounted.ocfs2 -d
Device FS Stack UUID Label
/dev/sdb ocfs2 o2cb 12963EAF4E16484DB81ECB0251177C26 ocfs2_drbd1
/dev/drbd1 ocfs2 o2cb 12963EAF4E16484DB81ECB0251177C26 ocfs2_drbd1
# ls -l /sys/kernel/config/cluster/cpc/heartbeat/
drwxr-xr-x 2 root root 0 Dec 24 22:53 72EF09EA3D0D4F51BDC00B47432B1EB2
# ocfs2_hb_ctl -I -u 72EF09EA3D0D4F51BDC00B47432B1EB2
72EF09EA3D0D4F51BDC00B47432B1EB2: 7 refs
# ocfs2_hb_ctl -K -u 72EF09EA3D0D4F51BDC00B47432B1EB2
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
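
A minimal sketch of how this state could be spotted, assuming the `mounted.ocfs2 -d` output format and the configfs heartbeat directory layout shown above. The function names `device_uuids` and `orphaned_regions` are made up for illustration and are not part of ocfs2-tools. It only reports heartbeat region directories whose UUID no longer belongs to any device; it does not try to stop anything.

```shell
#!/bin/sh
# Sketch only: list heartbeat regions whose UUID matches no OCFS2 device.
# Assumes UUIDs appear as 32 uppercase hex characters, as in the output above.

# Read `mounted.ocfs2 -d` output on stdin and print each UUID once.
device_uuids() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 32 && $i ~ /^[0-9A-F]+$/)
                print $i
    }' | sort -u
}

# $1: heartbeat directory (e.g. /sys/kernel/config/cluster/<name>/heartbeat)
# $2: file with one known device UUID per line
# Prints the region directories that are not in the UUID file.
orphaned_regions() {
    hb_dir=$1
    uuid_file=$2
    for region in "$hb_dir"/*/; do
        # Skip the unexpanded glob when there are no region directories.
        [ -d "$region" ] || continue
        name=$(basename "$region")
        grep -qxF "$name" "$uuid_file" || echo "$name"
    done
}
```

On the box above this could be run as `mounted.ocfs2 -d | device_uuids > /tmp/uuids` followed by `orphaned_regions /sys/kernel/config/cluster/cpc/heartbeat /tmp/uuids`. If the region is orphaned as suspected, it should print 72EF09EA3D0D4F51BDC00B47432B1EB2.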
On 10/19/2011 01:33, Sunil Mushran wrote:
>/ One way this can happen is if one starts the hb manually and then
/>/ force-formats that volume. The format generates a new UUID. Once that
/>/ happens, the hb tool cannot map the region to the device and thus fails
/>/ to stop it. Right now the easiest option on this box is resetting it.
/>/
/>/ On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
/>>/ Yes, I did reformat it (more than once, I think, last week). This
/>>/ is a pre-production system and I'm trying various options before
/>>/ moving to production.
/>>/
/>>/
/>>/ On 10/19/2011 01:19, Sunil Mushran wrote:
/>>>/ Did you reformat the volume recently? or, when did you format last?
/>>>/
/>>>/ On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
/>>>>/ Well, this is weird:
/>>>>/ ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
/>>>>/ 918673F06F8F4ED188DDCE14F39945F6 dead_threshold
/>>>>/
/>>>>/ Looks like we have different UUIDs. Where is this one coming from?
/>>>>/
/>>>>/ ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
/>>>>/ 918673F06F8F4ED188DDCE14F39945F6: 1 refs
/>>>>/
/>>>>/
/>>>>/ On 10/19/2011 01:04, Sunil Mushran wrote:
/>>>>>/ Let's do it by hand.
/>>>>>/ rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>/
/>>>>>/ On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
/>>>>>>/ ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>/ ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
/>>>>>>/
/>>>>>>/ No improvement :(
/>>>>>>/
/>>>>>>/
/>>>>>>/ On 10/19/2011 00:50, Sunil Mushran wrote:
/>>>>>>>/ See if this cleans it up.
/>>>>>>>/ ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>/
/>>>>>>>/ On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
/>>>>>>>>/ ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>>/ 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
/>>>>>>>>/
/>>>>>>>>/
/>>>>>>>>/ On 10/19/2011 00:43, Sunil Mushran wrote:
/>>>>>>>>>/ ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
/>>>>>>>>>/
/>>>>>>>>>/ On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>/ mounted.ocfs2 -d
/>>>>>>>>>>/ Device FS Stack UUID Label
/>>>>>>>>>>/ /dev/mapper/volgr1-lvol0 ocfs2 o2cb 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
/>>>>>>>>>>/
/>>>>>>>>>>/ mounted.ocfs2 -f
/>>>>>>>>>>/ Device FS Nodes
/>>>>>>>>>>/ /dev/mapper/volgr1-lvol0 ocfs2 ro02xsrv001
/>>>>>>>>>>/
/>>>>>>>>>>/ ro02xsrv001 = the other node in the cluster.
/>>>>>>>>>>/
/>>>>>>>>>>/ By the way, there is no /dev/dm-2:
/>>>>>>>>>>/ ls /dev/dm-*
/>>>>>>>>>>/ /dev/dm-0 /dev/dm-1
/>>>>>>>>>>/
/>>>>>>>>>>/
/>>>>>>>>>>/ On 10/19/2011 00:37, Sunil Mushran wrote:
/>>>>>>>>>>>/ So it is not mounted. But we still have a hb thread because
/>>>>>>>>>>>/ hb could not be stopped during umount. The reason for that
/>>>>>>>>>>>/ could be the same that causes ocfs2_hb_ctl to fail.
/>>>>>>>>>>>/
/>>>>>>>>>>>/ Do:
/>>>>>>>>>>>/ mounted.ocfs2 -d
/>>>>>>>>>>>/
/>>>>>>>>>>>/ On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>/ /sys/kernel/debug/ocfs2:
/>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>/ /sys/kernel/debug/o2dlm:
/>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/ ocfs2_hb_ctl -I -d /dev/dm-2
/>>>>>>>>>>>>/ ocfs2_hb_ctl: Device name specified was not found while reading uuid
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/ There is no /dev/dm-2 mounted.
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/
/>>>>>>>>>>>>/ On 10/19/2011 00:27, Sunil Mushran wrote:
/>>>>>>>>>>>>>/ mount -t debugfs debugfs /sys/kernel/debug
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/ Then list that dir.
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/ Also, do:
/>>>>>>>>>>>>>/ ocfs2_hb_ctl -I -d /dev/dm-2
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/ Be careful before killing. We want to be sure that dev is
/>>>>>>>>>>>>>/ not mounted.
/>>>>>>>>>>>>>/
/>>>>>>>>>>>>>/ On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>/ Again the outputs:
/>>>>>>>>>>>>>>/ cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
/>>>>>>>>>>>>>>/ dm-2
/>>>>>>>>>>>>>>/ ---> this should be volgr1-lvol0, I guess?
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>>>/ ls: /sys/kernel/debug/ocfs2: No such file or directory
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>>>/ ls: /sys/kernel/debug/o2dlm: No such file or directory
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/ I think I have to enable debug first somehow?
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/ Laurentiu.
/>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>/ On 10/19/2011 00:17, Sunil Mushran wrote:
/>>>>>>>>>>>>>>>/ What does this return?
/>>>>>>>>>>>>>>>/ cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
/>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>/ Also, do:
/>>>>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/ocfs2
/>>>>>>>>>>>>>>>/ ls -lR /sys/kernel/debug/o2dlm
/>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>/ On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>>>/ Here is the output:
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ ls -lR /sys/kernel/config/cluster
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 3 root root 0 Oct 19 00:12 heartbeat
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 4 root root 0 Oct 11 20:23 node
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER/heartbeat:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 2 root root 0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
/>>>>>>>>>>>>>>>>/ -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER/node:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
/>>>>>>>>>>>>>>>>/ drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
/>>>>>>>>>>>>>>>>/ total 0
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
/>>>>>>>>>>>>>>>>/ -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>/ On 10/19/2011 00:12, Sunil Mushran wrote:
/>>>>>>>>>>>>>>>>>/ ls -lR /sys/kernel/config/cluster
/>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>/ What does this return?
/>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>/ On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
/>>>>>>>>>>>>>>>>>>/ Hi,
/>>>>>>>>>>>>>>>>>>/ I have a two-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
/>>>>>>>>>>>>>>>>>>/ ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
/>>>>>>>>>>>>>>>>>>/ My problem is that every time I try to run
/>>>>>>>>>>>>>>>>>>/ /etc/init.d/o2cb stop
/>>>>>>>>>>>>>>>>>>/ it fails with this error:
/>>>>>>>>>>>>>>>>>>/ Stopping O2CB cluster CLUSTER: Failed
/>>>>>>>>>>>>>>>>>>/ Unable to stop cluster as heartbeat region still active
/>>>>>>>>>>>>>>>>>>/ There is no active mount point. I tried to manually stop the heartbeat
/>>>>>>>>>>>>>>>>>>/ with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
/>>>>>>>>>>>>>>>>>>/ the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
/>>>>>>>>>>>>>>>>>>/ But even when the refs number is zero, the "heartbeat region still
/>>>>>>>>>>>>>>>>>>/ active" error occurs.
/>>>>>>>>>>>>>>>>>>/ How can I fix this?
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/ Thank you in advance.
/>>>>>>>>>>>>>>>>>>/ Laurentiu.
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/
/>>>>>>>>>>>>>>>>>>/ _______________________________________________
/>>>>>>>>>>>>>>>>>>/ Ocfs2-users mailing list
/>>>>>>>>>>>>>>>>>>/ Ocfs2-users at oss.oracle.com <http://oss.oracle.com/mailman/listinfo/ocfs2-users>
/>>>>>>>>>>>>>>>>>>/ http://oss.oracle.com/mailman/listinfo/ocfs2-users