[Ocfs2-users] Unable to stop cluster as heartbeat region still active

Sunil Mushran sunil.mushran at oracle.com
Tue Oct 18 15:19:47 PDT 2011


Did you reformat the volume recently? Or, when did you last format it?

On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
> Well, this is weird.
> ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
> 918673F06F8F4ED188DDCE14F39945F6  dead_threshold
>
> It looks like we have different UUIDs. Where is this one coming from?
>
> ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
> 918673F06F8F4ED188DDCE14F39945F6: 1 refs
>
>
> On 10/19/2011 01:04, Sunil Mushran wrote:
>> Let's do it by hand.
>> rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
>>
>> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>>  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>
>>> No improvement :(
>>>
>>>
>>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>>> See if this cleans it up.
>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>
>>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>>
>>>>>
>>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>
>>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>>> mounted.ocfs2 -d
>>>>>>> Device                    FS     Stack  UUID                              Label
>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>>>>
>>>>>>> mounted.ocfs2 -f
>>>>>>> Device                    FS     Nodes
>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>>>>>
>>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>>
>>>>>>> By the way, there is no /dev/dm-2:
>>>>>>>  ls /dev/dm-*
>>>>>>> /dev/dm-0  /dev/dm-1
>>>>>>>
>>>>>>>
>>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>>> So it is not mounted. But we still have a hb thread because
>>>>>>>> hb could not be stopped during umount. The reason for that
>>>>>>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>>>>>>
>>>>>>>> Do:
>>>>>>>> mounted.ocfs2 -d
>>>>>>>>
>>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>>> total 0
>>>>>>>>>
>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>>> total 0
>>>>>>>>>
>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>>
>>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>>
>>>>>>>>>> Then list that dir.
>>>>>>>>>>
>>>>>>>>>> Also, do:
>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>
>>>>>>>>>> Be careful before killing. We want to be sure that dev is not mounted.
>>>>>>>>>>
>>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>>> Again, the outputs:
>>>>>>>>>>>  cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>> dm-2
>>>>>>>>>>> ---> this should be volgr1-lvol0, I guess?
>>>>>>>>>>>
>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>>>>
>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>>>>
>>>>>>>>>>> I think I have to enable debug first somehow?
>>>>>>>>>>>
>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>
>>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>>> What does this return?
>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>
>>>>>>>>>>>> Also, do:
>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>>> total 0
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I have a 2-node OCFS2 cluster running UEK 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>>> My problem is that every time I try to run /etc/init.d/o2cb stop
>>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>>       Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>>       Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>>>>> There is no active mount point. I tried to manually stop the heartbeat
>>>>>>>>>>>>>>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>>>>>>>>>>>>>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>>>>> But even when the refs number is zero, the "heartbeat region still
>>>>>>>>>>>>>>> active" error occurs.
>>>>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>> Laurentiu.
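
The configfs checks in this thread (listing the heartbeat directory, then
reading each region's dev file) can be done in one pass. A minimal sketch,
assuming the o2cb configfs layout shown in the listing above. The function
name list_hb_regions and the default cluster directory are illustrative and
are not part of ocfs2-tools:

```shell
#!/bin/sh
# Print "<region UUID> <device>" for every heartbeat region found under a
# cluster's configfs directory.
# $1: cluster directory; the default is an assumption taken from the
#     listing in this thread, adjust it for your cluster name.
list_hb_regions() {
    cluster_dir="${1:-/sys/kernel/config/cluster/CLUSTER}"
    for region in "$cluster_dir"/heartbeat/*/; do
        # Skip the unexpanded glob when no region directories exist.
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        # The dev file holds the kernel name of the heartbeat device.
        dev=$(cat "$region/dev" 2>/dev/null)
        printf '%s %s\n' "$uuid" "${dev:-unknown}"
    done
}
```

Call it as "list_hb_regions /sys/kernel/config/cluster/CLUSTER". Regular files
such as dead_threshold are skipped because only directories are treated as
regions.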

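The UUID mismatch found in this thread (a heartbeat region in configfs whose
UUID differs from the one reported by "mounted.ocfs2 -d") can be checked
mechanically. A minimal sketch, assuming the detect output has the UUID in
its fourth column as in the output quoted above. The function name
stale_hb_regions and the default cluster directory are illustrative and are
not part of ocfs2-tools:

```shell
#!/bin/sh
# Print heartbeat region UUIDs that match no volume UUID in
# "mounted.ocfs2 -d" style output read from stdin.
# $1: cluster configfs directory; the default is an assumption taken from
#     this thread.
stale_hb_regions() {
    cluster_dir="${1:-/sys/kernel/config/cluster/CLUSTER}"
    # Column 4 of the detect output is the UUID; row 1 is the header.
    known=$(awk 'NR > 1 && NF >= 4 { print $4 }')
    for region in "$cluster_dir"/heartbeat/*/; do
        # Skip the unexpanded glob when no region directories exist.
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        # Report the region if no known volume carries this exact UUID.
        if ! printf '%s\n' "$known" | grep -qxF "$uuid"; then
            printf '%s\n' "$uuid"
        fi
    done
}
```

Usage: "mounted.ocfs2 -d | stale_hb_regions /sys/kernel/config/cluster/CLUSTER".
The function only reports candidates and removes nothing. Confirm that the
device is not mounted on any node before stopping a region.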