[Ocfs2-users] Unable to stop cluster as heartbeat region still active

Laurentiu Gosu lg at easic.ro
Tue Oct 18 14:52:37 PDT 2011


  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

No improvement :(


On 10/19/2011 00:50, Sunil Mushran wrote:
> See if this cleans it up.
> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>
> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>
>>
>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>
>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>> mounted.ocfs2 -d
>>>> Device                    FS     Stack  UUID                              Label
>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>
>>>> mounted.ocfs2 -f
>>>> Device                    FS     Nodes
>>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>>
>>>> ro02xsrv001 = the other node in the cluster.
>>>>
>>>> By the way, there is no /dev/dm-2:
>>>> ls /dev/dm-*
>>>> /dev/dm-0  /dev/dm-1
>>>>
>>>>
>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>> So it is not mounted. But we still have a hb thread because
>>>>> hb could not be stopped during umount. The reason for that
>>>>> could be the same one that causes ocfs2_hb_ctl to fail.
>>>>>
>>>>> Do:
>>>>> mounted.ocfs2 -d
>>>>>
>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>> /sys/kernel/debug/ocfs2:
>>>>>> total 0
>>>>>>
>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>> /sys/kernel/debug/o2dlm:
>>>>>> total 0
>>>>>>
>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>
>>>>>> There is no /dev/dm-2 mounted.
>>>>>>
>>>>>>
>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>
>>>>>>> Then list that dir.
>>>>>>>
>>>>>>> Also, do:
>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>
>>>>>>> Be careful before killing. We want to be sure that the device is not mounted.
>>>>>>>
>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>> Again, the outputs:
>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>> dm-2
>>>>>>>> ---> This should be volgr1-lvol0, I guess?
>>>>>>>>
>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>
>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>
>>>>>>>> I think I have to enable debug first somehow?
>>>>>>>>
>>>>>>>> Laurentiu.
>>>>>>>>
>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>> What does this return?
>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>
>>>>>>>>> Also, do:
>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>
>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>> Here is the output:
>>>>>>>>>>
>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>> total 0
>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>> total 0
>>>>>>>>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>> total 0
>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>
>>>>>>>>>>> What does this return?
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>> My problem is that every time I try to run
>>>>>>>>>>>> /etc/init.d/o2cb stop
>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>       Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>       Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>> There is no active mount point. I tried to manually stop the heartbeat
>>>>>>>>>>>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>>>>>>>>>>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>> But even when the refs number is zero, the "heartbeat region still
>>>>>>>>>>>> active" error occurs.
>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
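
The checks done by hand in this thread (listing the heartbeat regions under configfs, reading each region's dev file, and looking for that device under /dev) could be combined into a small read-only helper. This is only a sketch: it assumes the configfs layout shown in the "ls -lR /sys/kernel/config/cluster" output above, and the function name list_hb_regions is made up for illustration, not part of ocfs2-tools. It does not stop or modify anything.

```shell
#!/bin/sh
# list_hb_regions CLUSTER_DIR [DEV_DIR]   (hypothetical helper, not an ocfs2 tool)
# Prints one line per heartbeat region: "<region-uuid> <dev> <present|missing>".
# CLUSTER_DIR is e.g. /sys/kernel/config/cluster/CLUSTER; DEV_DIR defaults to /dev.
list_hb_regions() {
    cluster_dir=$1
    dev_dir=${2:-/dev}
    # The trailing slash makes the glob match region directories only,
    # so plain files such as dead_threshold are skipped.
    for region in "$cluster_dir"/heartbeat/*/; do
        # Skip anything without a dev file (also covers an unmatched glob).
        [ -f "${region}dev" ] || continue
        uuid=$(basename "$region")
        dev=$(cat "${region}dev")
        # A region whose device node is gone is a likely stale heartbeat.
        if [ -n "$dev" ] && [ -e "$dev_dir/$dev" ]; then
            state=present
        else
            state=missing
        fi
        printf '%s %s %s\n' "$uuid" "$dev" "$state"
    done
}

# Example call on a live node (needs configfs mounted):
#   list_hb_regions /sys/kernel/config/cluster/CLUSTER
```

With the values posted in this thread, the region 918673F06F8F4ED188DDCE14F39945F6 would be reported with dev dm-2 and state "missing", since only /dev/dm-0 and /dev/dm-1 exist. The region name also differs from the volume UUID 0C4AB55FE9314FA5A9F81652FDB9B22D printed by mounted.ocfs2 -d, which is worth comparing by eye alongside the helper's output.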



