[Ocfs2-users] Unable to stop cluster as heartbeat region still active

Laurentiu Gosu lg at easic.ro
Tue Oct 18 15:13:23 PDT 2011


Well, this is weird:
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
918673F06F8F4ED188DDCE14F39945F6  dead_threshold

Looks like we have different UUIDs. Where is this one coming from?

ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
918673F06F8F4ED188DDCE14F39945F6: 1 refs
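
To make the UUID mismatch easier to spot, here is a minimal shell sketch that walks a heartbeat directory in configfs and prints each region's UUID together with the contents of its dev and pid files, then reports regions whose UUID differs from the one the volume is expected to have. The function names (list_hb_regions, unmatched_hb_regions) are made up for illustration and are not part of ocfs2-tools; the directory layout is assumed to be the one shown in the ls -lR output quoted further down.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ocfs2-tools.

# Print "UUID dev pid" for every region directory under the given
# heartbeat directory. Plain files such as dead_threshold are skipped
# because only subdirectories match the "*/" pattern.
list_hb_regions() {
    hb_dir=$1
    for region in "$hb_dir"/*/; do
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        dev=$(cat "$region/dev" 2>/dev/null)
        pid=$(cat "$region/pid" 2>/dev/null)
        # Print "-" when a file is missing or empty.
        printf '%s %s %s\n' "$uuid" "${dev:--}" "${pid:--}"
    done
}

# Print the UUID of every region that does not match the expected
# volume UUID (for example the one reported by "mounted.ocfs2 -d").
unmatched_hb_regions() {
    hb_dir=$1
    expected=$2
    list_hb_regions "$hb_dir" | while read -r uuid dev pid; do
        [ "$uuid" = "$expected" ] || printf '%s\n' "$uuid"
    done
}

# Example with the path and UUID from this thread (cluster name and
# UUID will differ on other systems):
# unmatched_hb_regions /sys/kernel/config/cluster/CLUSTER/heartbeat \
#     0C4AB55FE9314FA5A9F81652FDB9B22D
```

Both functions only read from configfs, so they can be run while the cluster is online.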


On 10/19/2011 01:04, Sunil Mushran wrote:
> Let's do it by hand.
> rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
>
> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>
>> No improvement :(
>>
>>
>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>> See if this cleans it up.
>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>
>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>
>>>>
>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>
>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>> mounted.ocfs2 -d
>>>>>> Device                    FS     Stack  UUID                              Label
>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>>>
>>>>>> mounted.ocfs2 -f
>>>>>> Device                    FS     Nodes
>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>>>>
>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>
>>>>>> By the way, there is no /dev/dm-2:
>>>>>> ls /dev/dm-*
>>>>>> /dev/dm-0  /dev/dm-1
>>>>>>
>>>>>>
>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>> So it is not mounted. But we still have a hb thread because
>>>>>>> hb could not be stopped during umount. The reason for that
>>>>>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>>>>>
>>>>>>> Do:
>>>>>>> mounted.ocfs2 -d
>>>>>>>
>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>> total 0
>>>>>>>>
>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>> total 0
>>>>>>>>
>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>
>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>
>>>>>>>>> Then list that dir.
>>>>>>>>>
>>>>>>>>> Also, do:
>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>
>>>>>>>>> Be careful before killing. We want to be sure that dev is not 
>>>>>>>>> mounted.
>>>>>>>>>
>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>> Again, the outputs:
>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>> dm-2
>>>>>>>>>> ---> this should be volgr1-lvol0, I guess?
>>>>>>>>>>
>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>>>
>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>>>
>>>>>>>>>> I think I have to enable debug first somehow?
>>>>>>>>>>
>>>>>>>>>> Laurentiu.
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>> What does this return?
>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>
>>>>>>>>>>> Also, do:
>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>
>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>> total 0
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>> My problem is that every time I try to run
>>>>>>>>>>>>>> /etc/init.d/o2cb stop
>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>       Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>       Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>>>> There is no active mount point. I tried to manually stop the heartbeat
>>>>>>>>>>>>>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>>>>>>>>>>>>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>>>> But even when the refs number is zero, the "heartbeat region still
>>>>>>>>>>>>>> active" error occurs.
>>>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

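On the question of why the region's dev file reports dm-2 while the volume is /dev/mapper/volgr1-lvol0 and only /dev/dm-0 and /dev/dm-1 exist: a minimal sketch for translating a dm-N node into its device-mapper name, assuming the kernel exposes the name under /sys/block/dm-N/dm/name. The function name dm_name and its optional second argument (an alternate sysfs root, present only so the function can be pointed at a test tree) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: resolve a kernel device-mapper node name
# (e.g. dm-2) to its mapper name by reading
# <sysfs_root>/block/<node>/dm/name. sysfs_root defaults to /sys.
dm_name() {
    node=$1
    sysfs_root=${2:-/sys}
    name_file="$sysfs_root/block/$node/dm/name"
    if [ -r "$name_file" ]; then
        cat "$name_file"
    else
        # The node does not exist (or is not a device-mapper device).
        echo "no such device-mapper node: $node" >&2
        return 1
    fi
}

# Example with the node name from this thread:
# dm_name dm-2
```

If the lookup fails for the node named in the region's dev file, the heartbeat region refers to a device that is no longer present on this node.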