[Ocfs2-users] Unable to stop cluster as heartbeat region still active
Laurentiu Gosu
lg at easic.ro
Tue Oct 18 15:24:19 PDT 2011
Yes, I did reformat it (more than once, I think, last week). This is
a pre-production system and I'm trying various options before going
into production.
On 10/19/2011 01:19, Sunil Mushran wrote:
> Did you reformat the volume recently? or, when did you format last?
>
> On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
>> well..this is weird
>> ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
>> 918673F06F8F4ED188DDCE14F39945F6 dead_threshold
>>
>> looks like we have different UUIDs. Where is this coming from??
>>
>> ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
>> 918673F06F8F4ED188DDCE14F39945F6: 1 refs
>>
>>
>> On 10/19/2011 01:04, Sunil Mushran wrote:
>>> Let's do it by hand.
>>> rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
>>>
>>> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>
>>>> No improvement :(
>>>>
>>>>
>>>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>>>> See if this cleans it up.
>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>
>>>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>>>
>>>>>>
>>>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>>>> ocfs2_hb_ctl -l -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>
>>>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>>>> mounted.ocfs2 -d
>>>>>>>> Device                    FS     Stack  UUID                              Label
>>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>>>>>
>>>>>>>> mounted.ocfs2 -f
>>>>>>>> Device FS Nodes
>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2 ro02xsrv001
>>>>>>>>
>>>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>>>
>>>>>>>> By the way, there is no /dev/dm-2
>>>>>>>> ls /dev/dm-*
>>>>>>>> /dev/dm-0 /dev/dm-1
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>>>> So it is not mounted. But we still have a hb thread because
>>>>>>>>> hb could not be stopped during umount. The reason for that
>>>>>>>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>>>>>>>
>>>>>>>>> Do:
>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>
>>>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>>>> total 0
>>>>>>>>>>
>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>>>> total 0
>>>>>>>>>>
>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>>>
>>>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>>>
>>>>>>>>>>> Then list that dir.
>>>>>>>>>>>
>>>>>>>>>>> Also, do:
>>>>>>>>>>> ocfs2_hb_ctl -l -d /dev/dm-2
>>>>>>>>>>>
>>>>>>>>>>> Be careful before killing. We want to be sure that dev is
>>>>>>>>>>> not mounted.
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> Again the outputs:
>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>> dm-2
>>>>>>>>>>>> ---> this should be volgr1-lvol0, I guess?
>>>>>>>>>>>>
>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>>>>>
>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>>>>>
>>>>>>>>>>>> I think I have to enable debug first somehow?
>>>>>>>>>>>>
>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>>>> drwxr-xr-x 3 root root 0 Oct 19 00:12 heartbeat
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 11 20:23 node
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I have a two-node ocfs2 cluster running UEK
>>>>>>>>>>>>>>>> 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>>>> My problem is that every time I try to run
>>>>>>>>>>>>>>>> /etc/init.d/o2cb stop
>>>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>>>>>> There is no active mount point. I tried to manually stop the
>>>>>>>>>>>>>>>> heartbeat with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2"
>>>>>>>>>>>>>>>> (after finding the refs number with
>>>>>>>>>>>>>>>> "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>>>>>> But even when the refs number is zero, the "heartbeat region
>>>>>>>>>>>>>>>> still active" error occurs.
>>>>>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
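The check this thread converges on, comparing the region directories under the configfs heartbeat path with the UUID that mounted.ocfs2 -d reports, can be sketched as a small shell function. This is only a minimal sketch and not part of ocfs2-tools: the function name stale_hb_regions is hypothetical, and it assumes each region directory contains a "dev" file as shown in the ls -lR output above. It only reports mismatches; it does not stop or remove anything.

```shell
#!/bin/sh
# Hypothetical helper (not an ocfs2-tools command): print every heartbeat
# region directory whose name matches none of the UUIDs given as arguments.
# Usage: stale_hb_regions HEARTBEAT_DIR UUID...
stale_hb_regions() {
    hb_dir=$1
    shift
    for region in "$hb_dir"/*/; do
        # Skip the unexpanded glob when the directory is empty or missing.
        if [ ! -d "$region" ]; then
            continue
        fi
        name=$(basename "$region")
        stale=yes
        for uuid in "$@"; do
            if [ "$name" = "$uuid" ]; then
                stale=no
            fi
        done
        if [ "$stale" = yes ]; then
            # Assumption: the region exposes its block device in a "dev" file.
            dev=unknown
            if [ -r "$region/dev" ]; then
                dev=$(cat "$region/dev")
            fi
            echo "$name $dev"
        fi
    done
}

# Example call with the paths and UUID from this thread:
# stale_hb_regions /sys/kernel/config/cluster/CLUSTER/heartbeat \
#     0C4AB55FE9314FA5A9F81652FDB9B22D
```

With the values in this thread, the example call would be expected to print the 918673F0... region together with dm-2, since that region does not match the UUID of the currently formatted volume.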