[Ocfs2-users] Unable to stop cluster as heartbeat region still active

Laurentiu Gosu lg at easic.ro
Mon Dec 12 13:33:44 PST 2011


Well, the device exists in /proc/partitions:
### cat /proc/partitions |grep dm-2
  253        2 11607154688 dm-2
### ll /dev/mapper/volgr1-lvol0
brw-rw---- 1 root disk 253, 2 Dec 11 14:14 /dev/mapper/volgr1-lvol0
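
A quick way to spot this kind of mismatch (a dm-N name listed in /proc/partitions with no matching node under /dev) is a small shell check. This is only a sketch: the function name missing_dm_nodes and its two optional arguments are made up for illustration and are not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper: list dm-N names from a partitions table that have no
# matching node in the given device directory.
# $1 = partitions file (default /proc/partitions)
# $2 = device directory (default /dev)
missing_dm_nodes() {
    parts=${1:-/proc/partitions}
    devdir=${2:-/dev}
    # Column 4 of /proc/partitions is the kernel device name.
    awk '$4 ~ /^dm-[0-9]+$/ { print $4 }' "$parts" |
    while read -r name; do
        [ -e "$devdir/$name" ] || echo "$name"
    done
}
```

Called with no arguments it reads /proc/partitions and /dev. On the setup described here it would presumably report dm-2 for as long as /dev/dm-2 is absent.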

I do not have any weird config, just a striped LVM 
volume (/dev/mapper/volgr1-lvol0) created out of 2 multipath 
devices (/dev/mpath/mpathz & /dev/mpath/mpathy), which are made available 
over iSCSI (/dev/sdX...).

Anyway, I think I can live with that (I create the symlink at boot time 
from rc.local).
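
The boot-time workaround could look roughly like the fragment below. It is a sketch, assuming the mapper path and the dm-2 name shown above stay the same across reboots (the dm number is assigned by the kernel and may differ); the helper name ensure_link is hypothetical.

```shell
#!/bin/sh
# Sketch of an rc.local fragment. ensure_link is a hypothetical helper:
# create symlink $2 pointing at $1 unless something already exists at $2.
ensure_link() {
    target=$1
    link=$2
    # -e is false for dangling symlinks, so test -L as well.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    ln -s "$target" "$link"
}

# In rc.local this would be called as (needs root):
#   ensure_link /dev/mapper/volgr1-lvol0 /dev/dm-2
```

Because the helper does nothing when the path already exists, it should be harmless to leave in place if the node is later created some other way.
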
When is 1.8 supposed to go out?
And a side question: is there any nagios plugin available to monitor 
cluster status? I could not find any.
br,
Laurentiu.


On 12/12/2011 21:02, Sunil Mushran wrote:
> Thanks. Yes, stop hb looks up the device in /proc/partitions. I guess
> the utility is expecting the partitions there because that's how udev
> works normally.
>
> Having said that, I think we have made a change in 1.8 whereby stop hb
> does not scan the devices but just looks up configfs.
>
> On 12/11/2011 08:14 AM, Laurentiu Gosu wrote:
>>
>> Hi Sunil,
>> Maybe you remember the thread below. In short, the problem was that the 
>> heartbeat region was still active after unmounting the ocfs2 volume (I 
>> use the latest UEK + ocfs2-tools).
>> Based on this link 
>> http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results 
>> I manually created a /dev/dm-2 symlink pointing to my SAN device 
>> [/dev/mapper/volgr1-lvol0] and the heartbeat was stopped normally.  
>> Maybe this helps you find the real issue. As I understand it, that symlink 
>> should be created automatically, but it seems the problem is still there in 
>> ocfs2-tools-1.6.3-2.el5.
>>
>> br,
>> laurentiu.
>>
>> On 10/24/2011 23:54, Sunil Mushran wrote:
>>> Well, I wouldn't advise you to go into production with this problem.
>>> To figure out the issue, we'll need to provide a debug version of
>>> ocfs2_hb_ctl.
>>>
>>> If you have support, ping oracle support and ask for assistance.
>>>
>>> If not, download the source and run ocfs2_hb_ctl in gdb. The problem
>>> is in the code path that begins in the function lookup_dev().
>>>
>>> On 10/23/2011 01:30 PM, Laurentiu Gosu wrote:
>>>> #rpm -qa |grep ocfs2
>>>> ocfs2console-1.6.3-2.el5
>>>> ocfs2-tools-1.6.3-2.el5
>>>>
>>>> Just let me know if I can give more details to find the problem. I 
>>>> will move ocfs2 into production in the next weeks.
>>>>
>>>>
>>>> On 10/23/2011 22:49, Sunil Mushran wrote:
>>>>> Are you sure you have ocfs2-tools-1.6.3? I remember we had an
>>>>> issue with this with an earlier release... 1.6.1/.2.
>>>>>
>>>>> On 10/23/2011 10:43 AM, Laurentiu Gosu wrote:
>>>>>> hmm..
>>>>>> #ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>> *BUT:*
>>>>>> #ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>> I can still kill the ref using device name (-d).
>>>>>>
>>>>>> On 10/23/2011 17:57, Sunil Mushran wrote:
>>>>>>> I think it stops by uuid. So try doing this the next time.
>>>>>>> You are encountering some issue that we have not seen before.
>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>>>
>>>>>>> On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:
>>>>>>>> Hi Sunil,
>>>>>>>> Sorry for my late reply, I only had time today to start from 
>>>>>>>> scratch and test.
>>>>>>>> I rebuilt my environment (2 nodes connected to a SAN via 
>>>>>>>> iSCSI+multipath). I still have the issue that the heartbeat is 
>>>>>>>> active after I umount my ocfs2 volume.
>>>>>>>> /etc/init.d/o2cb stop
>>>>>>>> Stopping O2CB cluster CLUST: Failed
>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>
>>>>>>>> ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>>>
>>>>>>>> After I manually kill the ref (ocfs2_hb_ctl -K -d 
>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2), I can stop o2cb successfully. 
>>>>>>>> I can live with that, but why doesn't it stop automatically? As 
>>>>>>>> I understand it, the heartbeat should be started and stopped when the 
>>>>>>>> volume gets mounted/umounted.
>>>>>>>>
>>>>>>>> br,
>>>>>>>> Laurentiu.
>>>>>>>>
>>>>>>>> On 10/19/2011 02:28, Sunil Mushran wrote:
>>>>>>>>> Manual delete will only work if there are no references. In your
>>>>>>>>> case there are references.
>>>>>>>>>
>>>>>>>>> You may want to start both nodes from scratch. Do not start/stop
>>>>>>>>> heartbeat manually. Also, do not force-format.
>>>>>>>>>
>>>>>>>>> On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
>>>>>>>>>> OK, I rebooted one of the nodes (both had similar issues). 
>>>>>>>>>> But something is still fishy.
>>>>>>>>>> - I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
>>>>>>>>>> - I unmounted it: umount /mnt/tmp/
>>>>>>>>>> - tried to stop o2cb:  /etc/init.d/o2cb stop
>>>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>> - ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>>>>> -  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>>>>>> - ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/:
>>>>>>>>>> total 0
>>>>>>>>>> drwxr-xr-x 2 root root    0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 01:50 pid
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block
>>>>>>>>>>
>>>>>>>>>> - I cannot manually delete 
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/
>>>>>>>>>>
>>>>>>>>>> PS: I'm going to sleep now, I have to be up in a few hours. 
>>>>>>>>>> We can continue tomorrow if it's OK with you.
>>>>>>>>>> Thank you for your help.
>>>>>>>>>>
>>>>>>>>>> Laurentiu.
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 01:33, Sunil Mushran wrote:
>>>>>>>>>>> One way this can happen is if one starts the hb manually and then
>>>>>>>>>>> force-formats that volume. The format will generate a new uuid.
>>>>>>>>>>> Once that happens, the hb tool cannot map the region to the device
>>>>>>>>>>> and thus fails to stop it. Right now the easiest option on this box
>>>>>>>>>>> is resetting it.
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> Yes, I did reformat it (even more than once, I think, last 
>>>>>>>>>>>> week). This is a pre-production system and I'm trying 
>>>>>>>>>>>> various options before moving into real life.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2011 01:19, Sunil Mushran wrote:
>>>>>>>>>>>>> Did you reformat the volume recently? Or, when did you 
>>>>>>>>>>>>> last format it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>> well..this is weird
>>>>>>>>>>>>>> ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>>>>>>>> *918673F06F8F4ED188DDCE14F39945F6*  dead_threshold
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> looks like we have different UUIDs. Where is this coming 
>>>>>>>>>>>>>> from??
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>> 918673F06F8F4ED188DDCE14F39945F6: 1 refs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/19/2011 01:04, Sunil Mushran wrote:
>>>>>>>>>>>>>>> Let's do it by hand.
>>>>>>>>>>>>>>> rm -rf /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>  ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No improvement :(
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>> See if this cleans it up.
>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>>>> Device                    FS     Stack  UUID                              Label
>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -f
>>>>>>>>>>>>>>>>>>>> Device                    FS     Nodes
>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> By the way, there is no /dev/dm-2
>>>>>>>>>>>>>>>>>>>>  ls /dev/dm-*
>>>>>>>>>>>>>>>>>>>> /dev/dm-0  /dev/dm-1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>> So it is not mounted. But we still have a hb thread because hb
>>>>>>>>>>>>>>>>>>>>> could not be stopped during umount. The reason for that could
>>>>>>>>>>>>>>>>>>>>> be the same one that causes ocfs2_hb_ctl to fail.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do:
>>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then list that dir.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Be careful before killing. We want to be sure 
>>>>>>>>>>>>>>>>>>>>>>> that dev is not mounted.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Again, the outputs:
>>>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>>>> dm-2
>>>>>>>>>>>>>>>>>>>>>>>> ---> this should be volgr1-lvol0, I guess?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or directory
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or directory
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I think I have to enable debug first somehow?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 3 root root    0 Oct 19 00:12 heartbeat
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root    0 Oct 11 20:23 node
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root    0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>>>>>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have a 2-node ocfs2 cluster running UEK 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> My problem is that every time I try 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to run /etc/init.d/o2cb stop
>>>>>>>>>>>>>>>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>       Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>       Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is no active mount point. I tried to manually stop the heartbeat
>>>>>>>>>>>>>>>>>>>>>>>>>>>> with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2" (after finding
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the refs number with "ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>>>>>>>>>>>>>>>>>> But even when the refs number is zero, the "heartbeat region still
>>>>>>>>>>>>>>>>>>>>>>>>>>>> active" error occurs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
