[Ocfs2-users] Unable to stop cluster as heartbeat region still active
Laurentiu Gosu
lg at easic.ro
Mon Dec 12 13:33:44 PST 2011
Well, the device exists in /proc/partitions:
### cat /proc/partitions |grep dm-2
253 2 11607154688 dm-2
### ll /dev/mapper/volgr1-lvol0
brw-rw---- 1 root disk 253, 2 Dec 11 14:14 /dev/mapper/volgr1-lvol0
I do not have any weird config, just a striped LVM
volume (/dev/mapper/volgr1-lvol0) created out of 2 multipath
devices (/dev/mpath/mpathz & /dev/mpath/mpathy) which are made available
by iSCSI (/dev/sdX...).
Anyway, I think I can live with that (I create the symlink at boot time
from rc.local).
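For reference, a minimal sketch of what such an rc.local snippet could look like. The helper names and default paths are illustrative assumptions, not part of ocfs2-tools. It assumes GNU stat and that the kernel name listed in /proc/partitions (dm-2 here) is the node the hb tool expects to find under /dev.

```shell
# Sketch only: recreate the missing /dev/dm-N name as a symlink at boot.
# Function names and default paths are hypothetical, chosen for illustration.

# Print the kernel name (e.g. dm-2) listed for MAJOR MINOR in a
# /proc/partitions-style file. The file is a parameter so the lookup
# can be tried against a saved copy.
kernel_name_for() {
    local major=$1 minor=$2 partitions=${3:-/proc/partitions}
    awk -v ma="$major" -v mi="$minor" \
        '$1 == ma && $2 == mi { print $4; exit }' "$partitions"
}

# Create DEVDIR/<kernel name> -> MAPPER_DEV unless that node already exists.
ensure_dm_symlink() {
    local mapper_dev=$1 devdir=${2:-/dev} major minor name
    # GNU stat prints device numbers in hex (%t major, %T minor).
    major=$(printf '%d' "0x$(stat -L -c '%t' "$mapper_dev")")
    minor=$(printf '%d' "0x$(stat -L -c '%T' "$mapper_dev")")
    name=$(kernel_name_for "$major" "$minor")
    if [ -z "$name" ]; then
        return 1    # device is not listed in /proc/partitions
    fi
    if [ ! -e "$devdir/$name" ]; then
        ln -s "$mapper_dev" "$devdir/$name"
    fi
}
```

From rc.local it could then be called as: ensure_dm_symlink /dev/mapper/volgr1-lvol0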
When is 1.8 supposed to go out?
And a side question: is there any nagios plugin available to monitor
cluster status? I could not find any.
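Failing that, a small check along these lines might do as a starting point. This is a sketch, not an existing plugin. The function name is hypothetical, and the exit codes follow the usual Nagios convention (0 OK, 2 CRITICAL, 3 UNKNOWN). Treating "no heartbeat region in configfs" as CRITICAL is an assumption that only makes sense on a node that is expected to have the volume mounted.

```shell
# Sketch of a Nagios-style check: count the heartbeat regions that
# configfs lists for a cluster. Not an official ocfs2 plugin.
# Usage: check_o2cb_heartbeat CLUSTER_NAME [CONFIGFS_ROOT]
check_o2cb_heartbeat() {
    local cluster=$1 configfs=${2:-/sys/kernel/config}
    local hbdir=$configfs/cluster/$cluster/heartbeat
    local regions=0 entry

    if [ ! -d "$hbdir" ]; then
        echo "O2CB UNKNOWN - cluster $cluster not found in configfs"
        return 3
    fi

    # Each active region shows up as a directory named after its UUID;
    # plain files such as dead_threshold are skipped by the trailing slash.
    for entry in "$hbdir"/*/; do
        if [ -d "$entry" ]; then
            regions=$((regions + 1))
        fi
    done

    if [ "$regions" -eq 0 ]; then
        echo "O2CB CRITICAL - no active heartbeat region"
        return 2
    fi
    echo "O2CB OK - $regions active heartbeat region(s)"
    return 0
}
```

A wrapper script would call the function with the cluster name and exit with its return code.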
br,
Laurentiu.
On 12/12/2011 21:02, Sunil Mushran wrote:
> Thanks. Yes, stop hb looks up the device in /proc/partitions. I guess
> the utility is expecting the partitions there because that's how udev
> works normally.
>
> Having said that, I think we have made a change in 1.8 whereby stop hb
> does not scan the devices but just looks up configfs.
>
> On 12/11/2011 08:14 AM, Laurentiu Gosu wrote:
>>
>> Hi Sunil,
>> Maybe you remember the thread below. In short, the problem was that
>> the heartbeat region was still active after umounting the ocfs2 volume
>> (I use the latest UEK + ocfs2-tools).
>> Based on this link
>> http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results
>> I manually created the /dev/dm-2 symlink to point to my SAN device
>> [/dev/mapper/volgr1-lvol0] and the heartbeat was stopped normally.
>> Maybe it helps you find the real issue. As I understand it, that symlink
>> should be created automatically, but it seems the problem is still there
>> in ocfs2-tools-1.6.3-2.el5.
>>
>> br,
>> laurentiu.
>>
>> On 10/24/2011 23:54, Sunil Mushran wrote:
>>> Well, I wouldn't advise you to go into prod with this problem.
>>> To figure out the issue, we'll need to provide a debug version of
>>> ocfs2_hb_ctl.
>>>
>>> If you have support, ping oracle support and ask for assistance.
>>>
>>> If not, download the source and run ocfs2_hb_ctl in gdb. The problem
>>> is in the code path that begins in the function lookup_dev().
>>>
>>> On 10/23/2011 01:30 PM, Laurentiu Gosu wrote:
>>>> #rpm -qa |grep ocfs2
>>>> ocfs2console-1.6.3-2.el5
>>>> ocfs2-tools-1.6.3-2.el5
>>>>
>>>> Just let me know if I can give more details to find the problem. I
>>>> will move ocfs2 into production in the next weeks.
>>>>
>>>>
>>>> On 10/23/2011 22:49, Sunil Mushran wrote:
>>>>> Are you sure you have ocfs2-tools-1.6.3? I remember we had an
>>>>> issue with this with an earlier release... 1.6.1/.2.
>>>>>
>>>>> On 10/23/2011 10:43 AM, Laurentiu Gosu wrote:
>>>>>> hmm..
>>>>>> #ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>> *BUT:*
>>>>>> #ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>> I can still kill the ref using device name (-d).
>>>>>>
>>>>>> On 10/23/2011 17:57, Sunil Mushran wrote:
>>>>>>> I think it stops by uuid. So try doing this the next time.
>>>>>>> You are encountering some issue that we have not seen before.
>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2
>>>>>>>
>>>>>>> On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:
>>>>>>>> Hi Sunil,
>>>>>>>> Sorry for my late reply, I just had time today to start from
>>>>>>>> scratch and test.
>>>>>>>> I rebuilt my environment (2 nodes connected to a SAN via
>>>>>>>> iSCSI+multipath). I still have the issue that the heartbeat is
>>>>>>>> active after I umount my ocfs2 volume.
>>>>>>>> /etc/init.d/o2cb stop
>>>>>>>> Stopping O2CB cluster CLUST: Failed
>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>
>>>>>>>> ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0
>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>>>
>>>>>>>> After I manually kill the ref (ocfs2_hb_ctl -K -d
>>>>>>>> /dev/mapper/volgr1-lvol0 ocfs2) I can stop o2cb successfully.
>>>>>>>> I can live with that, but why doesn't it stop automatically? As
>>>>>>>> I understand it, the heartbeat should be started and stopped when
>>>>>>>> the volume gets mounted/umounted.
>>>>>>>>
>>>>>>>> br,
>>>>>>>> Laurentiu.
>>>>>>>>
>>>>>>>> On 10/19/2011 02:28, Sunil Mushran wrote:
>>>>>>>>> Manual delete will only work if there are no references. In
>>>>>>>>> your case
>>>>>>>>> there are references.
>>>>>>>>>
>>>>>>>>> You may want to start both nodes from scratch. Do not start/stop
>>>>>>>>> heartbeat manually. Also, do not force-format.
>>>>>>>>>
>>>>>>>>> On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
>>>>>>>>>> OK, I rebooted one of the nodes (both had similar issues).
>>>>>>>>>> But something is still fishy.
>>>>>>>>>> - I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0 /mnt/tmp/
>>>>>>>>>> - I unmounted it: umount /mnt/tmp/
>>>>>>>>>> - tried to stop o2cb: /etc/init.d/o2cb stop
>>>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>> - ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs
>>>>>>>>>> - ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>>>>>> - ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/:
>>>>>>>>>> total 0
>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 01:50 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold
>>>>>>>>>>
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:
>>>>>>>>>> total 0
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 dev
>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 01:50 pid
>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block
>>>>>>>>>>
>>>>>>>>>> - I cannot manually delete
>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/
>>>>>>>>>>
>>>>>>>>>> PS: I'm going to sleep now; I have to be up in a few hours.
>>>>>>>>>> We can continue tomorrow if it's ok with you.
>>>>>>>>>> Thank you for your help.
>>>>>>>>>>
>>>>>>>>>> Laurentiu.
>>>>>>>>>>
>>>>>>>>>> On 10/19/2011 01:33, Sunil Mushran wrote:
>>>>>>>>>>> One way this can happen is if one starts the hb manually and then
>>>>>>>>>>> force-formats that volume. The format will generate a new uuid.
>>>>>>>>>>> Once that happens, the hb tool cannot map the region to the device
>>>>>>>>>>> and thus fails to stop it. Right now the easiest option on this box
>>>>>>>>>>> is resetting it.
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>> Yes, I did reformat it (even more than once, I think, last
>>>>>>>>>>>> week). This is a pre-production system and I'm trying
>>>>>>>>>>>> various options before moving into real life.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2011 01:19, Sunil Mushran wrote:
>>>>>>>>>>>>> Did you reformat the volume recently? or, when did you
>>>>>>>>>>>>> format last?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>> well..this is weird
>>>>>>>>>>>>>> ls /sys/kernel/config/cluster/CLUSTER/heartbeat/
>>>>>>>>>>>>>> *918673F06F8F4ED188DDCE14F39945F6* dead_threshold
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> looks like we have different UUIDs. Where is this coming
>>>>>>>>>>>>>> from??
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>> 918673F06F8F4ED188DDCE14F39945F6: 1 refs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/19/2011 01:04, Sunil Mushran wrote:
>>>>>>>>>>>>>>> Let's do it by hand.
>>>>>>>>>>>>>>> rm -rf
>>>>>>>>>>>>>>> /sys/kernel/config/cluster/.../heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/18/2011 02:52 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>> ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No improvement :(
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 10/19/2011 00:50, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>> See if this cleans it up.
>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 10/18/2011 02:44 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>> 0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 10/19/2011 00:43, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>>>> Device                    FS     Stack  UUID                              Label
>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  o2cb   0C4AB55FE9314FA5A9F81652FDB9B22D  ocfs2
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -f
>>>>>>>>>>>>>>>>>>>> Device                    FS     Nodes
>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0  ocfs2  ro02xsrv001
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ro02xsrv001 = the other node in the cluster.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> By the way, there is no /dev/dm-2
>>>>>>>>>>>>>>>>>>>> ls /dev/dm-*
>>>>>>>>>>>>>>>>>>>> /dev/dm-0 /dev/dm-1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:37, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>> So it is not mounted. But we still have a hb thread because
>>>>>>>>>>>>>>>>>>>>> hb could not be stopped during umount. The reason for that
>>>>>>>>>>>>>>>>>>>>> could be the same that causes ocfs2_hb_ctl to fail.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do:
>>>>>>>>>>>>>>>>>>>>> mounted.ocfs2 -d
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:32 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/ocfs2:
>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/debug/o2dlm:
>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl: Device name specified was not found while reading uuid
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> There is no /dev/dm-2 mounted.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:27, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>> mount -t debugfs debugfs /sys/kernel/debug
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then list that dir.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>>>> ocfs2_hb_ctl -I -d /dev/dm-2
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Be careful before killing. We want to be sure
>>>>>>>>>>>>>>>>>>>>>>> that dev is not mounted.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:23 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Again the outputs:
>>>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>>>> dm-2
>>>>>>>>>>>>>>>>>>>>>>>> --->here should be volgr1-lvol0 i guess?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/ocfs2: No such file or
>>>>>>>>>>>>>>>>>>>>>>>> directory
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>>>> ls: /sys/kernel/debug/o2dlm: No such file or
>>>>>>>>>>>>>>>>>>>>>>>> directory
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I think i have to enable debug first somehow..?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:17, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>>>> cat /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Also, do:
>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/ocfs2
>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/debug/o2dlm
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:14 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Here is the output:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 19 00:12 CLUSTER
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 fence_method
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 3 root root 0 Oct 19 00:12 heartbeat
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 idle_timeout_ms
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 keepalive_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 4 root root 0 Oct 11 20:23 node
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 reconnect_delay_ms
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 918673F06F8F4ED188DDCE14F39945F6
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dead_threshold
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 block_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 blocks
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 dev
>>>>>>>>>>>>>>>>>>>>>>>>>> -r--r--r-- 1 root root 4096 Oct 19 00:12 pid
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 start_block
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv001
>>>>>>>>>>>>>>>>>>>>>>>>>> drwxr-xr-x 2 root root 0 Oct 19 00:12 ro02xsrv002
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> /sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
>>>>>>>>>>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_address
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 ipv4_port
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 local
>>>>>>>>>>>>>>>>>>>>>>>>>> -rw-r--r-- 1 root root 4096 Oct 19 00:12 num
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/19/2011 00:12, Sunil Mushran wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> ls -lR /sys/kernel/config/cluster
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What does this return?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/2011 02:05 PM, Laurentiu Gosu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> My problem is that every time I try to run /etc/init.d/o2cb stop
>>>>>>>>>>>>>>>>>>>>>>>>>>>> it fails with this error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Stopping O2CB cluster CLUSTER: Failed
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is no active mount point. I tried to manually stop the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> heartbeat with "ocfs2_hb_ctl -K -d /dev/mapper/volgr1-lvol0 ocfs2"
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (after finding the refs number with "ocfs2_hb_ctl -I -d
>>>>>>>>>>>>>>>>>>>>>>>>>>>> /dev/mapper/volgr1-lvol0").
>>>>>>>>>>>>>>>>>>>>>>>>>>>> But even if the refs number is zero, the "heartbeat region still
>>>>>>>>>>>>>>>>>>>>>>>>>>>> active" error occurs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> How can I fix this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Laurentiu.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>>>>>>>>>>>>>>>>>>>>>>