[Ocfs2-users] problem stopping o2cb service on one of nodes

Fri Apr 3 14:27:34 PDT 2009

Do:
$ cat /proc/sys/fs/ocfs2/nm/hb_ctl_path
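
A minimal sketch of one way to use that value: read the path and check that it names an executable file. It assumes (the thread does not say so) that the file holds the path of the ocfs2_hb_ctl helper that is run to stop the heartbeat. The function name check_hb_ctl_path is made up for illustration.

```shell
# check_hb_ctl_path [FILE]: read a helper path from FILE and report whether
# it points at an executable. FILE defaults to the path quoted above.
# Hypothetical helper, not part of ocfs2-tools.
check_hb_ctl_path() {
    sysctl_file=${1:-/proc/sys/fs/ocfs2/nm/hb_ctl_path}
    if [ ! -r "$sysctl_file" ]; then
        echo "cannot read $sysctl_file" >&2
        return 2
    fi
    helper=$(cat "$sysctl_file")
    if [ -x "$helper" ]; then
        echo "ok: $helper"
    else
        echo "missing or not executable: $helper"
        return 1
    fi
}
```

Calling check_hb_ctl_path with no argument on the affected node reads the file named above. Passing a different file is only useful for trying the function out.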


Nikola Ciprich wrote:
> Hi Sunil,
> thanks for the reply.
> I don't observe any segfaults.
> Regarding the info you want: as I wrote, umount doesn't decrease the refcount:
>
> [root@vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
> [root@vbox4 ~]# umount /home/LVS
> [root@vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>
> nik
>
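
The before/after check in the transcript can be scripted. This is a rough sketch, assuming the "<UUID>: <N> refs" line format shown in that output. The function names hb_refs and refs_dropped are made up for illustration.

```shell
# hb_refs: read "ocfs2_hb_ctl -I" output on stdin and print the reference count.
# Assumes a line of the form "<UUID>: <N> refs", as in the transcript.
hb_refs() {
    sed -n 's/^[0-9A-Fa-f][0-9A-Fa-f]*: \([0-9][0-9]*\) refs$/\1/p'
}

# refs_dropped BEFORE AFTER: succeed only if the count went down.
refs_dropped() {
    [ "$2" -lt "$1" ]
}

# Intended use on a node (device and mount point taken from the transcript):
#   before=$(ocfs2_hb_ctl -I -d /dev/vgshared/lvs | hb_refs)
#   umount /home/LVS
#   after=$(ocfs2_hb_ctl -I -d /dev/vgshared/lvs | hb_refs)
#   refs_dropped "$before" "$after" || echo "refcount stuck at $after"
```

With the numbers reported here (1 before, 1 after), refs_dropped fails, which matches the symptom described.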
> On Fri, Apr 03, 2009 at 10:21:33AM -0700, Sunil Mushran wrote:
>   
>> umount is supposed to stop the heartbeat. In bz1053, ocfs2_hb_ctl was
>> segfaulting.
>> Are you seeing any segfaults or any other errors during umount?
>>
>> Also, run the following before and after umount:
>> $ ocfs2_hb_ctl -I -d /dev/sdX o2cb
>>
>> Email me the output.
>>
>> Nikola Ciprich wrote:
>>     
>>> Hello Tao,
>>> and thanks a lot for reply!
>>> It doesn't seem to be the same bug; at least, applying the patch didn't help.
>>> Stopping the heartbeat with the -K parameter really helps, but why doesn't
>>> this happen automatically on umount?
>>> It always happens on the second node.
>>> I don't see any errors in the logs.
>>> But the reference count always increases on mount, and doesn't decrease on umount on this node.
>>>
>>>
>>> On Fri, Apr 03, 2009 at 10:58:18AM +0800, Tao Ma wrote:
>>>   
>>>       
>>>> Hi Nikola,
>>>>
>>>> Nikola Ciprich wrote:
>>>>     
>>>>         
>>>>> Hi,
>>>>> I'm trying ocfs2 on a RHEL5 distro, with a 2.6.29 kernel and ocfs2-tools-1.4.1.
>>>>> I'm using DRBD in primary/primary mode as shared storage.
>>>>>
>>>>> I've configured the service according to quickstart document, and everything works,
>>>>> but when I umount fs on both nodes, stopping o2cb service on one of the nodes always
>>>>> fails with:
>>>>>
>>>>> [root@vbox4 sysconfig]# /etc/rc.d/init.d/o2cb stop
>>>>> Stopping O2CB cluster vb34: Failed
>>>>> Unable to stop cluster as heartbeat region still active
>>>> It looks like your disk heartbeat is still there. I don't know the
>>>> specific reason; maybe
>>>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1053 ?
>>>>
>>>> but you can stop it manually.
>>>> 1.  ocfs2_hb_ctl -I -d <device>
>>>>     or  ocfs2_hb_ctl -I -u <uuid>
>>>>     This will tell you the reference count for the heartbeat.
>>>> 2.  ocfs2_hb_ctl -K -d <device> <service>
>>>>     or  ocfs2_hb_ctl -K -u <uuid> <service>
>>>>     This will kill the heartbeat manually.
>>>> <service> is the stack you use; it should be "o2cb" in your case.
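
The two steps above can be chained roughly as follows. This is only a sketch: it assumes the -I output has the "<UUID>: <N> refs" form seen elsewhere in this thread. The HB_CTL variable and the function name stop_hb_if_referenced are made up for illustration, so that the command can be swapped for a stub when trying it out.

```shell
# Command to run; overridable for dry runs. HB_CTL is a hypothetical variable.
HB_CTL=${HB_CTL:-ocfs2_hb_ctl}

# stop_hb_if_referenced DEVICE [SERVICE]
# Step 1: query the heartbeat reference count with -I.
# Step 2: if it is above zero, stop the heartbeat with -K.
# SERVICE defaults to "o2cb", the stack named in the message above.
stop_hb_if_referenced() {
    device=$1
    service=${2:-o2cb}
    info=$("$HB_CTL" -I -d "$device") || return 1
    # Assumed output format: "<UUID>: <N> refs"
    refs=$(printf '%s\n' "$info" | sed -n 's/^.*: \([0-9][0-9]*\) refs$/\1/p')
    if [ "${refs:-0}" -gt 0 ]; then
        echo "heartbeat on $device still has $refs refs, stopping it"
        "$HB_CTL" -K -d "$device" "$service"
    else
        echo "no heartbeat references on $device"
    fi
}
```

For example, stop_hb_if_referenced /dev/vgshared/lvs would be run as root after umount and before stopping the o2cb service.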
>>>>
>>>> btw, you can try ocfs2_hb_ctl -K -u <uuid> <service> to see whether it
>>>> is the same problem as bug 1053.
>>>>
>>>> Regards,
>>>> Tao