[Ocfs2-users] problem stopping o2cb service on one of nodes

Sunil Mushran sunil.mushran at oracle.com
Mon Apr 13 11:16:29 PDT 2009


You mean RHEL 5.3. I have (rh)el5.3 with udev-095-14.19.el5 and it is
working as expected. I don't see the problem you are referring to.
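
To check a node for the condition described in the quoted message below (a name listed in /proc/partitions that has no node directly under /dev), something like the following rough sketch may help. The function name `missing_dev_nodes` and its two optional arguments are made up for illustration and are not part of ocfs2-tools; it only compares names against files and does not call mounted.ocfs2.

```shell
# Print every device name from a /proc/partitions-style file that has no
# entry directly under a /dev-style directory.
# Hypothetical helper: both arguments are illustrative and default to the
# live system paths.
missing_dev_nodes() {
    partitions_file="${1:-/proc/partitions}"
    dev_dir="${2:-/dev}"
    # Skip the header line and the blank line after it; the device name
    # is the 4th column of each remaining row.
    awk 'NR > 2 && NF == 4 { print $4 }' "$partitions_file" |
    while read -r name; do
        # -e follows symlinks, so a udev-created symlink counts as present.
        [ -e "$dev_dir/$name" ] || echo "$name"
    done
}
```

Calling `missing_dev_nodes` with no arguments checks the running system. If it prints dm-XX names, those are presumably the devices that exist only under /dev/mapper on that node.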

Nikola Ciprich wrote:
> Hi Sunil,
> well, I don't know the exact version that started creating them,
> but the latest CentOS (based on RHEL 5.4) uses version 95, which doesn't
> create them for me.
> Version 127, which I used for tests, already creates them correctly.
> BR
> nik
>
> On Mon, Apr 06, 2009 at 12:08:34PM -0700, Sunil Mushran wrote:
>> AFAIK, this is not an issue on (rh)el4/sles9. It could be that the
>> enterprise distros never shipped that older version of udev.
>>
>> I could cross check. Do you know the version of udev that started
>> creating the /dev/dm-xx devices?
>>
>> Nikola Ciprich wrote:
>>> OK, I've got it...
>>> the problem is that mounted.ocfs2 scans the devices listed in /proc/partitions, and looks for them only directly under /dev
>>>
>>> but I'm using device-mapper based storage, and older versions of udev do not create the device-mapper nodes as /dev/dm-XX, only under /dev/mapper/..., which is therefore not scanned by mounted.ocfs2
>>> the reason it was working on one of my nodes is that I updated udev there some time ago for some other tests.
>>>
>>> so while updating udev works around the problem, I guess it might be good to fix this in mounted.ocfs2, as people using older distros (especially enterprise ones) might stumble upon the problem when using device-mapper based storage...
>>>
>>> I can try to create a fix for this problem; trying to open the device under /dev/mapper if it's not found under /dev might be the way?
>>>
>>> Anyway, Sunil, thanks a lot for your help!
>>>
>>> On Sun, Apr 05, 2009 at 07:31:52AM -0700, Sunil Mushran wrote:
>>>> Email me the output of:
>>>> $ mounted.ocfs2 -d
>>>>
>>>> Also, does hb stop using uuid work?
>>>> $ ocfs2_hb_ctl -K -u <uuid> o2cb
>>>>
>>>> Lastly, what versions of the fs, tools, kernel?
>>>>
>>>> On Apr 4, 2009, at 1:24 AM, Nikola Ciprich <extmaillist at linuxbox.cz> wrote:
>>>>
>>>>> Hi,
>>>>> it says:
>>>>> /sbin/ocfs2_hb_ctl
>>>>> on both nodes, which is correct; the binary is there...
>>>>> n.
>>>>>
>>>>> On Fri, Apr 03, 2009 at 02:27:34PM -0700, Sunil Mushran wrote:
>>>>>> Do:
>>>>>> $ cat /proc/sys/fs/ocfs2/nm/hb_ctl_path
>>>>>>
>>>>>>
>>>>>> Nikola Ciprich wrote:
>>>>>>> Hi Sunil,
>>>>>>> thanks for the reply..
>>>>>>> I don't observe any segfaults...
>>>>>>> regarding the info you want: as I wrote, umount doesn't decrease the
>>>>>>> refcount...:
>>>>>>>
>>>>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>>>>> [root at vbox4 ~]# umount /home/LVS
>>>>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>>>>>
>>>>>>> nik
>>>>>>>
>>>>>>> On Fri, Apr 03, 2009 at 10:21:33AM -0700, Sunil Mushran wrote:
>>>>>>>
>>>>>>>> umount is supposed to stop the heartbeat. In bz1053, ocfs2_hb_ctl was
>>>>>>>> segfaulting.
>>>>>>>> Are you seeing any segfaults or any other errors during umount?
>>>>>>>>
>>>>>>>> Also, run the following before and after umount:
>>>>>>>> $ ocfs2_hb_ctl -I -d /dev/sdX o2cb
>>>>>>>>
>>>>>>>> Email me the output.
>>>>>>>>
>>>>>>>> Nikola Ciprich wrote:
>>>>>>>>
>>>>>>>>> Hello Tao,
>>>>>>>>> and thanks a lot for the reply!
>>>>>>>>> It seems not to be the same bug; at least, applying the patch didn't help.
>>>>>>>>> Stopping hb using the -K parameter really helps, but why doesn't this work
>>>>>>>>> automatically on umount?
>>>>>>>>> It always happens on the second node...
>>>>>>>>> I don't see any errors in the logs, nothing.
>>>>>>>>> But the reference count always increases on mount, and doesn't decrease
>>>>>>>>> on umount on this node..
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Apr 03, 2009 at 10:58:18AM +0800, Tao Ma wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nikola,
>>>>>>>>>>
>>>>>>>>>> Nikola Ciprich wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'm trying ocfs2 on a RHEL5 distro, with a 2.6.29 kernel and
>>>>>>>>>>> ocfs2-tools-1.4.1. I'm using DRBD in primary/primary mode
>>>>>>>>>>> as shared storage...
>>>>>>>>>>>
>>>>>>>>>>> I've configured the service according to the quickstart document, and
>>>>>>>>>>> everything works, but when I umount the fs on both nodes, stopping the
>>>>>>>>>>> o2cb service on one of the nodes always fails with:
>>>>>>>>>>>
>>>>>>>>>>> [root at vbox4 sysconfig]# /etc/rc.d/init.d/o2cb stop
>>>>>>>>>>> Stopping O2CB cluster vb34: Failed
>>>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>>>
>>>>>>>>>> It looks like your disk heartbeat is still there. I don't know
>>>>>>>>>> the specific reason, maybe
>>>>>>>>>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1053 ?
>>>>>>>>>>
>>>>>>>>>> but you can stop it manually.
>>>>>>>>>> 1.  ocfs2_hb_ctl -I -d <device>
>>>>>>>>>> or  ocfs2_hb_ctl -I -u <uuid>
>>>>>>>>>> this will tell you the reference count for the heartbeat.
>>>>>>>>>> 2.  ocfs2_hb_ctl -K -d <device> <service>
>>>>>>>>>> or  ocfs2_hb_ctl -K -u <uuid> <service>
>>>>>>>>>> this will kill the heartbeat manually.
>>>>>>>>>> service is the stack you use, and it should be "o2cb" in your case.
>>>>>>>>>>
>>>>>>>>>> btw, you can try ocfs2_hb_ctl -K -u <uuid> <service> to see
>>>>>>>>>> whether it is the same problem as bug 1053.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tao
>>>>>>>>>>
>>>>> -- 
>>>>> -------------------------------------
>>>>> Nikola CIPRICH
>>>>> LinuxBox.cz, s.r.o.
>>>>> 28. rijna 168, 709 01 Ostrava
>>>>>
>>>>> tel.:   +420 596 603 142
>>>>> fax:    +420 596 621 273
>>>>> mobil:  +420 777 093 799
>>>>> www.linuxbox.cz
>>>>>
>>>>> mobil servis: +420 737 238 656
>>>>> email servis: servis at linuxbox.cz
>>>>> -------------------------------------



