[Ocfs2-users] problem stopping o2cb service on one of nodes

Mon Apr 6 12:08:34 PDT 2009

AFAIK, this is not an issue on (rh)el4/sles9. It could be that the
enterprise distros never shipped that older version of udev.

I could cross check. Do you know the version of udev that started
creating the /dev/dm-xx devices?

Nikola Ciprich wrote:
> OK, I've got it...
> the problem is that mounted.ocfs2 scans the devices listed in /proc/partitions, and looks for them only directly under /dev
>
> but I'm using device-mapper based storage, and older versions of udev do not create the device-mapper nodes as /dev/dm-XX, only under /dev/mapper/..., which is therefore not scanned by mounted.ocfs2
> the reason it was working on one of my nodes is that I updated udev there some time ago for some other tests.
>
> so while updating udev works around the problem, I guess it might be good to fix it in mounted.ocfs2, as people using older distros (especially enterprise ones) might stumble upon the problem when using device-mapper based storage...
>
> I can try to create a fix for this problem; trying to open the device under /dev/mapper if it's not found under /dev might be the way?
>
> Anyway, Sunil, thanks a lot for your help!
>
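A minimal sketch of the fallback suggested above, written in Python for brevity and not taken from the mounted.ocfs2 source. One assumption worth stating: /proc/partitions lists device-mapper devices under their kernel names (dm-N), while the nodes under /dev/mapper carry the mapped-device names, so the sketch matches on major:minor numbers instead of simply trying /dev/mapper/<name>. The function names (parse_partitions, block_devno, resolve_device) are made up for illustration.

```python
import os
import stat


def parse_partitions(text):
    """Parse /proc/partitions content into (major, minor, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row and blank lines fail the isdigit() checks.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            entries.append((int(fields[0]), int(fields[1]), fields[3]))
    return entries


def block_devno(path):
    """Return (major, minor) if path is a block device node, else None."""
    try:
        st = os.stat(path)  # follows symlinks
    except OSError:
        return None
    if not stat.S_ISBLK(st.st_mode):
        return None
    return (os.major(st.st_rdev), os.minor(st.st_rdev))


def resolve_device(major, minor, name, dev_root="/dev", devno=block_devno):
    """Find a device node for one /proc/partitions entry.

    First try <dev_root>/<name>; if that is missing or is a different
    device, fall back to scanning <dev_root>/mapper for a node with the
    same major:minor. Returns a path, or None if nothing matches.
    """
    direct = os.path.join(dev_root, name)
    if devno(direct) == (major, minor):
        return direct
    mapper_dir = os.path.join(dev_root, "mapper")
    try:
        candidates = sorted(os.listdir(mapper_dir))
    except OSError:
        return None
    for entry in candidates:
        path = os.path.join(mapper_dir, entry)
        if devno(path) == (major, minor):
            return path
    return None


if __name__ == "__main__":
    with open("/proc/partitions") as f:
        for major, minor, name in parse_partitions(f.read()):
            print(name, "->", resolve_device(major, minor, name))
```

Run as a script on an affected node, this would print None for any dm-N entry that has no node under either /dev or /dev/mapper.
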
> On Sun, Apr 05, 2009 at 07:31:52AM -0700, Sunil Mushran wrote:
>   
>> Email me the output of:
>> $ mounted.ocfs2 -d
>>
>> Also, does hb stop using uuid work?
>> $ ocfs2_hb_ctl -K -u <uuid> o2cb
>>
>> Lastly, what versions of the fs, tools, kernel?
>>
>> On Apr 4, 2009, at 1:24 AM, Nikola Ciprich <extmaillist at linuxbox.cz> wrote:
>>
>>     
>>> Hi,
>>> it says:
>>> /sbin/ocfs2_hb_ctl
>>> on both nodes, which is correct - the binary is there...
>>> n.
>>>
>>> On Fri, Apr 03, 2009 at 02:27:34PM -0700, Sunil Mushran wrote:
>>>       
>>>> Do:
>>>> $ cat /proc/sys/fs/ocfs2/nm/hb_ctl_path
>>>>
>>>>
>>>> Nikola Ciprich wrote:
>>>>         
>>>>> Hi Sunil,
>>>>> thanks for the reply.
>>>>> I don't observe any segfaults...
>>>>> Regarding the info you want: as I wrote, umount doesn't decrease the refcount:
>>>>>
>>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>>> [root at vbox4 ~]# umount /home/LVS
>>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>>>
>>>>> nik
>>>>>
>>>>> On Fri, Apr 03, 2009 at 10:21:33AM -0700, Sunil Mushran wrote:
>>>>>
>>>>>           
>>>>>> umount is supposed to stop the heartbeat. In bz1053, ocfs2_hb_ctl
>>>>>> was segfaulting.
>>>>>> Are you seeing any segfaults or any other errors during umount?
>>>>>>
>>>>>> Also, run the following before and after umount:
>>>>>> $ ocfs2_hb_ctl -I -d /dev/sdX o2cb
>>>>>>
>>>>>> Email me the output.
>>>>>>
>>>>>> Nikola Ciprich wrote:
>>>>>>
>>>>>>             
>>>>>>> Hello Tao,
>>>>>>> and thanks a lot for the reply!
>>>>>>> It seems not to be the same bug; at least, applying the patch didn't help.
>>>>>>> Stopping hb using the -K parameter really helps, but why doesn't this
>>>>>>> work automatically on umount?
>>>>>>> It always happens on the second node...
>>>>>>> I don't see any errors in the logs, nothing.
>>>>>>> But the reference count always increases on mount, and doesn't
>>>>>>> decrease on umount on this node.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 03, 2009 at 10:58:18AM +0800, Tao Ma wrote:
>>>>>>>
>>>>>>>               
>>>>>>>> Hi Nikola,
>>>>>>>>
>>>>>>>> Nikola Ciprich wrote:
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hi,
>>>>>>>>> I'm trying ocfs2 on a RHEL5 distro, with a 2.6.29 kernel and
>>>>>>>>> ocfstools-1.4.1. I'm using DRBD in primary/primary mode
>>>>>>>>> as shared storage...
>>>>>>>>>
>>>>>>>>> I've configured the service according to the quickstart document,
>>>>>>>>> and everything works, but when I umount the fs on both nodes,
>>>>>>>>> stopping the o2cb service on one of the nodes always fails with:
>>>>>>>>>
>>>>>>>>> [root at vbox4 sysconfig]# /etc/rc.d/init.d/o2cb stop
>>>>>>>>> Stopping O2CB cluster vb34: Failed
>>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> It looks like your disk heartbeat is still there. I don't know
>>>>>>>> the specific reason, maybe
>>>>>>>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1053 ?
>>>>>>>>
>>>>>>>> But you can stop it manually.
>>>>>>>> 1.  ocfs2_hb_ctl -I -d <device>
>>>>>>>>  or ocfs2_hb_ctl -I -u <uuid>
>>>>>>>> This will tell you the reference count for the heartbeat.
>>>>>>>> 2.  ocfs2_hb_ctl -K -d <device> <service>
>>>>>>>>  or ocfs2_hb_ctl -K -u <uuid> <service>
>>>>>>>> This will kill the heartbeat manually.
>>>>>>>> <service> is the stack you use, and it should be "o2cb" in
>>>>>>>> your case.
>>>>>>>>
>>>>>>>> Btw, you can try ocfs2_hb_ctl -K -u <uuid> <service> to see
>>>>>>>> whether it is the same problem as bug 1053.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tao
>>>>>>>>
>>>>>>>>                 
>>>>>           
>>> -- 
>>> -------------------------------------
>>> Nikola CIPRICH
>>> LinuxBox.cz, s.r.o.
>>> 28. rijna 168, 709 01 Ostrava
>>>
>>> tel.:   +420 596 603 142
>>> fax:    +420 596 621 273
>>> mobil:  +420 777 093 799
>>> www.linuxbox.cz
>>>
>>> mobil servis: +420 737 238 656
>>> email servis: servis at linuxbox.cz
>>> -------------------------------------
>>>       
>
>
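
The before/after umount check requested in this thread can be scripted. The sketch below is hypothetical: it only parses "UUID: N refs" lines in the form shown in the ocfs2_hb_ctl -I output quoted above, and the names parse_refs and stuck_regions are made up for illustration. Any other output the tool may produce is simply ignored, and a region missing from the second output is assumed to have dropped to zero references.

```python
import re

# Matches lines such as "2A5D351D0A934061BBC6B5392A30187E: 1 refs".
REF_LINE = re.compile(r"^([0-9A-Fa-f]+):\s+(\d+)\s+refs?\s*$")


def parse_refs(output):
    """Map heartbeat region UUID -> reference count."""
    refs = {}
    for line in output.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            refs[match.group(1)] = int(match.group(2))
    return refs


def stuck_regions(before, after):
    """Return the UUIDs whose count did not drop between the two outputs.

    `before` and `after` are the captured text of `ocfs2_hb_ctl -I`
    run before and after umount.
    """
    counts_before = parse_refs(before)
    counts_after = parse_refs(after)
    return sorted(
        uuid
        for uuid, count in counts_before.items()
        if counts_after.get(uuid, 0) >= count
    )
```

Fed the two identical outputs from the vbox4 node, stuck_regions returns that region's UUID, which is the symptom described in the thread.
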