[Ocfs2-users] problem stopping o2cb service on one of nodes

Nikola Ciprich extmaillist at linuxbox.cz
Sun Apr 5 09:38:23 PDT 2009


OK, I've got it...
The problem is that mounted.ocfs2 scans the devices listed in /proc/partitions, and only looks for them directly under /dev.

But I'm using device-mapper based storage, and older versions of udev do not create the /dev/dm-XX nodes for device-mapper devices, only the entries under /dev/mapper/..., which mounted.ocfs2 therefore doesn't scan.
It was working on one of my nodes because I had updated udev there some time ago for some other tests.

So while updating udev works around the problem, I think it would be good to fix this in mounted.ocfs2, as people using older distros (especially enterprise ones) with device-mapper based storage might stumble upon it...

I can try to create a fix for this. Perhaps mounted.ocfs2 could try opening the device under /dev/mapper if it's not found under /dev?
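A minimal sketch of that fallback, written in Python purely for illustration; it is not mounted.ocfs2's code, and the names parse_partitions, find_in_dir and resolve_device are hypothetical. One assumption worth stating loudly: the entries under /dev/mapper are named after the device-mapper name (something like vgshared-lvs for /dev/vgshared/lvs), not after the dm-N name that /proc/partitions reports. So simply opening /dev/mapper/dm-N would not work; instead the sketch takes major:minor from /proc/partitions and looks for a block device in /dev/mapper with the same device number.

```python
import os
import stat


def parse_partitions(text):
    """Parse /proc/partitions content into (major, minor, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "major minor  #blocks  name" header.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        entries.append((int(fields[0]), int(fields[1]), fields[3]))
    return entries


def find_in_dir(directory, major, minor):
    """Return a block device in `directory` numbered major:minor, else None."""
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return None  # directory missing or unreadable
    for entry in names:
        path = os.path.join(directory, entry)
        try:
            st = os.stat(path)  # follows symlinks, so linked entries match too
        except OSError:
            continue
        # Char devices such as /dev/mapper/control are skipped by S_ISBLK.
        if (stat.S_ISBLK(st.st_mode)
                and os.major(st.st_rdev) == major
                and os.minor(st.st_rdev) == minor):
            return path
    return None


def resolve_device(major, minor, name, dev_dir="/dev", mapper_dir="/dev/mapper"):
    """Try <dev_dir>/<name> first, then a device-number match in mapper_dir."""
    direct = os.path.join(dev_dir, name)
    if os.path.exists(direct):  # existence check only, for brevity
        return direct
    return find_in_dir(mapper_dir, major, minor)


if __name__ == "__main__" and os.path.exists("/proc/partitions"):
    with open("/proc/partitions") as f:
        for major, minor, name in parse_partitions(f.read()):
            print(f"{name} ({major}:{minor}) -> {resolve_device(major, minor, name)}")
```

Matching on the device number rather than on the name means the fallback does not depend on how a particular udev version names the /dev/mapper entries.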

Anyway, Sunil, thanks a lot for your help!

On Sun, Apr 05, 2009 at 07:31:52AM -0700, Sunil Mushran wrote:
> Email me the output of:
> $ mounted.ocfs2 -d
>
> Also, does stopping the heartbeat by UUID work?
> $ ocfs2_hb_ctl -K -u <uuid> o2cb
>
> Lastly, what versions of the fs, tools and kernel are you using?
>
> On Apr 4, 2009, at 1:24 AM, Nikola Ciprich <extmaillist at linuxbox.cz> wrote:
>
>> Hi,
>> it says:
>> /sbin/ocfs2_hb_ctl
>> on both nodes, which is correct; the binary is there...
>> n.
>>
>> On Fri, Apr 03, 2009 at 02:27:34PM -0700, Sunil Mushran wrote:
>>> Do:
>>> $ cat /proc/sys/fs/ocfs2/nm/hb_ctl_path
>>>
>>>
>>> Nikola Ciprich wrote:
>>>> Hi Sunil,
>>>> thanks for the reply.
>>>> I don't observe any segfaults...
>>>> regarding the info you wanted: as I wrote, umount doesn't decrease the
>>>> refcount:
>>>>
>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>> [root at vbox4 ~]# umount /home/LVS
>>>> [root at vbox4 ~]# ocfs2_hb_ctl -I -d /dev/vgshared/lvs
>>>> 2A5D351D0A934061BBC6B5392A30187E: 1 refs
>>>>
>>>> nik
>>>>
>>>> On Fri, Apr 03, 2009 at 10:21:33AM -0700, Sunil Mushran wrote:
>>>>
>>>>> umount is supposed to stop the heartbeat. In bz1053, ocfs2_hb_ctl was
>>>>> segfaulting.
>>>>> Are you seeing any segfaults or any other errors during umount?
>>>>>
>>>>> Also, run the following before and after umount:
>>>>> $ ocfs2_hb_ctl -I -d /dev/sdX o2cb
>>>>>
>>>>> Email me the output.
>>>>>
>>>>> Nikola Ciprich wrote:
>>>>>
>>>>>> Hello Tao,
>>>>>> and thanks a lot for the reply!
>>>>>> It doesn't seem to be the same bug; at least applying the patch
>>>>>> didn't help.
>>>>>> Stopping the heartbeat with the -K parameter really helps, but why
>>>>>> doesn't this happen automatically on umount?
>>>>>> It always happens on the second node...
>>>>>> I don't see any errors in the logs, or anything else.
>>>>>> But the reference count always increases on mount, and doesn't
>>>>>> decrease on umount on this node.
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 03, 2009 at 10:58:18AM +0800, Tao Ma wrote:
>>>>>>
>>>>>>> Hi Nikola,
>>>>>>>
>>>>>>> Nikola Ciprich wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm trying ocfs2 on a RHEL5 distro with a 2.6.29 kernel and
>>>>>>>> ocfs2-tools 1.4.1. I'm using DRBD in primary/primary mode
>>>>>>>> as shared storage...
>>>>>>>>
>>>>>>>> I've configured the service according to the quickstart
>>>>>>>> document, and everything works, but when I umount the fs on
>>>>>>>> both nodes, stopping the o2cb service on one of the nodes
>>>>>>>> always fails with:
>>>>>>>>
>>>>>>>> [root at vbox4 sysconfig]# /etc/rc.d/init.d/o2cb stop
>>>>>>>> Stopping O2CB cluster vb34: Failed
>>>>>>>> Unable to stop cluster as heartbeat region still active
>>>>>>>>
>>>>>>> It looks like your disk heartbeat is still active. I don't know
>>>>>>> the specific reason; maybe it is
>>>>>>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1053 ?
>>>>>>>
>>>>>>> But you can stop it manually:
>>>>>>> 1.  ocfs2_hb_ctl -I -d <device>
>>>>>>>     or ocfs2_hb_ctl -I -u <uuid>
>>>>>>>     This will tell you the reference count for the heartbeat.
>>>>>>> 2.  ocfs2_hb_ctl -K -d <device> <service>
>>>>>>>     or ocfs2_hb_ctl -K -u <uuid> <service>
>>>>>>>     This will kill the heartbeat manually.
>>>>>>> <service> is the cluster stack you use, and it should be "o2cb"
>>>>>>> in your case.
>>>>>>>
>>>>>>> btw, you can try ocfs2_hb_ctl -K -u <uuid> <service> to see
>>>>>>> whether it is the same problem as bug 1053.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tao
>>>>>>>
>>>>
>>>>
>>>
>>
>> -- 
>> -------------------------------------
>> Nikola CIPRICH
>> LinuxBox.cz, s.r.o.
>> 28. rijna 168, 709 01 Ostrava
>>
>> tel.:   +420 596 603 142
>> fax:    +420 596 621 273
>> mobil:  +420 777 093 799
>> www.linuxbox.cz
>>
>> mobil servis: +420 737 238 656
>> email servis: servis at linuxbox.cz
>> -------------------------------------
>
