[Ocfs2-users] any way to ignore quorum in a two node cluster with one node down?

Andrew D. Ball aball at linux.vnet.ibm.com
Wed Jul 25 11:06:15 PDT 2007


Can you help estimate an upper bound on how long a failure like this
could last?

Thanks for your help.
Andrew
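
For what it's worth, here is a minimal sketch of the kind of bounded
retry I have in mind on the application side. It is only an
illustration, not anything from the ocfs2 code: the helper name
rename_with_retry and the timeout_s/delay_s defaults are my own
assumptions, and it relies on the -107 status in the log corresponding
to ENOTCONN on Linux. The right timeout value is exactly what I am
asking about.

```python
import errno
import os
import time


def rename_with_retry(src, dst, timeout_s=60.0, delay_s=1.0,
                      rename=os.rename, sleep=time.sleep,
                      clock=time.monotonic):
    """Retry a rename while it fails with ENOTCONN, up to timeout_s seconds.

    Hypothetical helper: the name and default values are assumptions.
    Returns the number of attempts made. Any other error, or ENOTCONN
    after the deadline has passed, is re-raised to the caller.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            rename(src, dst)
            return attempts
        except OSError as exc:
            # Only the "not connected" failure is treated as temporary.
            if exc.errno != errno.ENOTCONN or clock() >= deadline:
                raise
            sleep(delay_s)
```

The rename, sleep and clock parameters exist only so the loop can be
exercised without a real cluster; in normal use they keep their
defaults.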

On Mon, 2007-07-23 at 13:37 -0700, Sunil Mushran wrote:
> Yes, the failure is temporary.
> 
> Andrew D. Ball wrote:
> > On Tue, 2007-07-17 at 16:17 -0700, Sunil Mushran wrote:
> >   
> >> Ahh... this is a 1.2 "feature". :)
> >>
> >> In 1.2, the fs does messaging (votes) for mount/umount, rename,
> >> unlink and delete. So the said operations can fail... if a node dies
> >> during the voting process.
> >>
> >> We have addressed this issue in mainline. Sometime in 2.6.18/19.
> >> As in, rename, unlink and delete now use the dlm and thus should
> >> no longer fail on a node death.
> >>
> >>     
> >
> > Should these start working again when retried after some amount of time?
> > I can't have them fail forever, and if mount/unmount don't work, that
> > would likely make it very hard to recover.
> >
> > Peace,
> > Andrew
> >
> >   
> >> Andrew D. Ball wrote:
> >>     
> >>> From /var/log/messages:
> >>>
> >>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_broadcast_vote:725 ERROR: status = -107
> >>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_do_request_vote:798 ERROR: status = -107
> >>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107
> >>>
> >>> The userspace error is a failed invocation of the mv command.  I know
> >>> that the return code is not 0, but didn't capture it.  I can re-run
> >>> tomorrow if that would be helpful.
> >>>
> >>> Thanks for your prompt response!
> >>>
> >>> Peace,
> >>> Andrew
> >>>
> >>> On Tue, 2007-07-17 at 15:31 -0700, Sunil Mushran wrote:
> >>>   
> >>>       
> >>>> It should behave as you expect it to. That's the idea.
> >>>> What are the errors when mkdir fails?
> >>>> As in, userspace and dmesg.
> >>>>
> >>>> Andrew D. Ball wrote:
> >>>>     
> >>>>         
> >>>>> I would really like to see the following behavior:
> >>>>>
> >>>>> (1) I start with a two-node cluster, both nodes online, with an ocfs2
> >>>>> filesystem mounted on both nodes.
> >>>>> (2) I power off one of the nodes without unmounting the filesystem.
> >>>>> (3) The node that is still powered on continues to use the filesystem
> >>>>> mounted read-write with no problems.
> >>>>>
> >>>>> I believe I'm seeing that the node that is still online fails to write
> >>>>> data to the filesystem.  Specifically, mkdir(2) is failing.
> >>>>>
> >>>>> This is related to having a quorum, right?  Can the quorum requirements
> >>>>> be disabled?  I have a file-backed database on the filesystem and my
> >>>>> entire software stack will be broken if any surviving nodes cannot
> >>>>> update the database.  Is there any reason why ignoring the quorum would
> >>>>> not be a good idea?
> >>>>>
> >>>>> Thanks for your help,
> >>>>> Andrew
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Ocfs2-users mailing list
> >>>>> Ocfs2-users at oss.oracle.com
> >>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >>>>>   
> >>>>>       
> >>>>>           
> >
> >   
> 



