[Ocfs2-users] any way to ignore quorum in a two node cluster with one node down?

Fri Jul 20 07:10:42 PDT 2007

Can you help me determine which patches need to be backported to SLES 10
SP1 to resolve this?  Someone in the LTC saw the following that looked
promising:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2f5bf1f2d061dea5146aa283685ce2b00cea2f3d
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78062cb2e54ffe0df811dce5e68b54da9b8c9025
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b36c3f84988eebf38acaccc756e05f6b70e333ab
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3fca0894a4b5e52c278421b04435b88e32b423ad
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0dd82141b236ce36253e3056c6068ee3d5732196
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e4968476a9bc5a6b30076076b4f3ce3e692e0d79
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f3f854648de64c4b6f13f6f13113bc9525c621e5

Thanks!
Andrew

On Tue, 2007-07-17 at 16:17 -0700, Sunil Mushran wrote:
> Ahh... this is a 1.2 "feature". :)
> 
> In 1.2, the fs does messaging (votes) for mount/umount, rename,
> unlink and delete. So the said operations can fail... if a node dies
> during the voting process.
> 
> We have addressed this issue in mainline. Sometime in 2.6.18/19.
> As in, rename, unlink and delete now use the dlm and thus should
> no longer fail on a node death.
> 
> Andrew D. Ball wrote:
> > From /var/log/messages:
> >
> > Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_broadcast_vote:725 ERROR: status = -107
> > Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_do_request_vote:798 ERROR: status = -107
> > Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107
> >
> > The userspace error is a failed invocation of the mv command.  I know
> > that the return code is not 0, but didn't capture it.  I can re-run
> > tomorrow if that would be helpful.
> >
> > Thanks for your prompt response!
> >
> > Peace,
> > Andrew
> >
> > On Tue, 2007-07-17 at 15:31 -0700, Sunil Mushran wrote:
> >   
> >> It should behave as you expect it to. That's the idea.
> >> What are the errors when mkdir fails?
> >> As in, userspace and dmesg.
> >>
> >> Andrew D. Ball wrote:
> >>     
> >>> I would really like to see the following behavior:
> >>>
> >>> (1) I start with a two-node cluster, both nodes online, with an ocfs2
> >>> filesystem mounted on both nodes.
> >>> (2) I power off one of the nodes without unmounting the filesystem.
> >>> (3) The node that is still powered on continues to use the filesystem
> >>> mounted read-write with no problems.
> >>>
> >>> I believe I'm seeing that the node that is still online fails to write
> >>> data to the filesystem.  Specifically, mkdir(2) is failing.
> >>>
> >>> This is related to having a quorum, right?  Can the quorum requirements
> >>> be disabled?  I have a file-backed database on the filesystem and my
> >>> entire software stack will be broken if any surviving nodes cannot
> >>> update the database.  Is there any reason why ignoring the quorum would
> >>> not be a good idea?
> >>>
> >>> Thanks for your help,
> >>> Andrew
> >>>
> >>>
> >>> _______________________________________________
> >>> Ocfs2-users mailing list
> >>> Ocfs2-users at oss.oracle.com
> >>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >>>   
> >>>       
> >
>
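
For anyone reading the quoted /var/log/messages excerpt later: the "status = -107" values are negated errno codes, and on Linux errno 107 is ENOTCONN. Below is a minimal sketch, in Python, of pulling the function name, line number and status out of such a line and looking up the errno name. The helper names (parse_status, describe) and the regular expression are illustrative assumptions based only on the three log lines quoted in this thread; they are not part of ocfs2 or its tools, and the errno lookup reflects the machine the script runs on.

```python
import errno
import os
import re

# Assumed message shape, taken from the log lines quoted in this thread:
#   ... kernel: (N,N):function_name:line ERROR: status = -NNN
LOG_RE = re.compile(
    r"\(\d+,\d+\):(?P<func>\w+):(?P<line>\d+)\s+ERROR:\s+status\s*=\s*(?P<status>-?\d+)"
)


def parse_status(log_line):
    """Return (function, line, status) from one log line, or None if no match."""
    m = LOG_RE.search(log_line)
    if m is None:
        return None
    return m.group("func"), int(m.group("line")), int(m.group("status"))


def describe(status):
    """Map a negated errno value to 'NAME: message' using the local errno table."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    sample = (
        "Jul 16 17:06:43 enva11 kernel: "
        "(29933,2):ocfs2_rename:1196 ERROR: status = -107"
    )
    parsed = parse_status(sample)
    if parsed is not None:
        func, line, status = parsed
        print(func, line, status, describe(status))
```

Run on a Linux host, this should name ENOTCONN for the quoted lines, which fits Sunil's explanation that the vote messaging fails when the peer node has gone away.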