[Ocfs2-users] any way to ignore quorum in a two node cluster with one node down?

Sunil Mushran Sunil.Mushran at oracle.com
Fri Jul 20 10:10:01 PDT 2007


There are dlm bug fixes that should already be in SLES10 SP1.

The change you are looking for was not trivial, and
merging it into the 1.2 tree will not be either.

Some of the patches related to this are as follows:

Date:   Fri Sep 8 11:37:32 2006 -0700

commit 3384f3df5ed939a25135e1b2734fb7cdee1720a8
ocfs2: Allow binary names in the DLM

commit ea5b3a187e2724fa9d08b2fbd3898c149ed95c6b
ocfs2: Update dlmfs for new dlmlock() API

commit f0681062b8e369d9fb6f3ce10f4e3fc8cea5f910
ocfs2: Update dlmglue for new dlmlock() API

commit d680efe9d8fe0eb99d9dd063a4def6b362cdb40d
ocfs2: Add new cluster lock type

commit 80c05846f604bab6d61e9732c262420ee9f5f358
ocfs2: Add dentry tracking API

commit 379dfe9d0db99ed33fb089fcb9c07f5f92566e9e
ocfs2: Hook rest of the file system into dentry locking API

commit 1390334b4c697b7588d5661fcf6acaeec409cf4c
ocfs2: Remove the dentry vote

commit 1ba9da2ffa54b56a6346746248bfa38124d499a6
ocfs2: manually d_move() during ocfs2_rename()

Needless to say, this changes the dlm protocol.
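The difference between 1.2 votes and mainline dlm locking, described in the quoted reply below, can be illustrated with a deliberately simplified Python model. It is not OCFS2 code: `Node`, `broadcast_vote` and `dlm_lock` are hypothetical names. The vote path follows the 1.2 behavior described in this thread, where every other node in the mount map must answer, so a powered-off peer that has not yet been evicted fails the operation with -ENOTCONN. The lock path assumes, as a simplification, that the requester waits for heartbeat and dlm recovery to evict the dead node and then proceeds with the survivors.

```python
import errno
from dataclasses import dataclass


# Toy model only: these names are hypothetical, not OCFS2 symbols.
@dataclass
class Node:
    num: int
    in_mount_map: bool = True  # still believed to have the volume mounted
    reachable: bool = True     # False once the node has been powered off


def broadcast_vote(nodes, self_num):
    """1.2-style vote as described in the thread: every other node in the
    mount map must answer, so one unreachable peer fails the operation."""
    for node in nodes:
        if node.num == self_num or not node.in_mount_map:
            continue
        if not node.reachable:
            return -errno.ENOTCONN  # -107 on Linux, as in the quoted log
    return 0


def dlm_lock(nodes, self_num):
    """Mainline-style lock request (simplified): dead peers are evicted by
    recovery first, then only live domain members are consulted."""
    for node in nodes:
        if node.num != self_num and not node.reachable:
            node.in_mount_map = False  # simulated heartbeat + dlm recovery
    # In this toy model the remaining live members always grant the lock.
    return 0


if __name__ == "__main__":
    cluster = [Node(0), Node(1, reachable=False)]  # node 1 was powered off
    print("vote:", broadcast_vote(cluster, 0))  # -107 on Linux (ENOTCONN)
    print("lock:", dlm_lock(cluster, 0))        # 0
```

The model ignores lock modes, lock mastery and the length of the recovery window; it only shows why an operation that needs an answer from every peer fails while one that waits for recovery does not.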

Andrew D. Ball wrote:
> Can you help me determine which patches need to be backported to SLES 10
> SP1 to resolve this?  Someone in the LTC saw the following that looked
> promising:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2f5bf1f2d061dea5146aa283685ce2b00cea2f3d
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78062cb2e54ffe0df811dce5e68b54da9b8c9025
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b36c3f84988eebf38acaccc756e05f6b70e333ab
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3fca0894a4b5e52c278421b04435b88e32b423ad
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0dd82141b236ce36253e3056c6068ee3d5732196
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e4968476a9bc5a6b30076076b4f3ce3e692e0d79
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f3f854648de64c4b6f13f6f13113bc9525c621e5
>
> Thanks!
> Andrew
>
> On Tue, 2007-07-17 at 16:17 -0700, Sunil Mushran wrote:
>> Ahh... this is a 1.2 "feature". :)
>>
>> In 1.2, the fs does messaging (votes) for mount/umount, rename,
>> unlink and delete. So these operations can fail if a node dies
>> during the voting process.
>>
>> We have addressed this issue in mainline, sometime around 2.6.18/2.6.19.
>> That is, rename, unlink and delete now use the dlm and thus should
>> no longer fail when a node dies.
>>
>> Andrew D. Ball wrote:
>>> From /var/log/messages:
>>>
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_broadcast_vote:725 ERROR: status = -107
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_do_request_vote:798 ERROR: status = -107
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107
>>>
>>> The userspace error is a failed invocation of the mv command.  I know
>>> that the return code is not 0, but didn't capture it.  I can re-run
>>> tomorrow if that would be helpful.
>>>
>>> Thanks for your prompt response!
>>>
>>> Peace,
>>> Andrew
>>>
>>> On Tue, 2007-07-17 at 15:31 -0700, Sunil Mushran wrote:
>>>> It should behave as you expect it to. That's the idea.
>>>> What are the errors when mkdir fails?
>>>> That is, what do userspace and dmesg report?
>>>>
>>>> Andrew D. Ball wrote:
>>>>> I would really like to see the following behavior:
>>>>>
>>>>> (1) I start with a two-node cluster, both nodes online, with an ocfs2
>>>>> filesystem mounted on both nodes.
>>>>> (2) I power off one of the nodes without unmounting the filesystem.
>>>>> (3) The node that is still powered on continues to use the filesystem
>>>>> mounted read-write with no problems.
>>>>>
>>>>> I believe I'm seeing that the node that is still online fails to write
>>>>> data to the filesystem.  Specifically, mkdir(2) is failing.
>>>>>
>>>>> This is related to having a quorum, right?  Can the quorum requirements
>>>>> be disabled?  I have a file-backed database on the filesystem and my
>>>>> entire software stack will be broken if any surviving nodes cannot
>>>>> update the database.  Is there any reason why ignoring the quorum would
>>>>> not be a good idea?
>>>>>
>>>>> Thanks for your help,
>>>>> Andrew
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users

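The `status = -107` values in the kernel log quoted above are negated errno codes. On Linux, 107 is ENOTCONN ("Transport endpoint is not connected"), which is consistent with the vote request failing to reach the powered-off node. A small helper along these lines can decode such lines; it is a sketch, and its regular expression assumes only the `(...):function:line ERROR: status = N` layout seen in this particular log.

```python
import errno
import os
import re

# Assumed layout, taken from the log quoted in this thread:
#   "... kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107"
STATUS_RE = re.compile(r"\(\d+,\d+\):(\w+):(\d+) ERROR: status = (-?\d+)")


def decode_ocfs2_error(line):
    """Return (function, source_line, errno_name, message), or None if the
    line does not match the assumed layout."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    func = m.group(1)
    src_line = int(m.group(2))
    code = abs(int(m.group(3)))  # the kernel reports errors as -errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return func, src_line, name, os.strerror(code)


if __name__ == "__main__":
    sample = ("Jul 16 17:06:43 enva11 kernel: "
              "(29933,2):ocfs2_rename:1196 ERROR: status = -107")
    print(decode_ocfs2_error(sample))
```

The errno name and message come from the host running the script, so the decoding is only meaningful on Linux, where the numbering matches the kernel that wrote the log.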