[Ocfs2-users] any way to ignore quorum in a two node cluster with one node down?

Sunil Mushran sunil.mushran at oracle.com
Sat Jul 21 15:44:17 PDT 2007


No. Adding a node will not help. It is a node death during the said
operation that causes the temporary failure.

I am curious as to why this is such a "big" issue for you. Can you
elaborate on that?

As this has been addressed in mainline, it will work as expected in 1.4.
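
Since the failure is temporary, one possible stopgap on 1.2 is to retry the
failed operation from the application. Below is a minimal sketch, not a
tested fix. It assumes the kernel status -107 from the logs reaches userspace
as errno ENOTCONN (107 on Linux), which this thread does not confirm. The
helper name rename_with_retry and its parameters are hypothetical.

```python
import errno
import os
import time


def rename_with_retry(src, dst, attempts=5, delay=1.0,
                      rename=os.rename, sleep=time.sleep):
    """Retry a rename that fails with ENOTCONN.

    Assumption: ENOTCONN is what userspace sees when a node dies during
    the vote. Any other error is re-raised immediately.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except OSError as exc:
            # Give up on unrelated errors or once the retry budget is spent.
            if exc.errno != errno.ENOTCONN or attempt == attempts:
                raise
            sleep(delay)
```

The same wrapper shape would apply to unlink or mkdir. Whether a retry
succeeds depends on the surviving node finishing recovery in the meantime,
so the delay and attempt count would need tuning per cluster.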

Andrew D. Ball wrote:
> Given that the change is non-trivial, would requiring at least three
> nodes to be in the ocfs2 cluster be a successful workaround?
>
> Thanks for your help,
> Andrew
>
> On Fri, 2007-07-20 at 10:10 -0700, Sunil Mushran wrote:
>> There are dlm bug fixes that should already be in SLES10 SP1.
>>
>> The change for what you are looking for was not trivial and
>> merging it in the 1.2 tree will not be either.
>>
>> Some of the patches related to this are as follows:
>>
>> Date:   Fri Sep 8 11:37:32 2006 -0700
>>
>> commit 3384f3df5ed939a25135e1b2734fb7cdee1720a8
>> ocfs2: Allow binary names in the DLM
>>
>> commit ea5b3a187e2724fa9d08b2fbd3898c149ed95c6b
>> ocfs2: Update dlmfs for new dlmlock() API
>>
>> commit f0681062b8e369d9fb6f3ce10f4e3fc8cea5f910
>> ocfs2: Update dlmglue for new dlmlock() API
>>
>> commit d680efe9d8fe0eb99d9dd063a4def6b362cdb40d
>> ocfs2: Add new cluster lock type
>>
>> commit 80c05846f604bab6d61e9732c262420ee9f5f358
>> ocfs2: Add dentry tracking API
>>
>> commit 379dfe9d0db99ed33fb089fcb9c07f5f92566e9e
>> ocfs2: Hook rest of the file system into dentry locking API
>>
>> commit 1390334b4c697b7588d5661fcf6acaeec409cf4c
>> ocfs2: Remove the dentry vote
>>
>> commit 1ba9da2ffa54b56a6346746248bfa38124d499a6
>> ocfs2: manually d_move() during ocfs2_rename()
>>
>> Needless to add, this changes the dlm protocol.
>>
>> Andrew D. Ball wrote:
>>> Can you help me determine which patches need to be backported to SLES 10
>>> SP1 to resolve this?  Someone in the LTC saw the following that looked
>>> promising:
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2f5bf1f2d061dea5146aa283685ce2b00cea2f3d
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=78062cb2e54ffe0df811dce5e68b54da9b8c9025
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b36c3f84988eebf38acaccc756e05f6b70e333ab
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3fca0894a4b5e52c278421b04435b88e32b423ad
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0dd82141b236ce36253e3056c6068ee3d5732196
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e4968476a9bc5a6b30076076b4f3ce3e692e0d79
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f3f854648de64c4b6f13f6f13113bc9525c621e5
>>>
>>> Thanks!
>>> Andrew
>>>
>>> On Tue, 2007-07-17 at 16:17 -0700, Sunil Mushran wrote:
>>>> Ahh... this is a 1.2 "feature". :)
>>>>
>>>> In 1.2, the fs does messaging (votes) for mount/umount, rename,
>>>> unlink and delete. So the said operations can fail... if a node dies
>>>> during the voting process.
>>>>
>>>> We have addressed this issue in mainline, sometime around 2.6.18/19.
>>>> Rename, unlink and delete now use the dlm and thus should no longer
>>>> fail on a node death.
>>>>
>>>> Andrew D. Ball wrote:
>>>>> From /var/log/messages:
>>>>>
>>>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_broadcast_vote:725 ERROR: status = -107
>>>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_do_request_vote:798 ERROR: status = -107
>>>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107
>>>>>
>>>>> The userspace error is a failed invocation of the mv command.  I know
>>>>> that the return code is not 0, but didn't capture it.  I can re-run
>>>>> tomorrow if that would be helpful.
>>>>>
>>>>> Thanks for your prompt response!
>>>>>
>>>>> Peace,
>>>>> Andrew
>>>>>
>>>>> On Tue, 2007-07-17 at 15:31 -0700, Sunil Mushran wrote:
>>>>>> It should behave as you expect it to. That's the idea.
>>>>>> What are the errors when mkdir fails?
>>>>>> As in, userspace and dmesg.
>>>>>>
>>>>>> Andrew D. Ball wrote:
>>>>>>> I would really like to see the following behavior:
>>>>>>>
>>>>>>> (1) I start with a two-node cluster, both nodes online, with an ocfs2
>>>>>>> filesystem mounted on both nodes.
>>>>>>> (2) I power off one of the nodes without unmounting the filesystem.
>>>>>>> (3) The node that is still powered on continues to use the filesystem
>>>>>>> mounted read-write with no problems.
>>>>>>>
>>>>>>> I believe I'm seeing that the node that is still online fails to write
>>>>>>> data to the filesystem.  Specifically, mkdir(2) is failing.
>>>>>>>
>>>>>>> This is related to having a quorum, right?  Can the quorum requirements
>>>>>>> be disabled?  I have a file-backed database on the filesystem and my
>>>>>>> entire software stack will be broken if any surviving nodes cannot
>>>>>>> update the database.  Is there any reason why ignoring the quorum would
>>>>>>> not be a good idea?
>>>>>>>
>>>>>>> Thanks for your help,
>>>>>>> Andrew
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ocfs2-users mailing list
>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users


