[Ocfs2-users] any way to ignore quorum in a two node cluster with one node down?

Sunil Mushran Sunil.Mushran at oracle.com
Mon Jul 23 13:37:08 PDT 2007


Yes, the failure is temporary.
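
For the retry question quoted below, here is a minimal sketch of a
userspace retry wrapper. It assumes the failed vote reaches the caller as
ENOTCONN (errno 107 on Linux, matching the "status = -107" in the kernel
log); that mapping is an assumption, not something confirmed in this
thread. The helper name retry_fs_op and its parameters are hypothetical.

```python
import errno
import os
import time

# Assumption: the failed vote surfaces to userspace as ENOTCONN
# (errno 107 on Linux, matching "status = -107" in the kernel log).
RETRYABLE = {errno.ENOTCONN}


def retry_fs_op(op, *args, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call op(*args), retrying with exponential backoff on retryable errors.

    Errors outside RETRYABLE, and the error from the final attempt,
    are re-raised to the caller unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op(*args)
        except OSError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay)       # wait before the next attempt
            delay *= backoff   # lengthen the wait each time
```

It could be used as retry_fs_op(os.rename, src, dst), where src and dst
are paths on the ocfs2 mount. The attempt count and delays are arbitrary
starting points and would need tuning against how long the cluster takes
to notice the dead node.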

Andrew D. Ball wrote:
> On Tue, 2007-07-17 at 16:17 -0700, Sunil Mushran wrote:
>   
>> Ahh... this is a 1.2 "feature". :)
>>
>> In 1.2, the fs does messaging (votes) for mount/umount, rename,
>> unlink and delete. So the said operations can fail... if a node dies
>> during the voting process.
>>
>> We have addressed this issue in mainline, sometime around 2.6.18/19.
>> Rename, unlink and delete now use the dlm and thus should no longer
>> fail on a node death.
>>
>>     
>
> Should these start working again when retried after some amount of time?
> I can't have them fail forever, and if mount/unmount don't work, that
> would likely make it very hard to recover.
>
> Peace,
> Andrew
>
>   
>> Andrew D. Ball wrote:
>>     
>>> From /var/log/messages:
>>>
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_broadcast_vote:725 ERROR: status = -107
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_do_request_vote:798 ERROR: status = -107
>>> Jul 16 17:06:43 enva11 kernel: (29933,2):ocfs2_rename:1196 ERROR: status = -107
>>>
>>> The userspace error is a failed invocation of the mv command.  I know
>>> that the return code is not 0, but didn't capture it.  I can re-run
>>> tomorrow if that would be helpful.
>>>
>>> Thanks for your prompt response!
>>>
>>> Peace,
>>> Andrew
>>>
>>> On Tue, 2007-07-17 at 15:31 -0700, Sunil Mushran wrote:
>>>   
>>>       
>>>> It should behave as you expect it to. That's the idea.
>>>> What are the errors when mkdir fails?
>>>> As in, userspace and dmesg.
>>>>
>>>> Andrew D. Ball wrote:
>>>>     
>>>>         
>>>>> I would really like to see the following behavior:
>>>>>
>>>>> (1) I start with a two-node cluster, both nodes online, with an ocfs2
>>>>> filesystem mounted on both nodes.
>>>>> (2) I power off one of the nodes without unmounting the filesystem.
>>>>> (3) The node that is still powered on continues to use the filesystem
>>>>> mounted read-write with no problems.
>>>>>
>>>>> I believe I'm seeing that the node that is still online fails to write
>>>>> data to the filesystem.  Specifically, mkdir(2) is failing.
>>>>>
>>>>> This is related to having a quorum, right?  Can the quorum requirements
>>>>> be disabled?  I have a file-backed database on the filesystem and my
>>>>> entire software stack will be broken if any surviving nodes cannot
>>>>> update the database.  Is there any reason why ignoring the quorum would
>>>>> not be a good idea?
>>>>>
>>>>> Thanks for your help,
>>>>> Andrew
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users