[Ocfs2-devel] Why did Oracle give up the disk-based DLM in ocfs2? Because of performance?

Srinivas Eeda srinivas.eeda at oracle.com
Wed Jul 3 09:42:10 PDT 2013


On 07/03/2013 01:28 AM, Jensen wrote:
> On 2013/7/3 13:06, Jeff Liu wrote:
>
>> On 07/03/2013 09:27 AM, Jensen wrote:
>>
>>> On 2013/7/3 1:20, Mark Fasheh wrote:
>>>
>>>> On Tue, Jul 02, 2013 at 10:07:52AM +0800, Jensen wrote:
>>>>> On 2013/7/2 9:35, Sunil Mushran wrote:
>>>>>
>>>>>> A general purpose file system requires one to manage over a million locks concurrently. So performance is the main reason.
>>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>> Has Oracle compared the performance between ocfs2 and ocfs1?
>>>> Firstly, that's implied in the answer you just got. Also, who wouldn't
>>>> compare performance from one version of a file system to the next?
>>>>
>>>> Can you please cut to the chase and either ask what you really want to know
>>>> or make the statement you're trying to make so we can move on?
>>>>
>>>
>>> Thanks for your answer.
>>> We want to use the SCSI COMPARE AND WRITE command to replace the DLM module. It is similar
>>> to VMware VMFS.
>>>
>> I'm not trying to answer this question.
>>
>> I know that OCFS2 is deployed at Huawei in large-scale clusters of up
>> to 128 nodes, so I'm not very surprised by some of what you mention
>> below, but...
>>
>>> Why do we want to replace the DLM in ocfs2? Because:
>>> 1. The stability of the ocfs2 DLM is very poor; we found 100+ bugs.
>> That sounds interesting. How would you classify those problems?
>> - Fatal error, panic
>> - Result in an interruption in service
>> - Wrong results, but can be worked around?
>> - Trivial
>>
>> Reporting bugs to bugzilla/OCFS2 would be useful to keep track of them:
>> https://oss.oracle.com/bugzilla/
>>
>
> Recently, Huawei has sent many bugs to the open source community.
Yes and we appreciate your contributions :)
>   These are bugs that exist in the
> open source code. Other changes or bug fixes enhance ocfs2 functionality. For example, when a disk
> times out, the open source code reboots the machine; we modified it to mark the ocfs2 volume
> invalid (no reads or writes) instead. The open source community may not be interested in those modifications.
Can you please point me to these patches? I would be very interested to
look at them. If the patches are safe for all kinds of workloads and are
guaranteed to stop in-flight I/Os, then it makes sense to add them to mainline ocfs2.
>
>> Thanks,
>> -Jeff
>>
>>> 2. The reliability of the ocfs2 DLM is very poor, especially during a network split. In the
>>>     worst case the cluster splits into two domains and half of the nodes must be rebooted.
Currently we are looking into this. It appears that most of the network
issues we are seeing may not really be caused by the network layer. They are
mostly false errors because the o2net thread got busy. Once this happens, in
some scenarios reconnect always fails, which is a bug. If you are
currently running into these problems, please share them so we can analyse them.
>>> 3. The maximum number of mounted machines is 32; we want to support more.
>>>
>>> Currently we worry about two things:
>>> 1. The performance of lock and unlock, because they use a SCSI command and so cost about the same as an I/O read or write.
>>> 2. The change is very large, because it may modify the on-disk layout of ocfs2.
I am just curious, have you already made these changes? Are the changes 
scalable?
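
For readers following the thread, here is a minimal sketch of the kind of
compare-and-write disk lock described in the quoted proposal. It is not taken
from any patch: the device is simulated in memory, and every name in it
(SimulatedDisk, owner_block, try_lock, unlock) is hypothetical. It assumes one
lock block per resource, with node id 0 meaning "free".

```python
import threading


class SimulatedDisk:
    """In-memory stand-in for a shared block device (hypothetical)."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]
        # Models the atomicity a storage target gives one compare-and-write.
        self._mutex = threading.Lock()

    def compare_and_write(self, lba, expected, new):
        """Write `new` to block `lba` only if it currently equals `expected`."""
        with self._mutex:
            if self._blocks[lba] != expected:
                return False  # miscompare: someone else changed the block
            self._blocks[lba] = new
            return True


def owner_block(node_id, block_size):
    """Encode the owning node id into a full block; id 0 means free."""
    return node_id.to_bytes(4, "little").ljust(block_size, b"\0")


def try_lock(disk, lba, node_id):
    """Take the lock if it is free. Costs one device round trip."""
    free = owner_block(0, disk.block_size)
    mine = owner_block(node_id, disk.block_size)
    return disk.compare_and_write(lba, free, mine)


def unlock(disk, lba, node_id):
    """Release the lock only if this node still owns it."""
    mine = owner_block(node_id, disk.block_size)
    free = owner_block(0, disk.block_size)
    return disk.compare_and_write(lba, mine, free)
```

In this sketch every lock and unlock is a single device command, which is the
performance concern raised in point 1 of the quoted text.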
>>>
>>> So is anyone interested in this?
>>>
>>>> Thanks,
>>>> 	--Mark
>>>>
>>>> --
>>>> Mark Fasheh
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>>
>>
>
>
>



