[Ocfs2-users] Cluster setup

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Wed Oct 10 19:12:37 PDT 2007


I can't log _single heartbeat interface_ as a bug - but it is a problem.
I can't log _fences itself if disk access is delayed on all nodes for 30 
seconds_ as a bug - but it is a problem.
If the SAN is not accessible from ALL nodes, the cluster should wait until at 
least 1 node gets access, and not fence until that happens.
I can't log _when the SAN switch rebooted, all servers survived easily except 
the OCFS nodes, which all rebooted_ as a bug - it is by design.
The memory leak on file creation is known and fixed.
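
To make the rule I am asking for concrete (no fencing while storage is 
unreachable from ALL nodes), here is a minimal Python sketch. It is not how 
OCFSv2 implements fencing. The function name nodes_to_fence, the per-node 
"seconds since last successful disk I/O" input, and the assumption that 
nodes can learn each other's state (for example over a network heartbeat) 
are all hypothetical and used only for illustration.

```python
from typing import Dict, Set


def nodes_to_fence(seconds_since_disk_io: Dict[str, float],
                   timeout: float = 30.0) -> Set[str]:
    """Return the nodes that should fence under the proposed rule.

    seconds_since_disk_io maps a (hypothetical) node name to the time since
    that node last completed a disk heartbeat write. The 30 second default
    mirrors the delay mentioned in this message.
    """
    # Nodes whose disk access has been stalled longer than the timeout.
    stalled = {node for node, age in seconds_since_disk_io.items()
               if age > timeout}

    # If every node is stalled, the storage itself is the likely culprit
    # (for example a SAN switch reboot), so nobody fences and the cluster
    # waits until at least one node regains access.
    if stalled and len(stalled) == len(seconds_since_disk_io):
        return set()

    # Otherwise only the stalled nodes fence; healthy nodes keep running.
    return stalled
```

With this rule, a SAN outage seen by both nodes, such as 
nodes_to_fence({"node1": 45, "node2": 50}), yields an empty set, while a 
stall on node1 only still fences node1.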

The main concern is fencing. If I use OCFSv2 for backups (running between 2 
and 3 am at night) and
the OCFSv2 cluster fences itself and reboots at 4pm because it lost heartbeat 
(even though we do not have a single open file on OCFSv2 at that time) - it 
is a bug for me, even if it is not a bug for the OCFSv2 developers.

Here is the difference: I am concerned about practical design, and I don't 
like what I see. The design is unreliable. I use OCFSv2 to store home
on 1 heartbeat cluster, just to watch how it behaves - and it is really a 
system killer in any failure... It works now (after a few patches)
but I can't count on it as a reliable application. Reasons - see above (I 
can count on Veritas, can count on heartbeat, can count on Cisco PIX, and on 
many other cluster applications or hardware - but not OCFSv2. Btw, a RAC 
cluster is not a reliable thing either - it is just a good performance 
improvement and some short-term reliability, but not a really reliable 
cluster. But that is obvious for RAC /too much data is shared/ so not a 
surprise.).

I hope to find time (when we finish one big project) to look at the 
modern OCFSv2 version (and maybe try to figure out how to fix some of these 
problems) but for now, I have positioned it in our company as _for non 
critical use only_. And I have had a few confirmations that I was right.

----- Original Message ----- 
From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
To: "Sunil Mushran" <Sunil.Mushran at oracle.com>
Cc: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>; 
<ocfs2-users at oss.oracle.com>
Sent: Wednesday, October 10, 2007 6:43 PM
Subject: Re: [Ocfs2-users] Cluster setup


> BTW, how many bugs have you logged? The least you can do is just that.
> Whining will not get you anywhere.
>
> Sunil Mushran wrote:
>> Alexei_Roudnev wrote:
>>> Does Oracle test the behavior of OCFSv2 in case of:
>>> - 1,000 different users;
>> YES. By users I assume you mean processes. 500-1000 per node
>> on an 8 node cluster.
>>
>>> - host1 appends to the file and host2 truncates it; then host3 renames 
>>> the file.
>> YES
>>
>>> - file is removed on node1 but still open on node2;
>> YES
>>
>>> - one node creates a file and another tries to rename another file into it.
>> YES
>>
>>> - 900 GB file system and we create 1,000,000 directories with 5,000,000 
>>> files in it
>> No, we limit 32000 files in a directory. Doc in the FAQ.
>>
>>> - directory with 100,000 files inside
>> No. See above.
>>
>>> - file name length 512 symbols in UTF8 encoding
>> No, as character sets are not handled by the fs. But knock yourself out.
>>
>>> - we run 'mkdir x; rmdir x' in a loop for a week...
>> What's so magical about a week? Again, knock yourself out.
>>
>>> etc etc...
>>> ??
>>>  I am more concerned about OCFSv2 usage as a common file system, and not 
>>> so much about LVM + OCFSv2. OCFSv2 looks pretty stable when used for a 
>>> limited number of files and a limited usage scenario (such as Oracle usage) 
>>> but not as a common file system used by thousands of students with 
>>> unlimited fantasy...
>> No, it does not make your morning coffee.
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
> 



