[Ocfs2-users] OCFSv2 @ SLES9 SP3 (# 257) as a document storage - experiment results

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Fri Jul 7 20:15:48 CDT 2006


I ran an experiment using OCFSv2 as document storage in our development lab. (The document storage is for the users of the product, not for developers, so we could test it. It held about 20 - 30 GB of small files and was accessed concurrently from 2 servers.)

Results:
- the cluster died 2 times in 2 weeks (each time at night, for some reason).
- the first failure left the system disk on node-2 totally damaged.
- in both failures, 1 node (the one running the SMP kernel) froze, and the second node died a few hours later.
- after the second failure, 1 directory had a bad file counter.
- no data loss in 2 weeks.
- performance is good, except for moments when o2net took 30 - 50% of all CPU power.


The most annoying things are:
- self-fencing. First of all, SLES9 has _don't reboot on panic_ as the default, so fencing just _freezes_ the server.
   The problem is that in 90% of cases there was no activity on the file system, so it could just _remount_.
   Another problem is that in many cases I would prefer to LOCK the file system rather than reboot (example: OCFS2 used
   to store Oracle backups).

   Second, I have many servers around. I'd like to designate a few of them as _arbiters for OCFSv2_. But I don't want them to reboot
   in any case, and I do not need to mount the file system on them.

  
- heartbeat. I use 2 Ethernet links + serial in the Linux cluster, 4 Ethernet links in the PIX cluster, and 3 Ethernet links in the Veritas cluster. Why is OCFSv2
  so dumb that I cannot configure several IPs per server? It makes the system very unstable (and, if you remember _self-fencing_,
  it makes it unusable).

- symlink errors are reported to syslog - hmm, what an excellent idea -:). Why not report every syscall error to syslog (don't forget to purchase a separate 200 GB disk before doing it)?
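
On the self-fencing point above: whether a fenced node freezes or reboots depends on the kernel.panic sysctl, which is exposed as /proc/sys/kernel/panic. A minimal sketch for checking it on each node, assuming the fence is delivered as a kernel panic as described above. The function names are only for illustration, and the handling of negative values follows current kernel documentation, which may not apply to the SLES9 kernel.

```python
from pathlib import Path


def panic_reboot_delay(path="/proc/sys/kernel/panic"):
    """Return the kernel.panic value read from the given file."""
    return int(Path(path).read_text().strip())


def describe(delay):
    """Explain what a node with this kernel.panic value does after a panic."""
    if delay == 0:
        # 0 means the kernel loops forever: the node just freezes.
        return "panic freezes the node (no automatic reboot)"
    if delay > 0:
        # A positive value is the number of seconds to wait before rebooting.
        return f"node reboots {delay} s after a panic"
    # Negative values mean an immediate reboot on current kernels.
    return "node reboots immediately after a panic"
```

Running `print(describe(panic_reboot_delay()))` on a node shows which behaviour to expect. The value can be changed with `sysctl -w kernel.panic=<seconds>`; this only turns the freeze into a reboot, it does not avoid the fencing itself.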

So. It looks good, after the old and broken implementations. But it (still) does not work. And in many cases it decreases reliability instead of increasing it (self-fencing of an Oracle backup file system is a good example).

Any ideas? Maybe kernel 257 had a broken version? In one case I found the root file system broken, and it looked as if OCFSv2 had written the wrong buffers.
