[Ocfs-users] Hard restart of the nodes after losing connection

Tue Sep 13 22:08:15 PDT 2011

Hello,
I am seeing strange behavior with my OCFS2 setup.
I have two servers, one running Debian squeeze and the other Debian sid.
Both run a 2.6.32 kernel, and ocfs2-tools 1.6.3-2 is installed on both.
They are connected over iSCSI (open-iscsi) to an HP StorageWorks MSA 2012,
using one of its controllers.
The storage system has a problem with its power supply unit and sometimes
restarts (this began recently and I am working on fixing it). When it hangs
and restarts, my two OCFS2 nodes, which are connected to the storage system,
restart too.
I don't think this is right: I could lose data, because these are production
servers.

This is what I caught on the console when a node went into a restart:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.910077] general protection fault: 0000 [#1] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.910118] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911397] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911603] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911948] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48
98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b
42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.289837] general protection fault: 0000 [#2] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.289978] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.295417] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.296222] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.296828] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48
98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b
42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

I would just like some advice on what I can change in the OCFS2 configuration
to prevent system restarts caused by storage problems.

Thanks for any help.