[Ocfs2-users] OCFS2 Fencing and Locking MSA500 Array: Help

Sunil Mushran Sunil.Mushran at oracle.com
Wed Oct 25 14:59:54 PDT 2006


Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000250
Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000250

That's where the problem begins. The cciss driver is unable to complete
the I/Os, possibly because of a bus reset. Ping HP, or whoever your
contact is for the MSA500.
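
To back up the ordering claim, here is a minimal Python sketch that pulls
the first "unsolicited abort" timestamp per host out of /var/log/messages
style lines. The function name first_aborts and the regular expression are
illustrative assumptions, not part of any OCFS2 or cciss tool, and the
pattern only covers the line format seen in the excerpts quoted in this
thread.

```python
import re

# Matches lines such as:
#   Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000250
# Only the syslog layout shown in this thread is assumed.
ABORT = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) (\S+) kernel: cciss0: unsolicited abort"
)

def first_aborts(lines):
    """Return {hostname: timestamp} for the first abort seen per host.

    "First" means first in input order; syslog timestamps carry no year,
    so they are kept as strings rather than parsed into dates.
    """
    first = {}
    for line in lines:
        # Tolerate mail-quoted input ("> Oct 11 ...").
        m = ABORT.match(line.lstrip("> "))
        if m and m.group(2) not in first:
            first[m.group(2)] = m.group(1)
    return first
```

Fed the two excerpts quoted below, this reports 05:15:28 for vhaispora01
and 05:16:59 for vhaispora02. The aborts on node 1 therefore start before
the o2net idle timeout logged on node 2 at 05:16:56.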

You may get more information if you set up a netconsole server to catch
the stack dumps.
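
netconsole sends kernel messages as plain UDP datagrams, so the receiving
side can be very simple. The following is a rough Python sketch of such a
receiver, meant to run on a third machine that stays up when the cluster
nodes fence. The function names, the log file name and the bind address
are assumptions made for illustration; port 6666 is the usual netconsole
target port, but use whatever the sending nodes are configured with.
Configuring the senders is a separate step covered by the kernel's
netconsole documentation.

```python
import socket

def receive_messages(sock, max_datagrams, bufsize=4096):
    """Collect up to max_datagrams UDP payloads as (sender_ip, text) pairs."""
    messages = []
    for _ in range(max_datagrams):
        data, (ip, _port) = sock.recvfrom(bufsize)
        # Kernel messages are mostly ASCII; never fail on odd bytes.
        messages.append((ip, data.decode("utf-8", errors="replace")))
    return messages

def serve(bind_addr="0.0.0.0", port=6666, logfile="netconsole.log"):
    """Append every received datagram to logfile, prefixed by sender IP.

    All three defaults are hypothetical; adjust them for your network.
    Runs until interrupted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    with open(logfile, "a") as log:
        while True:
            for ip, text in receive_messages(sock, 1):
                log.write(f"{ip} {text}")
                # Flush per message so nothing is lost if this host dies too.
                log.flush()

# To run the receiver, call serve() on the logging machine.
```

Prefixing each message with the sender's address keeps the output from
both nodes apart in a single file.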

Deaderick, David (EDS) wrote:
> I have a two-node Red Hat Enterprise Linux 4.0 cluster on HP ProLiant
> ML350 servers connected to an HP MSA500 with HP 532 SCSI adapters (cciss
> driver).
> The following list includes critical component versions:
> ocfs2console-1.2.1-1                          Mon 28 Aug 2006 05:39:20 PM EDT
> ocfs2-2.6.9-42.0.2.ELsmp-1.2.3-1              Mon 28 Aug 2006 05:39:19 PM EDT
> ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1          Mon 28 Aug 2006 05:39:18 PM EDT
> ocfs2-2.6.9-42.0.2.EL-1.2.3-1                 Mon 28 Aug 2006 05:39:17 PM EDT
> ocfs2-tools-1.2.1-1                           Mon 28 Aug 2006 05:39:15 PM EDT
> oracleasmlib-2.0.2-1                          Mon 28 Aug 2006 05:37:51 PM EDT
> oracleasm-2.6.9-42.0.2.ELhugemem-2.0.3-1      Mon 28 Aug 2006 05:37:49 PM EDT
> oracleasm-2.6.9-42.0.2.EL-2.0.3-1             Mon 28 Aug 2006 05:37:47 PM EDT
> oracleasm-2.6.9-42.0.2.ELsmp-2.0.3-1          Mon 28 Aug 2006 05:37:45 PM EDT
> oracleasm-support-2.0.3-1                     Mon 28 Aug 2006 05:37:44 PM EDT
> kernel-hugemem-2.6.9-42.0.2.EL                Mon 28 Aug 2006 05:25:32 PM EDT
> kernel-doc-2.6.9-42.0.2.EL                    Mon 28 Aug 2006 05:25:29 PM EDT
> kernel-hugemem-devel-2.6.9-42.0.2.EL          Mon 28 Aug 2006 05:25:07 PM EDT
> kernel-smp-devel-2.6.9-42.0.2.EL              Mon 28 Aug 2006 05:21:45 PM EDT
> kernel-smp-2.6.9-42.0.2.EL                    Mon 28 Aug 2006 05:20:51 PM EDT
> kernel-utils-2.4-13.1.83                      Mon 28 Aug 2006 05:20:48 PM EDT
> kernel-devel-2.6.9-42.0.2.EL                  Mon 28 Aug 2006 04:42:48 PM EDT
> kernel-2.6.9-42.0.2.EL                        Mon 28 Aug 2006 04:42:37 PM EDT
>
> Whenever the I/O system is under heavy load (e.g. full database backups
> using RMAN), the servers fence, reboot and cannot reconnect to the
> MSA500.
> We must power off the servers and the MSA500 and restart.
>
> Where can I start troubleshooting this?
>
> /var/log/messages: (Node 2)
>
> Oct 11 05:16:56 vhaispora02 kernel: o2net: connection to node vhaispora01 (num 0) at 192.168.1.1:7777 has been idle for 10 seconds, shutting it down.
> Oct 11 05:16:56 vhaispora02 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1160558206.560358 now 1160558216.558300 dr 1160558206.560323 adv 1160558206.560375:1160558206.560379 func (0d6da305:504) 1160552001.561116:1160552001.561125)
> Oct 11 05:16:56 vhaispora02 kernel: o2net: no longer connected to node vhaispora01 (num 0) at 192.168.1.1:7777
> Oct 11 05:16:59 vhaispora02 kernel: cciss0: unsolicited abort f7010e90
> Oct 11 05:16:59 vhaispora02 kernel: cciss0: retrying f7010e90
> .
> .
> .
> Oct 11 05:17:18 vhaispora02 kernel: cciss0: f7010550 retried too many times
> Oct 11 05:17:18 vhaispora02 kernel: cciss0: unsolicited abort f70107a0
> Oct 11 05:17:18 vhaispora02 kernel: cciss0: f70107a0 retried too many times
> Oct 11 05:17:18 vhaispora02 kernel: cciss0: unsolicited abort f70109f0
> Oct 11 10:35:57 vhaispora02 syslogd 1.4.1: restart.
> Oct 11 10:35:57 vhaispora02 syslog: syslogd startup succeeded
> Oct 11 10:35:57 vhaispora02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Oct 11 10:35:57 vhaispora02 kernel: Linux version 2.6.9-42.0.2.ELsmp (bhcompile at ls20-bc1-13.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 18:00:32 EDT 2006
>
> /var/log/messages (Node 1)
> Oct 11 05:10:01 vhaispora01 crond(pam_unix)[14577]: session closed for user root
> Oct 11 05:14:25 vhaispora01 ntpd[3243]: synchronized to 10.4.31.254, stratum 2
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000250
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000250
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70004a0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f70004a0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70006f0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f70006f0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000940
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000940
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000b90
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000b90
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000de0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000de0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7001030
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7001030
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7001280
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7001280
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70014d0
> Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f70014d0
> .
> .
> .
> Oct 11 05:16:46 vhaispora01 kernel: cciss0: unsolicited abort f7012ca0
> Oct 11 05:16:46 vhaispora01 kernel: cciss0: f7012ca0 retried too many times
> Oct 11 05:16:47 vhaispora01 kernel: cciss0: unsolicited abort f7012ef0
> Oct 11 05:16:47 vhaispora01 kernel: cciss0: f7012ef0 retried too many times
> Oct 11 10:35:50 vhaispora01 syslogd 1.4.1: restart.
> Oct 11 10:35:50 vhaispora01 syslog: syslogd startup succeeded
> Oct 11 10:35:50 vhaispora01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Oct 11 10:35:50 vhaispora01 kernel: Linux version 2.6.9-42.0.2.ELsmp (bhcompile at ls20-bc1-13.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 18:00:32 EDT 2006
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   


