[Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4 server (kernel:2.6.9-42.0.2.ELs)

Derek Hazell derek.hazell at gmail.com
Sun Aug 17 20:59:28 PDT 2008


Dear OCFS2 forum

We run ocfs2 version 1.2.9-1 in a cluster of four Linux servers
running RHEL 4 (kernel: 2.6.9-42.0.2.ELs).

We are getting unexpected reboots of one of the Linux servers and are
wondering whether the reboots are related to ocfs2.
We enabled ocfs2 tracing on the node we suspected would reboot:
      # debugfs.ocfs2 -l SUPER allow
      # debugfs.ocfs2 -l HEARTBEAT ENTRY EXIT allow
and then waited for the reboot to occur. A sample of log messages from around
the time of the reboot is included below. There are no strange ocfs2 messages
in the /var/log/messages log file, but I thought I would check with your
forum in case you see anything unusual.

Can you confirm that ocfs2 version 1.2.9-1 is compatible with the Linux
kernel 2.6.9-42.0.2.ELs? Also, if ocfs2 fences a node, can you confirm that
a message is written to the /var/log/messages logfile noting that the
fencing has occurred? Your responses may help us narrow down the cause.
Can you let us know if there are any particular logfiles we should check, or
if there is anything we can do to confirm that ocfs2 is, or is not, the
cause of these reboots?
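
In case it helps anyone looking at similar logs, here is a minimal Python
sketch of how the lines logged just before each restart could be pulled out
of a syslog file. It assumes that each boot is marked by a line containing
"syslogd 1.4.1: restart.", as in the excerpts in this post. The function name
lines_before_restarts and its context parameter are illustrative only and
are not part of any ocfs2 tool.

```python
from collections import deque

# Assumption: this is the marker syslogd 1.4.1 writes at startup on our servers.
RESTART_MARKER = "syslogd 1.4.1: restart."


def lines_before_restarts(lines, context=5):
    """Yield (restart_line, preceding_lines) for each restart marker found.

    preceding_lines holds at most `context` lines logged since the previous
    restart (or since the start of the input).
    """
    recent = deque(maxlen=context)
    for raw in lines:
        line = raw.rstrip("\n")
        if RESTART_MARKER in line:
            yield line, list(recent)
            recent.clear()  # start a fresh window for the next boot
        else:
            recent.append(line)


# Small demonstration on lines copied from Appendix 1.
sample = [
    "Aug 15 21:00:52 Sysname  kernel: (6885,0):dlm_mle_release:535 ENTRY:",
    "Aug 15 21:05:09 Sysname  syslogd 1.4.1: restart.",
    "Aug 15 21:05:09 Sysname  syslog: syslogd startup succeeded",
]
for restart, before in lines_before_restarts(sample, context=3):
    print("restart:", restart)
    for entry in before:
        print("  before:", entry)
```

On a live system the same function could be given an open file object for
/var/log/messages instead of the sample list.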

We appreciate any responses.

Regards,
Derek Hazell  |  System Administrator
#####################################################################
APPENDIX 1 : REBOOT on Friday night (ocfs2 tracing running)
Aug 15 21:00:52 Sysname  kernel: (6885,0):dlm_mle_release:535 ENTRY:
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M000000000000000c5b1914dc72d356
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M000000000000000c5b1914dc72d356
Aug 15 21:00:52 Sysname  kernel: (6885,0):dlm_mle_release:535 ENTRY:
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M0000000000000009f1bbc95e1dad74
Aug 15 21:00:52 Sysname  kernel: (6885,0):dlm_mle_release:535 ENTRY:
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M000000000000000c5bc95ddc72d357
Aug 15 21:00:52 Sysname  kernel: (6885,0):dlm_mle_release:535 ENTRY:
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M00000000000000049c73bf5e1d8e29
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres_full:148 ENTRY:M00000000000000049c73bf5e1d8e29
Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182 ENTRY:M00000000000000049c73bf5e1d8e29
[UNEXPECTED REBOOT]
Aug 15 21:05:09 Sysname  syslogd 1.4.1: restart.
Aug 15 21:05:09 Sysname  syslog: syslogd startup succeeded
Aug 15 21:05:09 Sysname  kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 15 21:05:09 Sysname  kernel: Bootdata ok (command line is ro root=/dev/VolGroup_ID_12182/LogVol1 rhgb quiet)
Aug 15 21:05:09 Sysname  kernel: Linux version 2.6.9-42.0.2.ELsmp (bhcompile@ls20-bc1-13.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 17:57:31 EDT 2006
Aug 15 21:05:09 Sysname  kernel: BIOS-provided physical RAM map:
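
The excerpt above goes quiet at 21:00:52 and resumes with the syslogd
restart at 21:05:09. As a rough sketch, the Python below measures the silent
interval before each restart in a log. It assumes the year 2008 (syslog
timestamps carry no year), English month abbreviations, and the same
"syslogd 1.4.1: restart." marker seen in our logs. The names parse_stamp and
silent_gaps are hypothetical helpers, not part of any existing tool.

```python
from datetime import datetime

# Assumption: this is the marker syslogd 1.4.1 writes at startup on our servers.
RESTART_MARKER = "syslogd 1.4.1: restart."


def parse_stamp(line, year=2008):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_gaps(lines, year=2008):
    """Return (last_stamp_before, restart_stamp, seconds) for each restart."""
    gaps = []
    previous = None
    for line in lines:
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # wrapped continuation lines and markers have no stamp
        if RESTART_MARKER in line and previous is not None:
            gaps.append((previous, stamp, (stamp - previous).total_seconds()))
        previous = stamp
    return gaps


# Demonstration on lines copied from Appendix 1.
sample = [
    "Aug 15 21:00:52 Sysname  kernel: (6885,0):__dlm_lookup_lockres:182",
    "ENTRY:M00000000000000049c73bf5e1d8e29",
    "[UNEXPECTED REBOOT]",
    "Aug 15 21:05:09 Sysname  syslogd 1.4.1: restart.",
]
for before, restart, seconds in silent_gaps(sample):
    print(f"last message {before}, restart {restart}, gap {seconds:.0f} s")
```

The gap only bounds when the node went down; it does not say why.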
######################################################################
APPENDIX 2 : REBOOT on Saturday night (ocfs2 tracing NOT running)
Aug 15 21:08:12 Sysname  kernel: o2net: connected to node Othersystem2.x.y (num 1) at 172.16.172.172:7777
Aug 15 21:08:13 Sysname  kernel: o2net: accepted connection from node Othersystem1.x.y (num 3) at 172.16.172.171:7777
Aug 15 21:08:16 Sysname  kernel: OCFS2 1.2.9 Mon May 19 13:00:33 PDT 2008 (build a693806cb619dd7f225004092b675ede)
Aug 15 21:08:16 Sysname  kernel: ocfs2_dlm: Nodes in domain ("46C5D4A751514E55B04786DFEC7B2175"): 1 2 3
Aug 15 21:08:17 Sysname  kernel: kjournald starting.  Commit interval 5 seconds
Aug 15 21:08:17 Sysname  kernel: ocfs2: Mounting device (120,1) on (node 2, slot 2)
Aug 15 21:08:21 Sysname  kernel: ocfs2_dlm: Nodes in domain ("0D29B3C9792B46E1BD0DFF0A97E03534"): 1 2 3
Aug 15 21:08:21 Sysname  kernel: kjournald starting.  Commit interval 5 seconds
Aug 15 21:08:21 Sysname  kernel: ocfs2: Mounting device (120,17) on (node 2, slot 2)
Aug 15 21:08:31 Sysname  ntpd[7076]: synchronized to 172.16.32.254, stratum 2
Aug 15 21:08:31 Sysname  ntpd[7076]: kernel time sync disabled 0041
Aug 15 21:08:38 Sysname  su(pam_unix)[9656]: session opened for user digicol by root(uid=0)
Aug 15 21:08:41 Sysname  su(pam_unix)[9656]: session closed for user digicol
Aug 15 21:13:52 Sysname  ntpd[7076]: kernel time sync enabled 0001
Aug 15 21:41:46 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:41:46 Sysname  kernel: end_request: I/O error, dev sdc, sector 1291272320
Aug 15 21:41:46 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:41:46 Sysname  kernel: end_request: I/O error, dev sdc, sector 1487646848
Aug 15 21:41:47 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:41:47 Sysname  kernel: end_request: I/O error, dev sdc, sector 1301852288
Aug 15 21:41:48 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:41:48 Sysname  kernel: end_request: I/O error, dev sdc, sector 1498484864
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 1611251840
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 1045610624
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 1234243712
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 989614208
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 1115283584
Aug 15 21:45:09 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:09 Sysname  kernel: end_request: I/O error, dev sdc, sector 1240952960
Aug 15 21:45:14 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:14 Sysname  kernel: end_request: I/O error, dev sdc, sector 995807360
Aug 15 21:45:14 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:14 Sysname  kernel: end_request: I/O error, dev sdc, sector 1104961664
Aug 15 21:45:14 Sysname  kernel: SCSI error : <1 0 2 1> return code = 0x20000
Aug 15 21:45:14 Sysname  kernel: end_request: I/O error, dev sdc, sector 1008507952
Aug 16 03:00:26 Sysname  Server Administrator: Storage Service EventID: 2242  The Patrol Read has started.:  Controller 0 (PERC 5/i Integrated)
Aug 16 03:00:27 Sysname  snmpd[7589]: Got trap from peer on fd 13
Aug 16 03:52:02 Sysname  Server Administrator: Storage Service EventID: 2243  The Patrol Read has stopped.:  Controller 0 (PERC 5/i Integrated)
Aug 16 03:52:02 Sysname  snmpd[7589]: Got trap from peer on fd 13
Aug 16 16:38:33 Sysname  sshd(pam_unix)[31901]: session opened for user root by root(uid=0)
Aug 16 16:55:55 Sysname  sshd(pam_unix)[32254]: session opened for user root by root(uid=0)
Aug 16 17:27:06 Sysname  sshd(pam_unix)[966]: session opened for user root by root(uid=0)
[UNEXPECTED REBOOT]
Aug 16 23:18:31 Sysname  syslogd 1.4.1: restart.
Aug 16 23:18:31 Sysname  syslog: syslogd startup succeeded
Aug 16 23:18:31 Sysname  kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 16 23:18:31 Sysname  kernel: Bootdata ok (command line is ro root=/dev/VolGroup_ID_12182/LogVol1 rhgb quiet)
Aug 16 23:18:31 Sysname  kernel: Linux version 2.6.9-42.0.2.ELsmp (bhcompile@ls20-bc1-13.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 17:57:31 EDT 2006
Aug 16 23:18:31 Sysname  kernel: BIOS-provided physical RAM map:
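
Appendix 2 contains a run of "end_request: I/O error" lines for dev sdc. As
a sketch only, the Python below counts such lines per device. The regular
expression is written against the message format shown in this excerpt and
tolerates the sector number landing on a wrapped line. The name
io_errors_by_device is a hypothetical helper. A count like this only shows
how often the errors were logged; it does not establish whether they are
related to the reboots.

```python
import re
from collections import Counter

# Pattern based on the kernel messages quoted in Appendix 2.
IO_ERROR = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector\s+(\d+)")


def io_errors_by_device(text):
    """Count 'end_request: I/O error' entries per block device in log text."""
    return Counter(match.group(1) for match in IO_ERROR.finditer(text))


# Demonstration on two entries copied from Appendix 2 (one left wrapped).
sample = (
    "Aug 15 21:41:46 Sysname  kernel: SCSI error : <1 0 2 1> return code =\n"
    "0x20000\n"
    "Aug 15 21:41:46 Sysname  kernel: end_request: I/O error, dev sdc, sector\n"
    "1291272320\n"
    "Aug 15 21:41:46 Sysname  kernel: end_request: I/O error, dev sdc, "
    "sector 1487646848\n"
)
for device, count in io_errors_by_device(sample).items():
    print(device, count)
```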
#####################################################################