[Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4 server (kernel:2.6.9-42.0.2.ELs)

Derek Hazell derek.hazell at gmail.com
Fri Aug 22 23:24:20 PDT 2008


Hi Ocfs2 users,
We captured some relevant log messages, both via a serial console and via a
PuTTY session logged in as root.
I suspect we need to set up a private network between the ocfs2 cluster
members. Is this right? Is there anything else we might need to do?

Regards, and I appreciate your help.

Derek
########################################################
CURRENT O2CB CONFIG
[root@sysname fs]# /etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [61]:
Specify network idle timeout in ms (>=5000) [60000]: 120000
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
O2CB cluster ocfs2 already online
[root@sysname fs]#
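
As a cross-check on the values above: my understanding (from the OCFS2 FAQ, so treat it as an assumption rather than something confirmed on this system) is that the disk heartbeat timeout is (threshold - 1) * 2 seconds. A threshold of 61 would then give 120000 ms, which matches the "after 120000 milliseconds" in the serial console trace and is separate from the network idle timeout. A minimal sketch of that arithmetic, with hypothetical function names:

```python
# Assumption: o2hb writes a heartbeat every 2 seconds, and a node is
# declared dead after (threshold - 1) missed intervals (per the OCFS2 FAQ).
HEARTBEAT_INTERVAL_MS = 2000


def threshold_to_timeout_ms(threshold: int) -> int:
    """Convert an O2CB heartbeat dead threshold to a timeout in ms."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def timeout_to_threshold(timeout_seconds: int) -> int:
    """Convert a desired disk heartbeat timeout (seconds) to a threshold."""
    return timeout_seconds // 2 + 1


if __name__ == "__main__":
    print(threshold_to_timeout_ms(61))   # threshold configured above
    print(timeout_to_threshold(120))     # threshold for a 120 s timeout
```

If that formula holds, raising the threshold only lengthens how long a stalled I/O is tolerated; it would not explain why a read took that long in the first place.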
##################
TRACE OF ROOT PUTTY LOGIN

[root@sysname ~]#
Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
sysname kernel: Heartbeat thread (11) printing last 24 blocking operations
(cur = 8):

Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
sysname kernel: Heartbeat thread stuck at waiting for read completion,
stuffing current time into that blocker (index 8)

Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
sysname kernel: Index 9: took 0 ms to do bio alloc read

.
.
.

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 3: took 5240 ms to do waiting for write completion

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 4: took 0 ms to do allocating bios for read

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 5: took 0 ms to do bio alloc read

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 6: took 0 ms to do bio add page read

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 7: took 0 ms to do submit_bio for read

Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
sysname kernel: Index 8: took 120303 ms to do waiting for read completion

#############
TRACE OF SERIAL CONSOLE:
(11,1):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device
emcpowerb1 after 120000 milliseconds
Heartbeat thread (11) printing last 24 blocking operations (cur = 8):
Heartbeat thread stuck at waiting for read completion, stuffing current time
into that blocker (index 8)
Index 9: took 0 ms to do bio alloc read
Index 10: took 0 ms to do bio add page read
Index 11: took 0 ms to do submit_bio for read
Index 12: took 3025 ms to do waiting for read completion
Index 13: took 0 ms to do bio alloc write
Index 14: took 0 ms to do bio add page write
Index 15: took 0 ms to do submit_bio for write
Index 16: took 0 ms to do checking slots
Index 17: took 7221 ms to do waiting for write completion
Index 18: took 0 ms to do allocating bios for read
Index 19: took 0 ms to do bio alloc read
Index 20: took 0 ms to do bio add page read
Index 21: took 0 ms to do submit_bio for read
Index 22: took 3892 ms to do waiting for read completion
Index 23: took 0 ms to do bio alloc write
Index 0: took 0 ms to do bio add page write
Index 1: took 0 ms to do submit_bio for write
Index 2: took 0 ms to do checking slots
Index 3: took 5240 ms to do waiting for write completion
Index 4: took 0 ms to do allocating bios for read
Index 5: took 0 ms to do bio alloc read
Index 6: took 0 ms to do bio add page read
Index 7: took 0 ms to do submit_bio for read
Index 8: took 120303 ms to do waiting for read completion
*** ocfs2 is very sorry to be fencing this system by restarting ***
Bootdata ok (command line is ro root=/dev/VolGroup_ID_12182/LogVol1
console=ttyS0,9600n8)
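
To make the trace above easier to read, here is a rough sketch of a script that pulls out the "Index N: took X ms to do Y" lines and reports the slowest operation. The function names and the embedded sample are mine, for illustration only; the line format is assumed to be exactly what the console printed above.

```python
import re

# Matches lines such as "Index 8: took 120303 ms to do waiting for read completion"
INDEX_RE = re.compile(r"Index (\d+): took (\d+) ms to do (.+)")


def parse_blocking_ops(text):
    """Return a list of (index, milliseconds, operation) tuples."""
    ops = []
    for line in text.splitlines():
        match = INDEX_RE.search(line)
        if match:
            ops.append((int(match.group(1)),
                        int(match.group(2)),
                        match.group(3).strip()))
    return ops


def slowest(ops):
    """Return the entry with the largest elapsed time."""
    return max(ops, key=lambda op: op[1])


# A few lines copied from the serial console trace.
SAMPLE = """\
Index 12: took 3025 ms to do waiting for read completion
Index 17: took 7221 ms to do waiting for write completion
Index 22: took 3892 ms to do waiting for read completion
Index 3: took 5240 ms to do waiting for write completion
Index 7: took 0 ms to do submit_bio for read
Index 8: took 120303 ms to do waiting for read completion
"""

if __name__ == "__main__":
    ops = parse_blocking_ops(SAMPLE)
    index, ms, operation = slowest(ops)
    print(f"slowest: index {index}, {ms} ms, {operation}")
```

On this trace the outlier is the read at index 8, which waited just over 120 seconds on emcpowerb1; the other read and write completions took between roughly 3 and 7 seconds.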


################################################################################
-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com [mailto:
ocfs2-users-bounces at oss.oracle.com] On Behalf Of Sunil Mushran
Sent: Tuesday, 19 August 2008 3:56 AM
To: _Derek Hazell (Internet)
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4
server (kernel:2.6.9-42.0.2.ELs)


Configure a netdump or netconsole server. It will catch the relevant
messages.

################################################################################