[Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4 server (kernel:2.6.9-42.0.2.ELs)

Sunil Mushran sunil.mushran at oracle.com
Sat Aug 23 09:41:03 PDT 2008


Which I/O scheduler are you using? On EL4, deadline works best for ocfs2,
but cfq is the default. Check the FAQ for details on switching to deadline.
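
A minimal sketch for checking which scheduler a device is using, assuming
your kernel exposes /sys/block/<device>/queue/scheduler and marks the active
scheduler with square brackets. Older kernels, or pseudo devices such as
PowerPath's, may not provide that file, in which case the scheduler is
whatever was chosen at boot (for example with elevator=deadline on the
kernel command line). The function names are made up for illustration.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def scheduler_for(device, sysfs_root="/sys/block"):
    """Read the scheduler file for a block device (path is an assumption)."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Example of the string format this expects:
#   active_scheduler("noop anticipatory deadline [cfq]")
```

Call scheduler_for() with the name of the underlying disk that backs the
heartbeat device to see what it reports.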

Derek Hazell wrote:
>
> Hi Ocfs2 users,
> We got some relevant log messages, both via a serial console and via a 
> PuTTY session logged on as root.
> I suspect we need to set up a private network between the ocfs2 
> cluster members, is this right? Anything else we might need to do?
>  
> regards, I appreciate your help
>
> Derek
> ########################################################
> CURRENT O2CB CONFIG
>  [root at sysname fs]# /etc/init.d/o2cb configure
> Configuring the O2CB driver.
> This will configure the on-boot properties of the O2CB driver.
> The following questions will determine whether the driver is loaded on
> boot.  The current values will be shown in brackets ('[]').  Hitting
> <ENTER> without typing an answer will keep that current value.  Ctrl-C
> will abort.
> Load O2CB driver on boot (y/n) [y]:
> Cluster to start on boot (Enter "none" to clear) [ocfs2]:
> Specify heartbeat dead threshold (>=7) [61]:
> Specify network idle timeout in ms (>=5000) [60000]: 120000
> Specify network keepalive delay in ms (>=1000) [2000]:
> Specify network reconnect delay in ms (>=2000) [2000]:
> Writing O2CB configuration: OK
> O2CB cluster ocfs2 already online
> [root at sysname fs]#
> ##################
> TRACE OF ROOT PUTTY LOGIN
>
> [root at sysname ~]#
> Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
> sysname kernel: Heartbeat thread (11) printing last 24 blocking operations (cur = 8):
>
> Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
> sysname kernel: Heartbeat thread stuck at waiting for read completion, stuffing current time into that blocker (index 8)
>
> Message from syslogd at sysname at Fri Aug 22 23:12:03 2008 ...
> sysname kernel: Index 9: took 0 ms to do bio alloc read
>
> .
> .
> .
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 3: took 5240 ms to do waiting for write completion
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 4: took 0 ms to do allocating bios for read
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 5: took 0 ms to do bio alloc read
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 6: took 0 ms to do bio add page read
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 7: took 0 ms to do submit_bio for read
>
> Message from syslogd at sysname at Fri Aug 22 23:12:04 2008 ...
> sysname kernel: Index 8: took 120303 ms to do waiting for read completion
>  
> #############
> TRACE OF SERIAL CONSOLE:
> (11,1):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device emcpowerb1 after 120000 milliseconds
> Heartbeat thread (11) printing last 24 blocking operations (cur = 8):
> Heartbeat thread stuck at waiting for read completion, stuffing current time into that blocker (index 8)
> Index 9: took 0 ms to do bio alloc read
> Index 10: took 0 ms to do bio add page read
> Index 11: took 0 ms to do submit_bio for read
> Index 12: took 3025 ms to do waiting for read completion
> Index 13: took 0 ms to do bio alloc write
> Index 14: took 0 ms to do bio add page write
> Index 15: took 0 ms to do submit_bio for write
> Index 16: took 0 ms to do checking slots
> Index 17: took 7221 ms to do waiting for write completion
> Index 18: took 0 ms to do allocating bios for read
> Index 19: took 0 ms to do bio alloc read
> Index 20: took 0 ms to do bio add page read
> Index 21: took 0 ms to do submit_bio for read
> Index 22: took 3892 ms to do waiting for read completion
> Index 23: took 0 ms to do bio alloc write
> Index 0: took 0 ms to do bio add page write
> Index 1: took 0 ms to do submit_bio for write
> Index 2: took 0 ms to do checking slots
> Index 3: took 5240 ms to do waiting for write completion
> Index 4: took 0 ms to do allocating bios for read
> Index 5: took 0 ms to do bio alloc read
> Index 6: took 0 ms to do bio add page read
> Index 7: took 0 ms to do submit_bio for read
> Index 8: took 120303 ms to do waiting for read completion
> *** ocfs2 is very sorry to be fencing this system by restarting ***
> Bootdata ok (command line is ro root=/dev/VolGroup_ID_12182/LogVol1 
> console=ttyS0,9600n8)
>  
>  
> ################################################################################
> -----Original Message-----
> From: ocfs2-users-bounces at oss.oracle.com 
> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Sunil Mushran
> Sent: Tuesday, 19 August 2008 3:56 AM
> To: _Derek Hazell (Internet)
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 
> 4 server (kernel:2.6.9-42.0.2.ELs)
>  
>
> Configure a netdump or netconsole server. It will catch the relevant
> messages.
>
> ################################################################################
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
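
To make the traces above easier to read, here is a rough sketch that pulls
the "Index N: took M ms to do ..." lines out of a saved log and lists the
slow operations, slowest first. It also works out the disk heartbeat timeout
from the dead threshold using the (threshold - 1) * 2 seconds rule given in
the ocfs2 FAQ. The function names and the 1000 ms cutoff are arbitrary
choices for illustration, not anything defined by ocfs2.

```python
import re

# Matches lines such as "Index 8: took 120303 ms to do waiting for read completion",
# with or without a leading "> " or "sysname kernel: " prefix.
INDEX_LINE = re.compile(r"Index (\d+): took (\d+) ms to do ([^\r\n]+)")


def slow_operations(log_text, min_ms=1000):
    """Return unique (index, ms, operation) tuples at or above min_ms, slowest first."""
    seen = set()
    for match in INDEX_LINE.finditer(log_text):
        entry = (int(match.group(1)), int(match.group(2)), match.group(3).strip())
        if entry[1] >= min_ms:
            seen.add(entry)  # the same entry can appear in more than one trace
    return sorted(seen, key=lambda e: (-e[1], e[0]))


def heartbeat_timeout_ms(dead_threshold):
    """Disk heartbeat timeout in ms, assuming one heartbeat every 2 seconds."""
    return (dead_threshold - 1) * 2000
```

With the dead threshold of 61 shown in the o2cb output, heartbeat_timeout_ms
gives 120000, the same figure as in the "Heartbeat write timeout" message,
and the slowest operation in the trace is the 120303 ms wait for read
completion. That suggests the fence was triggered by stalled disk I/O to the
heartbeat device rather than by the network timeouts.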



