[Ocfs2-users] kernel panic - not syncing

Andy Phillips andrew.phillips at betfair.com
Mon Jan 22 10:02:53 PST 2007


It's worth pointing out that the o2net idle timer is triggering on the 
network heartbeat, which is 10 seconds in the current 1.2.x series.


O2CB_HEARTBEAT_THRESHOLD has no effect on this, because it's another part
of the code that causes the problem.
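
To put numbers on the two timers, here is a rough sketch. It assumes the
disk heartbeat timeout follows the (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
seconds rule given in the OCFS2 1.2 documentation, and that a single stall
delays both the disk heartbeat and the o2net traffic (plausible with AoE
storage on the network, but not confirmed here). The function names are
made up for illustration.

```python
# Illustrative only: compares the two independent timeouts.
HEARTBEAT_INTERVAL_SECS = 2    # disk heartbeat period (matches the ~2000 ms msleep in the log)
O2NET_IDLE_TIMEOUT_SECS = 10   # network idle timeout in the 1.2.x series


def disk_heartbeat_timeout_secs(threshold):
    """Disk heartbeat timeout implied by O2CB_HEARTBEAT_THRESHOLD (assumed rule)."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def expired_timers(stall_secs, threshold):
    """Return the names of the timers a stall of stall_secs would exceed."""
    expired = []
    if stall_secs >= O2NET_IDLE_TIMEOUT_SECS:
        expired.append("o2net_idle")
    if stall_secs >= disk_heartbeat_timeout_secs(threshold):
        expired.append("disk_heartbeat")
    return expired


if __name__ == "__main__":
    print(disk_heartbeat_timeout_secs(31))   # threshold from the original report
    print(expired_timers(10.001, 31))        # the 10001 ms read stall
```

With a threshold of 31 the disk side allows 60 seconds, so under these
assumptions a stall of about 10 seconds only trips the network idle timer.
Raising the threshold further would not change that.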

See ocfs2-1.2.3/fs/ocfs2/cluster/tcp_internal.h:
#define O2NET_IDLE_TIMEOUT_SECS         10
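
If you want to check a longer log against that constant, something like
the following could pick out the heartbeat steps that took at least as
long as the idle timeout. This is only a sketch: the regular expression is
based on the "Index N: took X ms to do Y" lines quoted below, and
slow_steps is a hypothetical helper, not part of any OCFS2 tool.

```python
import re

O2NET_IDLE_TIMEOUT_MS = 10 * 1000  # O2NET_IDLE_TIMEOUT_SECS, in milliseconds

# Matches timing lines such as "Index 22: took 10001 ms to do waiting for read completion"
LINE_RE = re.compile(r"Index (\d+): took (\d+) ms to do (.+)")


def slow_steps(log_text, limit_ms=O2NET_IDLE_TIMEOUT_MS):
    """Return (index, milliseconds, step) for every step at or above limit_ms."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)  # search() tolerates "> > " quote prefixes
        if match and int(match.group(2)) >= limit_ms:
            hits.append((int(match.group(1)), int(match.group(2)), match.group(3).strip()))
    return hits


SAMPLE = """\
> > Index 17: took 1996 ms to do msleep
> > Index 21: took 0 ms to do submit_bio for read
> > Index 22: took 10001 ms to do waiting for read completion
"""

if __name__ == "__main__":
    for index, ms, step in slow_steps(SAMPLE):
        print(f"Index {index}: {ms} ms ({step})")
```

On the sample lines above, only Index 22 is reported.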

Andy


On Mon, 2007-01-22 at 09:29 -0800, Srinivas Eeda wrote:
> problem appears to be that IO is taking more time than effective O2CB_HEARTBEAT_THRESHOLD. Your configured value "31" doesn't seem to be effective?
> 
> Index 6: took 1995 ms to do msleep
> Index 17: took 1996 ms to do msleep
> Index 22: took 10001 ms to do waiting for read completion.
> 
> Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify. 
> 
> Thanks,
> --Srini.
> 
> 
> 
> 
> Consulente3 wrote:
> > Hi all, 
> >
> > my test environment consists of 2 servers running CentOS 4.4;
> > the nodes export storage with aoe6-43 + vblade-14
> >
> > kernel-2.6.9-42.0.3.EL
> > ocfs2-tools-1.2.2-1
> > ocfs2console-1.2.2-1
> > ocfs2-2.6.9-42.0.3.EL-1.2.3-1
> >
> > /dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
> > /dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)
> >
> > Device                FS     Nodes
> > /dev/etherd/e2.0      ocfs2  ocfs2, becks
> > /dev/etherd/e3.0      ocfs2  ocfs2, becks
> >
> > Device                FS     UUID                                  Label
> > /dev/etherd/e2.0      ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  test1
> > /dev/etherd/e3.0      ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4
> >
> > O2CB_HEARTBEAT_THRESHOLD=31
> >
> > When I try to run a stress test:
> >
> > Index 4: took 0 ms to do checking slots
> > Index 5: took 2 ms to do waiting for write completion
> > Index 6: took 1995 ms to do msleep
> > Index 7: took 0 ms to do allocating bios for read
> > Index 8: took 0 ms to do bio alloc read
> > Index 9: took 0 ms to do bio add page read
> > Index 10: took 0 ms to do submit_bio for read
> > Index 11: took 2 ms to do waiting for read completion
> > Index 12: took 0 ms to do bio alloc write
> > Index 13: took 0 ms to do bio add page write
> > Index 14: took 0 ms to do submit_bio for write
> > Index 15: took 0 ms to do checking slots
> > Index 16: took 1 ms to do waiting for write completion
> > Index 17: took 1996 ms to do msleep
> > Index 18: took 0 ms to do allocating bios for read
> > Index 19: took 0 ms to do bio alloc read
> > Index 20: took 0 ms to do bio add page read
> > Index 21: took 0 ms to do submit_bio for read
> > Index 22: took 10001 ms to do waiting for read completion
> > (3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active
> > regions.
> > Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
> > system by panicing
> >
> >
> > <6>o2net: connection to node ocfs2 (num 2) at 10.1.7.107:777 has been
> > idle for 10 seconds, shutting it down
> > (3,0): o2net_idle_timer:1309 here are some times that might help debug
> > the situation:
> > (tmr: 1169487957.71650 now 1169487967.69569 dr 1169487962.88883 adv
> > 1169487957.71671:1159487957.71674
> > func 83bce37b2:505) 1169487901.984644:1169487901.984676)
> >
> > The kernel panic always occurs on the same node, and the other node
> > keeps responding.
> >
> > thanks!
> >                                                                  
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >   
> 
-- 
Andy Phillips
Systems Architecture Manager, Betfair.com

Office: 0208 8348436

Betfair Ltd|Winslow Road|Hammersmith Embankment|London|W69HP 
Company No. 5140986 





