[Ocfs2-users] o2net: connect to node has been idle for 10 secs

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Mon Aug 7 10:55:02 PDT 2006


Ooo, this is a well-known message (for OCFSv2).

Moreover, THIS particular timeout cannot be changed.

In my case, after spending a few days on it, I found that my HugeTLB setting
(in Oracle) caused a long kernel loop, and that forced OCFSv2 to reboot the
node because it lost the connection.
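
The timestamps in the o2net_idle_timer line (quoted below) can be used to
check how long the link was really silent. Here is a minimal sketch in
Python, under stated assumptions: each stamp is taken to be seconds, a dot,
then an UNPADDED microsecond count (so ".14757" means 14757 usec), and the
names parse_idle_line and to_usec are made up for illustration, they are not
part of any OCFS2 tool. The message does not say what tmr, dr, adv and func
mean exactly, so the sketch only subtracts them.

```python
import re

# One o2net_idle_timer line from the quoted log, rejoined onto a single line.
LINE = (
    "(0,0):o2net_idle_timer:1309 here are some times that might help "
    "debug the situation: (tmr 1154932284.14757 now 1154932294.13147 dr "
    "1154932284.14717 adv 1154932284.14767:1154932284.14768 func (06aac8a1:1) "
    "1154932279.15062:1154932279.15068)"
)

STAMP = r"(\d+)\.(\d+)"
PATTERN = re.compile(
    rf"tmr {STAMP} now {STAMP} dr {STAMP} "
    rf"adv {STAMP}:{STAMP} func \((\w+):(\d+)\) {STAMP}:{STAMP}"
)


def to_usec(sec, usec):
    # Assumption: the digits after the dot are a microsecond count that was
    # printed without zero padding, not a decimal fraction.
    return int(sec) * 1_000_000 + int(usec)


def parse_idle_line(line):
    """Return the timestamps of one idle-timer line, in integer microseconds."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not an o2net_idle_timer line")
    g = m.groups()
    return {
        "tmr": to_usec(g[0], g[1]),
        "now": to_usec(g[2], g[3]),
        "dr": to_usec(g[4], g[5]),
        "adv_start": to_usec(g[6], g[7]),
        "adv_stop": to_usec(g[8], g[9]),
        "func_key": g[10],
        "func_type": int(g[11]),
        "func_start": to_usec(g[12], g[13]),
        "func_stop": to_usec(g[14], g[15]),
    }


times = parse_idle_line(LINE)
for label in ("tmr", "dr", "adv_stop", "func_stop"):
    gap = (times["now"] - times[label]) / 1_000_000
    print(f"now - {label}: {gap:.6f} s")
```

Under that reading, now - tmr comes out just under 10 seconds, which agrees
with the "idle for 10 seconds" text, and the func stamps are about 5 seconds
older than the rest.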

PS. I dream of the day when I will see a SET of heartbeat interfaces in
OCFSv2. It is THE ONLY system which does not support it (of all the clustered
systems I have around). Bonding.. hmm, bonding is for other purposes, and has
a 20 - 30 second reconvergence time by design.
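
For reference, the fencing message in the quoted log ("half-quorum of 1 out
of 2 nodes which doesn't include the lowest active node 0") looks consistent
with a tie-break rule like the one sketched here in Python. This is my
reading of the message, not a copy of the o2quo code; the function name
should_fence and its arguments are hypothetical.

```python
def should_fence(heartbeating, connected):
    """Decide whether this node should fence itself.

    heartbeating: set of node numbers seen on the disk heartbeat
                  (including this node).
    connected:    set of node numbers this node can still reach over the
                  network (including itself).
    Assumed rule: a majority always survives; an exact half survives only
    if it contains the lowest-numbered heartbeating node.
    """
    total = len(heartbeating)
    reachable = len(connected & heartbeating)

    if total % 2 == 1:
        # Odd node count: a strict majority is required.
        return reachable < total // 2 + 1

    half = total // 2
    if reachable < half:
        return True
    if reachable == half:
        # Tie: only the half holding the lowest node number stays up.
        return min(heartbeating) not in connected
    return False


# Two-node cluster from the log: node 0 (barney) and node 1 (assumed fred).
print(should_fence({0, 1}, {1}))  # the node that lost sight of node 0
print(should_fence({0, 1}, {0}))  # node 0 itself
```

With two nodes and a dead interconnect, this rule fences the
higher-numbered node and leaves node 0 running, which is what the log shows
for fred.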

----- Original Message ----- 
From: "Andy Phillips" <Andrew.Phillips at betfair.com>
To: "Sunil Mushran" <Sunil.Mushran at oracle.com>
Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>
Sent: Monday, August 07, 2006 2:20 AM
Subject: Re: [Ocfs2-users] o2net: connect to node has been idle for 10 secs


> Hello,
>
>      Well we had the same problem again;
>
> o2net: connection to node barney (num 0) at 172.16.6.10:7777
> has been idle for 10 seconds, shutting it down.
>
> kernel: (0,0):o2net_idle_timer:1309 here are some times that might help
> debug the situation: (tmr 1154932284.14757 now 1154932294.13147 dr
> 1154932284.14717 adv 1154932284.14767:1154932284.14768 func (06aac8a1:1)
> 1154932279.15062:1154932279.15068)
>
>     We upgraded to 1.2.3. And it almost immediately died again with the
> same error. Our cron job that touches a file every 3 seconds did not
> seem to make much difference. This is now quite a serious problem for
> us.
>
>     Any suggestions as to how to take this forward?
>
>     Sunil, what do you need from us to roll a custom debugging build?
> Can we run the custom build on node 2 and leave the existing build on
> node 1, which is now production?
>
>     Andy
>
>
> > > >> Aug  2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at
> > > >> 172.16.6.10:7777 has been idle for 10 seconds, shutting it down.
> > > >> Aug  2 19:06:27 fred kernel: (0,7):o2net_idle_timer:1309 here are some
> > > >> times that might help debug the situation: (tmr 1154545576.798263 now
> > > >> 1154545586.796978 dr 1154545576.798238 adv
> > > >> 1154545576.798291:1154545576.798293 func (06aac8a1:1)
> > > >> 1154545566.800782:1154545566.800787)
> > > >> Aug  2 19:06:27 fred kernel: o2net: no longer connected to node barney
> > > >> (num 0) at 172.16.6.10:7777
> > > >> Aug  2 19:08:33 fred kernel: (25,7):o2quo_make_decision:143 ERROR:
> > > >> fencing this node because it is connected to
> > > >> a half-quorum of 1 out of 2 nodes which doesn't include the lowest
> > > >> active node 0
> > > >> Aug  2 19:08:33 fred kernel: (25,7):o2hb_stop_all_regions:1908 ERROR:
> > > >> stopping heartbeat on all active regions.
> >
________________________________________________________________________
> -- 
> Andy Phillips, FRAS
> Systems Architect, Information Systems.
>
> Direct Line: 0208 834 8436
>



