[Ocfs2-users] Fence abnormal and with not apparent reason

Gabriele Di Giambelardini gabriele_d_g at yahoo.it
Fri Jul 11 02:11:45 PDT 2008


Hi all, looking at the log more carefully, at the moment when a node goes down I see this information from the kernel about o2net:

Jul 10 16:52:02 be1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [o2net:6814] 
Jul 10 16:52:02 be1 kernel: CPU 0: 
Jul 10 16:52:02 be1 kernel: Modules linked in: ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm 
parport shpchp ide_cd cdrom i2c_i801 i5000_edac i2c_core serio_raw edac_mc bnx2 
Jul 10 16:52:02 be1 kernel: Pid: 6814, comm: o2net Tainted: G      2.6.18-92.el5 
Jul 10 16:52:02 be1 kernel: RIP: 0010:[<ffffffff80064b57>]  [<ffffffff80064b57>] 
Jul 10 16:52:02 be1 kernel: RSP: 0018:ffff81043f281d28  EFLAGS: 00000246 
Jul 10 16:52:02 be1 kernel: RAX: ffff810316b02828 RBX: ffff810440656018 RCX: 000 
Jul 10 16:52:02 be1 kernel: RDX: 0000000000000001 RSI: 0000000000000286 RDI: fff 
Jul 10 16:52:02 be1 kernel: RBP: ffff810367456c20 R08: ffff810316b02838 R09: fff 
Jul 10 16:52:02 be1 kernel: R10: ffff810316b02858 R11: 000000000000fa55 R12: fff 
Jul 10 16:52:02 be1 kernel: R13: 0000000000000044 R14: 000000000000001f R15: 000 
Jul 10 16:52:02 be1 kernel: FS:  0000000000000000(0000) GS:ffffffff8039e000(0000 
Jul 10 16:52:02 be1 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b 
Jul 10 16:52:02 be1 kernel: CR2: 000000001c1b6ec8 CR3: 0000000449592000 CR4: 000 
Jul 10 16:52:02 be1 kernel: 
Jul 10 16:52:02 be1 kernel: Call Trace: 
Jul 10 16:52:02 be1 kernel:  [<ffffffff884e7b0b>] :ocfs2_dlm:dlm_assert_master_h 
Jul 10 16:52:02 be1 kernel:  [<ffffffff884ab15e>] :ocfs2_nodemanager:o2net_proce 
Jul 10 16:52:02 be1 kernel:  [<ffffffff884ace20>] :ocfs2_nodemanager:o2net_rx_un 
Jul 10 16:52:02 be1 kernel:  [<ffffffff884ac5d2>] :ocfs2_nodemanager:o2net_rx_un 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8004cea9>] run_workqueue+0x94/0xe4 
Jul 10 16:52:02 be1 kernel:  [<ffffffff800497be>] worker_thread+0x0/0x122 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc 
Jul 10 16:52:02 be1 kernel:  [<ffffffff800498ae>] worker_thread+0xf0/0x122 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8008ac03>] default_wake_function+0x0/0xe 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8003253d>] kthread+0xfe/0x132 
Jul 10 16:52:02 be1 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11 
Jul 10 16:52:03 be1 kernel:  [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc 
Jul 10 16:52:03 be1 kernel:  [<ffffffff8002881b>] sync_page+0x0/0x42 
Jul 10 16:52:03 be1 kernel:  [<ffffffff8003243f>] kthread+0x0/0x132 
Jul 10 16:52:03 be1 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11 
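
For anyone reading a trace like this, the header line names the stuck task (o2net, pid 6814) and how long the CPU was stuck, and the frames prefixed with a module name show which OCFS2 code was running at that moment. The following is only a minimal Python sketch for pulling those pieces out of syslog lines. The helper name summarize and both regular expressions are illustrative assumptions, not part of any OCFS2 tool, and truncated symbol names are kept exactly as logged.

```python
import re

# Header line of a soft-lockup report, as it appears in the log above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# Call-trace frames that belong to a kernel module, e.g.
# "[<ffffffff884e7b0b>] :ocfs2_dlm:dlm_assert_master_h" (symbol may be truncated).
MODULE_FRAME = re.compile(r"\[<[0-9a-f]+>\] :(?P<module>\w+):(?P<symbol>\S+)")


def summarize(lines):
    """Return (lockup_info, module_frames) for an iterable of syslog lines."""
    lockup = None
    frames = []
    for line in lines:
        header = SOFT_LOCKUP.search(line)
        if header:
            lockup = {
                "cpu": int(header["cpu"]),
                "secs": int(header["secs"]),
                "comm": header["comm"],
                "pid": int(header["pid"]),
            }
            continue
        frame = MODULE_FRAME.search(line)
        if frame:
            frames.append((frame["module"], frame["symbol"]))
    return lockup, frames


sample = [
    "Jul 10 16:52:02 be1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [o2net:6814] ",
    "Jul 10 16:52:02 be1 kernel:  [<ffffffff884e7b0b>] :ocfs2_dlm:dlm_assert_master_h ",
    "Jul 10 16:52:02 be1 kernel:  [<ffffffff884ab15e>] :ocfs2_nodemanager:o2net_proce ",
    "Jul 10 16:52:02 be1 kernel:  [<ffffffff8004cea9>] run_workqueue+0x94/0xe4 ",
]
lockup, frames = summarize(sample)
print(lockup)
print(frames)
```

Frames without a module prefix (such as run_workqueue) are core kernel code and are skipped on purpose, so the result lists only the ocfs2_dlm and ocfs2_nodemanager entries.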

---------------------------------------------------------------------------------

Can somebody help me understand what this means?

Thanks



----- Original message -----
From: Gabriele Di Giambelardini <gabriele_d_g at yahoo.it>
To: V Srinivas <vaungasrinu at gmail.com>
Cc: ocfs2-users at oss.oracle.com
Sent: Monday 30 June 2008, 15:56:35
Subject: Re: [Ocfs2-users] Fence abnormal and with not apparent reason


Hi, this is the output on all 5 servers:

Module "configfs": Loaded 
Filesystem "configfs": Mounted 
Module "ocfs2_nodemanager": Loaded 
Module "ocfs2_dlm": Loaded 
Module "ocfs2_dlmfs": Loaded 
Filesystem "ocfs2_dlmfs": Mounted 
Checking O2CB cluster ocfs2: Online 
  Heartbeat dead threshold: 61 
  Network idle timeout: 60000 
  Network keepalive delay: 2000 
  Network reconnect delay: 2000 
Checking O2CB heartbeat: Active 
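
To make these values easier to compare with the log messages, here is a small Python sketch that converts them to seconds. It assumes the usual O2CB rule that the disk heartbeat timeout is (threshold - 1) * 2 seconds, with a heartbeat every 2 seconds, and that the three network values are in milliseconds. The function name timeouts_in_seconds is made up for illustration.

```python
import re

# Numeric lines copied from the status output above.
STATUS = """\
  Heartbeat dead threshold: 61
  Network idle timeout: 60000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
"""

# Assumption: O2CB writes a disk heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_S = 2


def timeouts_in_seconds(status_text):
    """Convert the numeric lines of the status output into seconds."""
    pairs = re.findall(
        r"^[ \t]*([A-Za-z ]+):[ \t]*(\d+)[ \t]*$", status_text, re.MULTILINE
    )
    values = {key.strip(): int(number) for key, number in pairs}
    return {
        # Assumed rule: timeout = (threshold - 1) * heartbeat interval.
        "disk_heartbeat_timeout_s": (values["Heartbeat dead threshold"] - 1)
        * HEARTBEAT_INTERVAL_S,
        # Network values are assumed to be in milliseconds.
        "network_idle_timeout_s": values["Network idle timeout"] / 1000,
        "network_keepalive_delay_s": values["Network keepalive delay"] / 1000,
        "network_reconnect_delay_s": values["Network reconnect delay"] / 1000,
    }


result = timeouts_in_seconds(STATUS)
for name, seconds in result.items():
    print(f"{name}: {seconds}")
```

Under these assumptions a threshold of 61 corresponds to 120 seconds of missed disk heartbeats, and the 60000 ms network idle timeout is the same 60 seconds reported in the "has been idle for 60.0 seconds" message quoted further down in this thread.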

thanks





----- Original message -----
From: V Srinivas <vaungasrinu at gmail.com>
To: Gabriele Di Giambelardini <gabriele_d_g at yahoo.it>
Sent: Monday 30 June 2008, 13:07:31
Subject: Re: [Ocfs2-users] Fence abnormal and with not apparent reason

Please send me the "service o2cb status" output for those servers.



On 30/06/2008, Gabriele Di Giambelardini <gabriele_d_g at yahoo.it> wrote:
Hi to all, I have a big and puzzling problem.
Let me explain the situation:
I have 5 Linux servers and 1 IBM SAN; every server runs ocfs2, and I can see them all in ocfs2-console. The servers are connected over a dedicated network.
The problem is that sometimes I get this message on one of the servers:

kernel: o2net: connection to node test.test.it (num 1) at 10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down.

So my server gets fenced, but when it comes back up it cannot start ocfs2 or mount the partition. To resolve this I have to fence all the servers, and then everything works well again.
I have noticed that if I am not fast enough in fencing all the servers, other nodes also go into "shutting it down".
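
When several nodes log this message in turn, it can help to tabulate which peer each node lost and when. Below is a minimal Python sketch that extracts the peer name, node number, address and idle time from such a line. The regular expression is written against the single message quoted above, and the function name parse_idle is hypothetical.

```python
import re

# Written against the one o2net message quoted above; other kernel
# versions may word it differently.
IDLE = re.compile(
    r"o2net: connection to node (?P<name>\S+) \(num (?P<num>\d+)\) "
    r"at (?P<ip>[\d.]+):(?P<port>\d+) has been idle for (?P<idle>[\d.]+) seconds"
)


def parse_idle(line):
    """Return details of an o2net idle-timeout message, or None if no match."""
    m = IDLE.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "num": int(m["num"]),
        "ip": m["ip"],
        "port": int(m["port"]),
        "idle_s": float(m["idle"]),
    }


line = (
    "kernel: o2net: connection to node test.test.it (num 1) at "
    "10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down."
)
event = parse_idle(line)
print(event)
```

Running this over the syslog of every node and sorting the results by timestamp would show whether the disconnects all point at the same peer or spread from node to node.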


Can somebody help me? It's really important for me.

My server:

- Red Hat Enterprise Linux Server release 5
2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

- ocfs2-2.6.18-8.el5-1.2.8-2.el5
  ocfs2-tools-1.2.7-1.el5
  ocfs2console-1.2.7-1.el5
  ocfs2-tools-debuginfo-1.2.6-1.el5
  ocfs2-2.6.18-92.1.1.el5-1.2.9-1.el5

- OCFS2 1.2.8 Tue Jan 22 11:58:16 PST 2008 (build 9c7ae8bb50ef6d8791df2912775adcc5)

Thanks in advance for any suggestions.





_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users



