[Ocfs2-users] Fence abnormal and with not apparent reason
Gabriele Di Giambelardini
gabriele_d_g at yahoo.it
Fri Jul 11 02:11:45 PDT 2008
Hi all, looking at the log more carefully, at the moment a node goes down I see this information from the kernel about o2net:
Jul 10 16:52:02 be1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [o2net:6814]
Jul 10 16:52:02 be1 kernel: CPU 0:
Jul 10 16:52:02 be1 kernel: Modules linked in: ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm
parport shpchp ide_cd cdrom i2c_i801 i5000_edac i2c_core serio_raw edac_mc bnx2
Jul 10 16:52:02 be1 kernel: Pid: 6814, comm: o2net Tainted: G 2.6.18-92.el5
Jul 10 16:52:02 be1 kernel: RIP: 0010:[<ffffffff80064b57>] [<ffffffff80064b57>]
Jul 10 16:52:02 be1 kernel: RSP: 0018:ffff81043f281d28 EFLAGS: 00000246
Jul 10 16:52:02 be1 kernel: RAX: ffff810316b02828 RBX: ffff810440656018 RCX: 000
Jul 10 16:52:02 be1 kernel: RDX: 0000000000000001 RSI: 0000000000000286 RDI: fff
Jul 10 16:52:02 be1 kernel: RBP: ffff810367456c20 R08: ffff810316b02838 R09: fff
Jul 10 16:52:02 be1 kernel: R10: ffff810316b02858 R11: 000000000000fa55 R12: fff
Jul 10 16:52:02 be1 kernel: R13: 0000000000000044 R14: 000000000000001f R15: 000
Jul 10 16:52:02 be1 kernel: FS: 0000000000000000(0000) GS:ffffffff8039e000(0000
Jul 10 16:52:02 be1 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul 10 16:52:02 be1 kernel: CR2: 000000001c1b6ec8 CR3: 0000000449592000 CR4: 000
Jul 10 16:52:02 be1 kernel:
Jul 10 16:52:02 be1 kernel: Call Trace:
Jul 10 16:52:02 be1 kernel: [<ffffffff884e7b0b>] :ocfs2_dlm:dlm_assert_master_h
Jul 10 16:52:02 be1 kernel: [<ffffffff884ab15e>] :ocfs2_nodemanager:o2net_proce
Jul 10 16:52:02 be1 kernel: [<ffffffff884ace20>] :ocfs2_nodemanager:o2net_rx_un
Jul 10 16:52:02 be1 kernel: [<ffffffff884ac5d2>] :ocfs2_nodemanager:o2net_rx_un
Jul 10 16:52:02 be1 kernel: [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
Jul 10 16:52:02 be1 kernel: [<ffffffff800497be>] worker_thread+0x0/0x122
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff800498ae>] worker_thread+0xf0/0x122
Jul 10 16:52:02 be1 kernel: [<ffffffff8008ac03>] default_wake_function+0x0/0xe
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff8003253d>] kthread+0xfe/0x132
Jul 10 16:52:02 be1 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul 10 16:52:03 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:03 be1 kernel: [<ffffffff8002881b>] sync_page+0x0/0x42
Jul 10 16:52:03 be1 kernel: [<ffffffff8003243f>] kthread+0x0/0x132
Jul 10 16:52:03 be1 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
---------------------------------------------------------------------------------
Can somebody help me understand what this means?
Thanks
----- Original message -----
From: Gabriele Di Giambelardini <gabriele_d_g at yahoo.it>
To: V Srinivas <vaungasrinu at gmail.com>
Cc: ocfs2-users at oss.oracle.com
Sent: Monday, 30 June 2008, 15:56:35
Subject: Re: [Ocfs2-users] Fence abnormal and with not apparent reason
Hi, this is the output on all 5 servers:
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 60000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
thanks
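For readers comparing these settings with the log messages in this thread, here is a minimal sketch that converts the two timeout values from the status output above into seconds. It assumes the usual O2CB convention that the heartbeat dead threshold counts 2-second heartbeat iterations, so the disk heartbeat timeout is (threshold - 1) * 2 seconds; the function names are hypothetical and only for illustration. The network idle timeout is reported in milliseconds, and 60000 ms matches the "idle for 60.0 seconds" message in the original post.

```python
# Values copied from the "service o2cb status" output in this thread.
HEARTBEAT_DEAD_THRESHOLD = 61     # heartbeat iterations
NETWORK_IDLE_TIMEOUT_MS = 60000   # milliseconds

# Assumption: O2CB heartbeats every 2 seconds, so
# threshold = (timeout in seconds / 2) + 1.
HEARTBEAT_INTERVAL_S = 2


def disk_heartbeat_timeout_s(threshold, interval_s=HEARTBEAT_INTERVAL_S):
    """Seconds without a disk heartbeat before a node is declared dead."""
    return (threshold - 1) * interval_s


def network_idle_timeout_s(timeout_ms):
    """Seconds an o2net connection may stay idle before it is shut down."""
    return timeout_ms / 1000.0


print(disk_heartbeat_timeout_s(HEARTBEAT_DEAD_THRESHOLD))  # 120
print(network_idle_timeout_s(NETWORK_IDLE_TIMEOUT_MS))     # 60.0
```

Under these assumptions the network timeout (60 s) expires well before the disk heartbeat timeout (120 s), which is consistent with the o2net idle message being the first symptom seen.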
----- Original message -----
From: V Srinivas <vaungasrinu at gmail.com>
To: Gabriele Di Giambelardini <gabriele_d_g at yahoo.it>
Sent: Monday, 30 June 2008, 13:07:31
Subject: Re: [Ocfs2-users] Fence abnormal and with not apparent reason
Please send me the output of "service o2cb status" for those servers.
On 30/06/2008, Gabriele Di Giambelardini <gabriele_d_g at yahoo.it> wrote:
Hi all, I have a big and puzzling problem.
Let me explain the situation:
I have 5 Linux servers and 1 IBM SAN. Every server runs ocfs2, and I can see all of them in ocfs2-console. The servers are connected over a dedicated network.
The problem is that sometimes I get this message on one of the servers:
kernel: o2net: connection to node test.test.it (num 1) at 10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down.
The server is then fenced, but when it comes back up it cannot start ocfs2 or mount the partition. To resolve this I have to fence all the servers, after which everything works well again.
I have noticed that if I am not fast enough in fencing all the servers, other nodes also go into "shutting it down".
Can somebody help me? It's really important for me.
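To see which nodes hit the idle timeout and when, it can help to pull these messages out of the system log on each server. Below is a minimal sketch that parses the o2net message exactly as quoted in this post; the regular expression and the function name parse_idle_line are illustrative assumptions, and other OCFS2 versions may word the message differently.

```python
import re

# Pattern for the o2net idle-timeout message as quoted in this thread.
IDLE_RE = re.compile(
    r"o2net: connection to node (?P<name>\S+) \(num (?P<num>\d+)\) "
    r"at (?P<addr>[\d.]+):(?P<port>\d+) has been idle for "
    r"(?P<secs>[\d.]+) seconds"
)


def parse_idle_line(line):
    """Return the fields of an o2net idle message, or None if no match."""
    m = IDLE_RE.search(line)
    if m is None:
        return None
    return {
        "node": m.group("name"),
        "num": int(m.group("num")),
        "addr": m.group("addr"),
        "port": int(m.group("port")),
        "idle_seconds": float(m.group("secs")),
    }


sample = ("kernel: o2net: connection to node test.test.it (num 1) at "
          "10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down.")
print(parse_idle_line(sample))
```

Running this over the log of every node and sorting the matches by timestamp would show which connection went idle first, which is often the useful starting point when several nodes fence in sequence.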
My servers:
- Red Hat Enterprise Linux Server release 5
2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
- ocfs2-2.6.18-8.el5-1.2.8-2.el5
ocfs2-tools-1.2.7-1.el5
ocfs2console-1.2.7-1.el5
ocfs2-tools-debuginfo-1.2.6-1.el5
ocfs2-2.6.18-92.1.1.el5-1.2.9-1.el5
- OCFS2 1.2.8 Tue Jan 22 11:58:16 PST 2008 (build 9c7ae8bb50ef6d8791df2912775adcc5)
Thanks in advance for any suggestions.
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users