[Ocfs2-users] Ooops in OCFS2

Nuno Fernandes npf-mlists at eurotux.com
Fri Jul 4 08:45:09 PDT 2008


Hi,

I have a cluster with 4 nodes, all of them running the same kernel:

Linux app19 2.6.9-48.ELxenU #1 SMP Sun Mar 4 19:50:03 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

and with

OCFS2 Node Manager 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build 9e5f332181e8ebfad464946bcc4888af)
OCFS2 DLM 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build e2556a71429f31033b275dff4b5594aa)
OCFS2 DLMFS 1.2.5 Tue Apr 10 12:29:33 EDT 2007 (build e2556a71429f31033b275dff4b5594aa)
OCFS2 User DLM kernel interface loaded

From one moment to the next, the ocfs2 filesystems froze:

/home/user
/usr

I rebooted one node (the one that had the highest load) and it kept 
rebooting over and over again with the following error:

(1768,0):dlm_convert_lock_handler:443 ERROR: Domain CACE9ABE4D474B04A3C06C944B7D616D not fully joined!
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dlmconvert:443
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: ocfs2(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) 
ocfs2_nodemanager(U) configfs(U) sunrpc dm_mod xennet ext3 jbd xenblk
Pid: 1768, comm: o2net Not tainted 2.6.9-48.ELxenU
RIP: e030:[<ffffffffa00dcb8b>] 
<ffffffffa00dcb8b>{:ocfs2_dlm:dlm_convert_lock_handler+376}
RSP: e02b:ffffff807d419d88  EFLAGS: 00010292
RAX: 000000000000006a RBX: ffffff807e6bdf00 RCX: 00000000000013ba
RDX: 00000000000013ba RSI: 0000000000000000 RDI: ffffffff8032b9a0
RBP: ffffff8009669400 R08: 00000000000927bf R09: ffffff807e6bdf00
R10: ffffffff801eb0a8 R11: 0000ffff80346560 R12: ffffff807ed48000
R13: ffffff807e6bdf00 R14: 0000000000000000 R15: ffffff807ed48018
FS:  0000002a95563da0(0000) GS:ffffffff8041d700(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process o2net (pid: 1768, threadinfo ffffff807d418000, task ffffff807e562030)
Stack: ffffffffff5fd000 0000000000000000 0000000000000000 ffffff8009786c00
       0000000000000000 ffffff807e6bdf00 ffffff8009669400 ffffff807ed48000
       ffffff807e6bdf00 0000000000000000
Call Trace:<ffffffffa009dac6>{:ocfs2_nodemanager:o2net_process_message+1567}
       <ffffffffa009dd03>{:ocfs2_nodemanager:o2net_rx_until_empty+0}
       <ffffffffa009e5b6>{:ocfs2_nodemanager:o2net_rx_until_empty+2227}
       <ffffffff8014092e>{worker_thread+419} 
<ffffffff8012b177>{default_wake_function+0}
       <ffffffff8012b1c8>{__wake_up_common+67} 
<ffffffff8012b177>{default_wake_function+0}
       <ffffffff80144bd4>{keventd_create_kthread+0} 
<ffffffff8014078b>{worker_thread+0}
       <ffffffff80144bd4>{keventd_create_kthread+0} 
<ffffffff80144bab>{kthread+200}
       <ffffffff8010e092>{child_rip+8} 
<ffffffff80144bd4>{keventd_create_kthread+0}
       <ffffffff80144ae3>{kthread+0} <ffffffff8010e08a>{child_rip+0}


Code: 0f 0b 12 06 0f a0 ff ff ff ff bb 01 41 80 7f 0f 20 76 5c 48
RIP <ffffffffa00dcb8b>{:ocfs2_dlm:dlm_convert_lock_handler+376} RSP 
<ffffff807d419d88>
 <0>Kernel panic - not syncing: Oops
 Connection to xen3 closed.


I had to shut down all 4 nodes and start them one by one. I even checked with 
fsck.ocfs2 and it didn't report any errors.

Any clues?
Thanks
Nuno Fernandes


