[Ocfs2-users] Kernel crash from OCFS dlmmaster.c

Nick nick at agentpoint.com
Wed Apr 3 20:28:24 PDT 2013


Hi

I just experienced a bizarre crash with OCFS2 which appears to be a 
kernel bug.
I am using a primary/primary (dual-primary) setup with DRBD and OCFS2 
on Ubuntu (kernel 3.5.0-17-generic).

There was a momentary network blip which caused DRBD to disconnect.
It tried to reconnect but ended up in split brain, and the node which 
later crashed went into StandAlone mode.
56 seconds after DRBD had disconnected, syslog showed this:

Apr  4 13:53:14 sau-efd65-or kernel: [3114321.109782] o2net: No longer 
connected to node weba (num 0) at 27.50.xx.xx:7777
Apr  4 13:53:14 sau-efd65-or kernel: [3114321.109818] o2cb: o2dlm has 
evicted node 0 from domain ECD4A41D57364B27A3C3244BBB83FA33
Apr  4 13:53:14 sau-efd65-or kernel: [3114321.109868] 
(o2hb-ECD4A41D57,4163,5):__dlm_put_mle:239 ERROR: bad mle: ffff8803c130a300
Apr  4 13:53:14 sau-efd65-or kernel: [3114321.125272] ------------[ cut 
here ]------------
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.133095] kernel BUG at 
/build/buildd/linux-3.5.0/fs/ocfs2/dlm/dlmmaster.c:241!
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.148253] invalid opcode: 
0000 [#1] SMP
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.156133] CPU 5
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.156255] Modules linked in: 
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 ocfs2_nodemanager configfs 
ocfs2_stackglue quota_tree nfnetlink_log nfnetlink drbd lru_cache btrfs 
zlib_deflate libcrc$
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.214562]
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.222865] Pid: 4163, comm: 
o2hb-ECD4A41D57 Not tainted 3.5.0-17-generic #28-Ubuntu Supermicro 
X9SCI/X9SCA/X9SCI/X9SCA
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.241534] RIP: 
0010:[<ffffffffa0640409>]  [<ffffffffa0640409>] __dlm_put_mle+0x89/0xe0 
[ocfs2_dlm]
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.260875] RSP: 
0018:ffff8803c3c0b9a0  EFLAGS: 00010297
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.270500] RAX: 
000000000000009a RBX: ffff8803c130a300 RCX: ffff88042fd56e18
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.289984] RDX: 
ffffea000dbf8560 RSI: 0000000000000003 RDI: 0000000000000297
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.310641] RBP: 
ffff8803c3c0b9c0 R08: 0000000000000297 R09: ffff88042fbf97a8
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.332065] R10: 
0000000000000024 R11: 0000000000000000 R12: ffff8803c998a400
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.354254] R13: 
0000000000000bae R14: 00000000000000ff R15: 0000000000000000
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.376528] FS: 
0000000000000000(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.398704] CS:  0010 DS: 0000 
ES: 0000 CR0: 000000008005003b
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.409991] CR2: 
000000000061bd84 CR3: 0000000001c0b000 CR4: 00000000001407e0
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.433530] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.456424] DR3: 00Apr  4 
22:01:03 localhost kernel: imklog 5.8.6, log source = /proc/kmsg started.

That clearly isn't expected behaviour.
Should I submit full logs as a bug report?
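
For anyone triaging a trace like the one above: the mail client has 
wrapped the syslog records, so they need rejoining before the key fields 
can be pulled out. Below is a minimal Python sketch of that, using two 
records copied from the trace. The helper names (unwrap, summarize) and 
the regular expressions are my own assumptions for illustration; they are 
not part of any OCFS2 or kernel tooling, and they only cover the line 
shapes seen in this paste.

```python
import re

# Two records copied from the trace above, still wrapped as in the mail.
WRAPPED = """\
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.133095] kernel BUG at
/build/buildd/linux-3.5.0/fs/ocfs2/dlm/dlmmaster.c:241!
Apr  4 13:53:15 sau-efd65-or kernel: [3114321.241534] RIP:
0010:[<ffffffffa0640409>]  [<ffffffffa0640409>] __dlm_put_mle+0x89/0xe0
[ocfs2_dlm]
"""

# Assumption: a new record starts with "Mon DD HH:MM:SS host kernel: ";
# any other line is treated as a continuation of the previous record.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \S+ kernel: ")

def unwrap(text):
    """Rejoin mail-wrapped syslog lines into one string per record."""
    records = []
    for line in text.splitlines():
        if RECORD_START.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] += " " + line.strip()
    return records

BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP = re.compile(r"RIP: .*?\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]")

def summarize(records):
    """Collect the BUG source location and the faulting symbol, if present."""
    summary = {}
    for rec in records:
        m = BUG_AT.search(rec)
        if m:
            summary["bug_file"] = m.group(1)
            summary["bug_line"] = int(m.group(2))
        m = RIP.search(rec)
        if m:
            summary["function"] = m.group(1)
            summary["offset"] = int(m.group(2), 16)
            summary["module"] = m.group(4)
    return summary

if __name__ == "__main__":
    for key, value in sorted(summarize(unwrap(WRAPPED)).items()):
        print(key, "=", value)
```

The source file, line number, function and module are usually the first 
things asked for in a bug report, so having them on one line each makes 
it easier to search for an existing report before filing a new one.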

Thanks
-- 
Nick Stallman
Agentpoint Pty Ltd
The Real Estate Web Developers
Sydney, Australia
nick at agentpoint.com
www.agentpoint.com.au | www.zooproperty.com | www.ginga.com.au | 
www.business2.com.au