[Ocfs2-users] kernel BUG at <bad filename>:58347!

Sunil Mushran Sunil.Mushran at oracle.com
Thu Oct 19 10:05:19 PDT 2006


BTW, have you looked into using ocfs2 1.2.3? The src tarball is on
oss.oracle.com, and it should build against mainline. The other option is
to upgrade the kernel to 2.6.18, which already has all the relevant
patches. A rough outline of the tarball build is below.
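
For the tarball route, the steps go roughly as follows. The exact
download URL, tarball name, and configure flag below are from memory and
may be wrong, so treat them as assumptions and check the project page on
oss.oracle.com first:

    # fetch and unpack the 1.2.3 source tarball (download path assumed)
    wget http://oss.oracle.com/projects/ocfs2/dist/files/source/v1.2/ocfs2-1.2.3.tar.gz
    tar xzf ocfs2-1.2.3.tar.gz
    cd ocfs2-1.2.3

    # point configure at the build tree of the kernel you are running
    # (the --with-kernel option is an assumption; ./configure --help
    # will show the flags the tarball actually supports)
    ./configure --with-kernel=/lib/modules/$(uname -r)/build
    make
    make install

    # reload the ocfs2 modules (or reboot) so the new dlm code is in use
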

Bleeding Edge wrote:
> I'm looking to move forward on this, so sorry if I seem a little
> anxious; I'm excited to get it going correctly.
>  
> Are there patches I can get somewhere to apply?  Or a path I can
> explore to begin getting this kernel up to speed with the correct
> ocfs2-dlm? Or are the correct ocfs2-dlm files perhaps already in the
> kernel source tree of a later kernel that I could upgrade to?
>  
> Thanks for the replies so far!
>
>  
> On 10/12/06, Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>
>     The ocfs2 shipping with that kernel is missing a few dlm patches.
>     I'll put together some patches. There is a bugzilla logged on this.
>
>     Bleeding Edge wrote:
>     >
>     > I've gotten the error below several times on different builds on
>     > different hardware:
>     >
>     > The setup is a bit different from the norm: it's a Xen-ified 2.6.16
>     > kernel running Debian Etch with block device backends for the ocfs2
>     > storage (yes, I know it's an adventurous setup). I'm using the ocfs2
>     > from the kernel, and the ocfs2-tools from Debian (1.2.1-1).
>     >
>     > 99% of the time it's great: it fences well and does its job. Only one
>     > node is actually being "used", but both are up and mounted. I have
>     > seen these errors when just leaving it overnight; after a while it
>     > bombs, and system load doesn't seem to be a factor. When I logged in
>     > this morning, on the node that had very little load, I found this on
>     > the console:
>     >
>     > (2016,0):dlm_proxy_ast_handler:321 ERROR: got ast for unknown lockres!
>     > cookie=144115188078155225, name=M0000000000000006676050a149f878, namelen=31
>     >
>     > When I attempted to shut down the 2nd node, it exploded with the
>     > following error and locked up both nodes. I'm looking for
>     > clarification, or even just a starting point:
>     >
>     > Thanks
>     >
>     >
>     >
>     >
>     > kernel BUG at <bad filename>:58347!
>     > invalid opcode: 0000 [#1]
>     > SMP
>     > Modules linked in: ocfs2 ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager
>     > configfs dm_snapshot dm_mirror dm_mod ext3 jbd
>     > CPU:    0
>     > EIP:    0061:[<d113cec5>]    Not tainted VLI
>     > EFLAGS: 00010202   (2.6.16-xen-domU #1)
>     > EIP is at __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
>     > eax: 00000028   ebx: cd9bfa80   ecx: f578d000   edx: 00000000
>     > esi: 00000002   edi: cb695200   ebp: cd9bfa80   esp: cb079d3c
>     > ds: 007b   es: 007b   ss: 0069
>     > Process umount (pid: 8046, threadinfo=cb078000 task=cf7ac030)
>     > Stack: <0>cd9bfa80 cd9bfa8c d1140c57 cd9bfa80 cc412ea0 0000001f 00000002 00000028
>     >        00000000 c03b8203 00000400 c011e200 cb695230 cd9bfa8c d114c6cc cd9bfac8
>     >        02000000 c4fe7000 c82d2e00 c4fe7000 00000000 cc412ea0 0000001f 00000000
>     > Call Trace:
>     >  [<d1140c57>] dlm_migrate_lockres+0x677/0x15f0 [ocfs2_dlm]
>     >  [<c011e200>] vprintk+0x290/0x330
>     >  [<c02f45d6>] schedule+0x536/0x860
>     >  [<d11347a5>] dlm_purge_lockres+0x75/0x230 [ocfs2_dlm]
>     >  [<d1131868>] dlm_unregister_domain+0x108/0x740 [ocfs2_dlm]
>     >  [<c02f49cf>] wait_for_completion+0xaf/0x110
>     >  [<c0116b70>] default_wake_function+0x0/0x20
>     >  [<d126ae8d>] ocfs2_remove_lockres_tracking+0xd/0x40 [ocfs2]
>     >  [<c0133552>] kthread_stop_sem+0x82/0xb0
>     >  [<d127038d>] ocfs2_dlm_shutdown+0xed/0x360 [ocfs2]
>     >  [<d129f295>] ocfs2_unregister_net_handlers+0x25/0xc0 [ocfs2]
>     >  [<d129a521>] ocfs2_dismount_volume+0x181/0x4c0 [ocfs2]
>     >  [<d129aa81>] ocfs2_put_super+0x31/0xe0 [ocfs2]
>     >  [<c016c3f2>] generic_shutdown_super+0x92/0x150
>     >  [<c016c4d9>] kill_block_super+0x29/0x50
>     >  [<c016c5ea>] deactivate_super+0x7a/0xa0
>     >  [<c018453b>] sys_umount+0x4b/0x2d0
>     >  [<c0105171>] syscall_call+0x7/0xb
>     > Code: 43 48 84 c0 7f 21 0f b7 43 5a a8 20 75 0b a8 20 75 19 f0 ff 43
>     > 44 59 5b c3 89 1c 24 e8 15 6a ff ff 0f b7 43 5a eb e7 0f 0b eb db <0f>
>     > 0b eb e3 8d b4 26 00 00 00 00 53 8b 5c 24 0c 8d 43 48 e8 73
>     >  Badness in do_exit at kernel/exit.c:802
>     >  [<c012134d>] do_exit+0x89d/0x8b0
>     >  [<c011007b>] prepare_for_smp+0x4b/0x160
>     >  [<c0105c9a>] die+0x23a/0x240
>     >  [<c0106590>] do_invalid_op+0x0/0xc0
>     >  [<c010663f>] do_invalid_op+0xaf/0xc0
>     >  [<d113cec5>] __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
>     >  [<d113d7a5>] dlm_init_mle+0x85/0x180 [ocfs2_dlm]
>     >  [<c0105303>] error_code+0x2b/0x30
>     >  [<d113cec5>] __dlm_lockres_reserve_ast+0x35/0x40 [ocfs2_dlm]
>     >  [<d1140c57>] dlm_migrate_lockres+0x677/0x15f0 [ocfs2_dlm]
>     >  [<c011e200>] vprintk+0x290/0x330
>     >  [<c02f45d6>] schedule+0x536/0x860
>     >  [<d11347a5>] dlm_purge_lockres+0x75/0x230 [ocfs2_dlm]
>     >  [<d1131868>] dlm_unregister_domain+0x108/0x740 [ocfs2_dlm]
>     >  [<c02f49cf>] wait_for_completion+0xaf/0x110
>     >  [<c0116b70>] default_wake_function+0x0/0x20
>     >  [<d126ae8d>] ocfs2_remove_lockres_tracking+0xd/0x40 [ocfs2]
>     >  [<c0133552>] kthread_stop_sem+0x82/0xb0
>     >  [<d127038d>] ocfs2_dlm_shutdown+0xed/0x360 [ocfs2]
>     >  [<d129f295>] ocfs2_unregister_net_handlers+0x25/0xc0 [ocfs2]
>     >  [<d129a521>] ocfs2_dismount_volume+0x181/0x4c0 [ocfs2]
>     >  [<d129aa81>] ocfs2_put_super+0x31/0xe0 [ocfs2]
>     >  [<c016c3f2>] generic_shutdown_super+0x92/0x150
>     >  [<c016c4d9>] kill_block_super+0x29/0x50
>     >  [<c016c5ea>] deactivate_super+0x7a/0xa0
>     >  [<c018453b>] sys_umount+0x4b/0x2d0
>     >  [<c0105171>] syscall_call+0x7/0xb
>     > (6766,0):o2net_idle_timer:1284 connection to node vserver1-3 (num 2)
>     > at 10.10.69.113:7777 has been idle for 10 seconds, shutting it down.
>     > (6766,0):o2net_idle_timer:1297 here are some times that might help
>     > debug the situation: (tmr 1160659302.852081 now 1160659312.846775 dr
>     > 1160659307.850562 adv 1160659302.852111:1160659302.852111 func
>     > (b9bad2f8:506) 1160659302.852082:1160659302.852088)
>     > BUG: soft lockup detected on CPU#0!
>     >
>     > Pid: 6766, comm:           dlm_thread
>     > EIP: 0061:[<c02f5e57>] CPU: 0
>     > EIP is at _spin_lock+0x7/0x10
>     >  EFLAGS: 00000286    Not tainted  (2.6.16-xen-domU #1)
>     > EAX: cd9bfac8 EBX: cd9bfac8 ECX: cb69520c EDX: cb695200
>     > ESI: cd9bfaa4 EDI: 00000000 EBP: cb695214 DS: 007b ES: 007b
>     > CR0: 8005003b CR2: b7eec83c CR3: 00e76000 CR4: 00000660
>     >  [<d1134e0c>] dlm_thread+0x26c/0x11f0 [ocfs2_dlm]
>     >  [<c0133720>] kthread+0xc0/0x110
>     >  [<c0133960>] autoremove_wake_function+0x0/0x60
>     >  [<c0133734>] kthread+0xd4/0x110
>     >  [<d1134ba0>] dlm_thread+0x0/0x11f0 [ocfs2_dlm]
>     >  [<c0133660>] kthread+0x0/0x110
>     >  [<c0102bd5>] kernel_thread_helper+0x5/0x10
>     >
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   


