[Ocfs2-users] Random Crash - Diagnosing

Sunil Mushran Sunil.Mushran at oracle.com
Tue Sep 11 08:49:07 PDT 2007


Please log a bugzilla with this output along with all the version
numbers: kernel, ocfs2, and distro.
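
One way to gather the three version numbers requested above for the bug report is sketched below. It is only a sketch under assumptions: the helper names are made up for illustration, it assumes `modinfo` is on the PATH and that the ocfs2 module exposes a `version` field, and the list of release files is a guess at common locations. Anything it cannot find is reported as "unknown".

```python
import platform
import subprocess
from pathlib import Path


def kernel_version() -> str:
    """Running kernel release, equivalent to `uname -r`."""
    return platform.release() or "unknown"


def ocfs2_module_version() -> str:
    """Ask modinfo for the ocfs2 module's version field (assumed to exist)."""
    try:
        result = subprocess.run(
            ["modinfo", "-F", "version", "ocfs2"],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    version = result.stdout.strip()
    return version if result.returncode == 0 and version else "unknown"


def distro_description() -> str:
    """Best-effort distro name from common release files (hypothetical list)."""
    try:
        os_release = Path("/etc/os-release")
        if os_release.is_file():
            for line in os_release.read_text(errors="replace").splitlines():
                if line.startswith("PRETTY_NAME="):
                    return line.split("=", 1)[1].strip().strip('"') or "unknown"
        for name in ("/etc/redhat-release", "/etc/SuSE-release", "/etc/issue"):
            path = Path(name)
            if path.is_file():
                text = path.read_text(errors="replace").strip()
                if text:
                    return text.splitlines()[0]
    except OSError:
        pass
    return "unknown"


def collect_versions() -> dict:
    """Bundle the three values the bug report should carry."""
    return {
        "kernel": kernel_version(),
        "ocfs2": ocfs2_module_version(),
        "distro": distro_description(),
    }


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

The printed lines can be pasted into the bugzilla entry next to the trace.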

Matthew E. Porter wrote:
> Sunil:
>   We have seen some similar errors in bugzilla.  Specifically, what we 
> are seeing is:
>
> Sep 10 09:15:34 sulu kernel: BUG: soft lockup detected on CPU#2!
> Sep 10 09:15:34 sulu kernel:  [<c0447f3f>] softlockup_tick+0x98/0xa6
> Sep 10 09:15:34 sulu kernel:  [<c042d138>] update_process_times+0x39/0x5c
> Sep 10 09:15:34 sulu kernel:  [<c04176f0>] smp_apic_timer_interrupt+0x5c/0x64
> Sep 10 09:15:34 sulu kernel:  [<c04049bf>] apic_timer_interrupt+0x1f/0x24
> Sep 10 09:15:34 sulu kernel:  [<c041c774>] kmap_atomic+0xb5/0xbb
> Sep 10 09:15:34 sulu kernel:  [<c046cd92>] cont_prepare_write+0xd4/0x21d
> Sep 10 09:15:34 sulu kernel:  [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<f8cb06da>] ocfs2_get_block+0x0/0xaa5 [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<c044ecae>] generic_file_buffered_write+0x23f/0x5f1
> Sep 10 09:15:34 sulu kernel:  [<f8856ac0>] do_get_write_access+0x43a/0x467 [jbd]
> Sep 10 09:15:34 sulu kernel:  [<c0427f65>] current_fs_time+0x4a/0x55
> Sep 10 09:15:34 sulu kernel:  [<c044f506>] __generic_file_aio_write_nolock+0x4a6/0x52a
> Sep 10 09:15:34 sulu kernel:  [<f8cbebb2>] ocfs2_extend_file+0xf0d/0xf95 [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<c044f831>] generic_file_aio_write_nolock+0x39/0x83
> Sep 10 09:15:34 sulu kernel:  [<c044fbb4>] generic_file_write_nolock+0x86/0x9a
> Sep 10 09:15:34 sulu kernel:  [<f8ccd226>] ocfs2_write_lock_maybe_extend+0xd39/0xe03 [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<c04352dd>] autoremove_wake_function+0x0/0x2d
> Sep 10 09:15:34 sulu kernel:  [<f8cbf190>] ocfs2_file_write+0x189/0x22c [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<f8cbf007>] ocfs2_file_write+0x0/0x22c [ocfs2]
> Sep 10 09:15:34 sulu kernel:  [<c0469af3>] vfs_write+0xa1/0x143
> Sep 10 09:15:34 sulu kernel:  [<c046a0e5>] sys_write+0x3c/0x63
> Sep 10 09:15:34 sulu kernel:  [<c0403eff>] syscall_call+0x7/0xb
>
>   This happens on all nodes.  The CPU# and timestamp change, but the 
> problem persists.  The systems do not restart or panic.  The system 
> merely puts every process accessing the OCFS volume in a D state.
>
>   Would you still like me to log another bugzilla issue?  I am happy 
> to do so if you wish.
>
>
> Cheers,
>   Matthew
>
>
> ---
> Matthew E. Porter
> Contegix
> Beyond Managed Hosting(r) for Your Enterprise
>
>
>
> On Sep 7, 2007, at 12:49 PM, Sunil Mushran wrote:
>
>> A bugzilla with the oops stack trace will help.
>>
>> Matthew E. Porter wrote:
>>> Greetings, I am looking for a good way to diagnose random crashes 
>>> that are occurring with one of our OCFS clusters.  It is a simple 
>>> two-node cluster.  debugfs does not seem to indicate any issues.
>>>
>>> (Also, I would be happy to find a consultant/freelancer to work 
>>> through this.)
>>>
>>>
>>> Cheers,
>>>   Matthew
>>>
>>> ---
>>> Matthew E. Porter
>>> Contegix
>>> Beyond Managed Hosting(r) for Your Enterprise
>>>
>>
>
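
When attaching a soft lockup trace like the one quoted above to a bug report, it can help to summarize which modules appear in the call chain. The sketch below is one possible way to do that, not an established tool: the function names and the regular expression are illustrative assumptions, it expects raw syslog text without the mail quote prefixes, and it labels frames that carry no `[module]` tag as "kernel". The sample is a shortened excerpt of the trace from this thread.

```python
import re
from collections import Counter

# Matches frames such as "[<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]".
# Whitespace around "+" and "/" is tolerated because mail clients often
# re-wrap or pad these lines.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\s*"
    r"\+\s*0x(?P<offset>[0-9a-f]+)\s*/\s*0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a list of (symbol, module) pairs in the order they appear."""
    return [
        (m.group("symbol"), m.group("module") or "kernel")
        for m in FRAME_RE.finditer(log_text)
    ]


def module_counts(log_text):
    """Count how many frames belong to each module."""
    return Counter(module for _, module in parse_frames(log_text))


SAMPLE = """\
Sep 10 09:15:34 sulu kernel: BUG: soft lockup detected on CPU#2!
Sep 10 09:15:34 sulu kernel:  [<c041c774>] kmap_atomic+0xb5/0xbb
Sep 10 09:15:34 sulu kernel:  [<c046cd92>] cont_prepare_write+0xd4/0x21d
Sep 10 09:15:34 sulu kernel:  [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]
Sep 10 09:15:34 sulu kernel:  [<f8856ac0>] do_get_write_access+0x43a/0x467 [jbd]
Sep 10 09:15:34 sulu kernel:  [<f8cbf190>] ocfs2_file_write+0x189/0x22c [ocfs2]
"""

if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE):
        print(f"{module:8s} {symbol}")
    print(dict(module_counts(SAMPLE)))
```

The per-module counts only describe the trace; they do not by themselves say where the lockup originates.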



