[Ocfs2-users] Random Crash - Diagnosing

Tue Sep 11 08:59:52 PDT 2007

Submitted as bug 918.  Thank you for your assistance.

(Posting information here in case anyone else has seen the issue.)

Cheers,
   Matthew

---
Matthew E. Porter
Contegix
Beyond Managed Hosting(r) for Your Enterprise

On Sep 11, 2007, at 10:49 AM, Sunil Mushran wrote:

> Please log a bugzilla with this output along with all the version
> numbers: kernel, ocfs2, and distro.
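The three version strings requested above can be gathered in one pass. The following Python sketch is a hypothetical helper, not part of ocfs2-tools. It assumes that the loaded ocfs2 module reports its version in /sys/module/ocfs2/version (that file exists only when the module declares a version) and that the distribution is named in /etc/os-release, with /etc/redhat-release as a fallback on older systems.

```python
import platform
from pathlib import Path
from typing import Optional


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines (os-release style), stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def read_text(path: str) -> Optional[str]:
    """Return the file's contents, or None if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def collect_versions() -> dict:
    # Kernel release, the same string `uname -r` prints.
    versions = {"kernel": platform.release()}

    # Assumption: the ocfs2 module exposes its version through sysfs.
    # The file is absent if the module is not loaded or declares no version.
    ocfs2 = read_text("/sys/module/ocfs2/version")
    versions["ocfs2"] = ocfs2.strip() if ocfs2 else "unknown"

    # Distribution: /etc/os-release on newer systems, else a legacy file.
    os_release = read_text("/etc/os-release")
    if os_release:
        versions["distro"] = parse_os_release(os_release).get("PRETTY_NAME", "unknown")
    else:
        legacy = read_text("/etc/redhat-release")
        versions["distro"] = legacy.strip() if legacy else "unknown"
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Anything reported as "unknown" still has to be filled in by hand before filing the bug.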
>
> Matthew E. Porter wrote:
>> Sunil:
>>   We have seen some similar errors in bugzilla.  Specifically,  
>> what we are seeing is:
>>
>> Sep 10 09:15:34 sulu kernel: BUG: soft lockup detected on CPU#2!
>> Sep 10 09:15:34 sulu kernel:  [<c0447f3f>] softlockup_tick+0x98/0xa6
>> Sep 10 09:15:34 sulu kernel:  [<c042d138>] update_process_times+0x39/0x5c
>> Sep 10 09:15:34 sulu kernel:  [<c04176f0>] smp_apic_timer_interrupt+0x5c/0x64
>> Sep 10 09:15:34 sulu kernel:  [<c04049bf>] apic_timer_interrupt+0x1f/0x24
>> Sep 10 09:15:34 sulu kernel:  [<c041c774>] kmap_atomic+0xb5/0xbb
>> Sep 10 09:15:34 sulu kernel:  [<c046cd92>] cont_prepare_write+0xd4/0x21d
>> Sep 10 09:15:34 sulu kernel:  [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<f8cb06da>] ocfs2_get_block+0x0/0xaa5 [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<c044ecae>] generic_file_buffered_write+0x23f/0x5f1
>> Sep 10 09:15:34 sulu kernel:  [<f8856ac0>] do_get_write_access+0x43a/0x467 [jbd]
>> Sep 10 09:15:34 sulu kernel:  [<c0427f65>] current_fs_time+0x4a/0x55
>> Sep 10 09:15:34 sulu kernel:  [<c044f506>] __generic_file_aio_write_nolock+0x4a6/0x52a
>> Sep 10 09:15:34 sulu kernel:  [<f8cbebb2>] ocfs2_extend_file+0xf0d/0xf95 [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<c044f831>] generic_file_aio_write_nolock+0x39/0x83
>> Sep 10 09:15:34 sulu kernel:  [<c044fbb4>] generic_file_write_nolock+0x86/0x9a
>> Sep 10 09:15:34 sulu kernel:  [<f8ccd226>] ocfs2_write_lock_maybe_extend+0xd39/0xe03 [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<c04352dd>] autoremove_wake_function+0x0/0x2d
>> Sep 10 09:15:34 sulu kernel:  [<f8cbf190>] ocfs2_file_write+0x189/0x22c [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<f8cbf007>] ocfs2_file_write+0x0/0x22c [ocfs2]
>> Sep 10 09:15:34 sulu kernel:  [<c0469af3>] vfs_write+0xa1/0x143
>> Sep 10 09:15:34 sulu kernel:  [<c046a0e5>] sys_write+0x3c/0x63
>> Sep 10 09:15:34 sulu kernel:  [<c0403eff>] syscall_call+0x7/0xb
>>
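Each frame in a trace like the one above has the form [<address>] function+0xOFFSET/0xSIZE [module], and the bracketed module name is omitted for code built into the kernel image. Below is a minimal Python sketch for splitting such lines into fields; the regular expression and the Frame type are illustrative assumptions, not a kernel tool. One detail worth checking: a frame at offset +0x0 (ocfs2_get_block, autoremove_wake_function, and the second ocfs2_file_write entry here) points at a function's first instruction, so it cannot be a return address. Because the x86 stack dump of this era appears to list every kernel-text address it finds on the stack, such entries are most likely function pointers passed as arguments rather than real callers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches frames such as
#   [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[A-Za-z0-9_]+)\])?"
)


@dataclass
class Frame:
    address: int
    function: str
    offset: int
    size: int
    module: Optional[str]  # None for functions built into the kernel image

    @property
    def looks_like_pointer(self) -> bool:
        # Offset 0 is the function's first instruction, so it cannot be a
        # return address; such entries are likely stale function pointers.
        return self.offset == 0


def parse_trace(lines: List[str]) -> List[Frame]:
    """Extract one Frame per line that contains a stack-trace entry."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                address=int(m["addr"], 16),
                function=m["func"],
                offset=int(m["off"], 16),
                size=int(m["size"], 16),
                module=m["module"],
            ))
    return frames


if __name__ == "__main__":
    sample = [
        "Sep 10 09:15:34 sulu kernel:  [<c041c774>] kmap_atomic+0xb5/0xbb",
        "Sep 10 09:15:34 sulu kernel:  [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]",
        "Sep 10 09:15:34 sulu kernel:  [<f8cb06da>] ocfs2_get_block+0x0/0xaa5 [ocfs2]",
    ]
    for f in parse_trace(sample):
        note = " (probably a pointer, not a caller)" if f.looks_like_pointer else ""
        print(f"{f.function} in {f.module or 'kernel'}{note}")
```

Filtering out the +0x0 entries leaves the likely real call chain, which here runs from sys_write through ocfs2_file_write and ocfs2_prepare_write into cont_prepare_write and kmap_atomic.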
>>   This happens on all nodes.  The CPU# and timestamp change, but
>> the problem persists.  The systems do not restart or panic;
>> instead, every process accessing the OCFS volume is left in the D
>> (uninterruptible sleep) state.
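To see which tasks are stuck, and in which kernel function they sleep, a task in the D (uninterruptible sleep) state can be identified from the state field of /proc/<pid>/stat, and /proc/<pid>/wchan names its wait channel where the kernel exposes it. `ps -eo pid,stat,wchan,comm` reports the same fields; the Python sketch below collects them programmatically and assumes a Linux /proc filesystem. The helper names are hypothetical.

```python
import os
from typing import List, Optional, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def read_wchan(pid: int, proc: str = "/proc") -> Optional[str]:
    """Kernel function the task is sleeping in, if the kernel exposes it."""
    try:
        with open(os.path.join(proc, str(pid), "wchan")) as f:
            return f.read().strip() or None
    except OSError:
        return None


def uninterruptible_tasks(proc: str = "/proc") -> List[Tuple[int, str, Optional[str]]]:
    """List (pid, comm, wchan) for every task currently in state D."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            tasks.append((pid, comm, read_wchan(pid, proc)))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan in uninterruptible_tasks():
        print(f"{pid:>7} {comm:<20} {wchan or '-'}")
```

Running it on each node while the hang is in progress shows whether all blocked tasks share one wait channel, which is useful detail for the bug report.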
>>
>>   Would you still like me to log another bugzilla issue?  I am
>> happy to do so if you wish.
>>
>>
>> Cheers,
>>   Matthew
>>
>> On Sep 7, 2007, at 12:49 PM, Sunil Mushran wrote:
>>
>>> A bugzilla with the oops stack trace will help.
>>>
>>> Matthew E. Porter wrote:
>>>> Greetings. I am looking for a good way to diagnose random
>>>> crashes that are occurring on one of our OCFS clusters.  It is
>>>> a simple two-node cluster, and debugfs does not seem to indicate
>>>> any issues.
>>>>
>>>> (Also, I would be happy to find a consultant/freelancer to work  
>>>> through this.)
>>>>
>>>>
>>>> Cheers,
>>>>   Matthew
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>
>>
>