[Ocfs2-users] Random Crash - Diagnosing
Matthew E. Porter
matthew.porter at contegix.com
Tue Sep 11 08:59:52 PDT 2007
Submitted as bug 918. Thank you for your assistance.
(Posting information here in case anyone else has seen the issue.)
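For anyone checking whether they are hitting the same symptom (processes stuck in a D state on the OCFS volume), the following is a minimal sketch, assuming a Linux /proc filesystem, that lists processes currently in uninterruptible sleep. The function names parse_state and d_state_processes are hypothetical helpers made up for this illustration; they are not part of any OCFS2 tooling.

```python
import os


def parse_state(stat_line):
    """Return (comm, state) from the text of /proc/<pid>/stat.

    The line looks like "pid (comm) state ppid ...". The command name is
    wrapped in parentheses and may itself contain spaces or ')', so split
    on the last ')' instead of on whitespace.
    """
    left, right = stat_line.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to report
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unexpected format
        if state == "D":
            found.append((int(entry), comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A process that shows up here only briefly is normal (ordinary disk I/O also passes through D); the symptom described in this thread is processes that stay in that state indefinitely.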
Cheers,
Matthew
---
Matthew E. Porter
Contegix
Beyond Managed Hosting(r) for Your Enterprise
On Sep 11, 2007, at 10:49 AM, Sunil Mushran wrote:
> Please log a bugzilla with this output along with all the version
> numbers: kernel, ocfs2, and distro.
>
> Matthew E. Porter wrote:
>> Sunil:
>> We have seen some similar errors in bugzilla. Specifically,
>> what we are seeing is:
>>
>> Sep 10 09:15:34 sulu kernel: BUG: soft lockup detected on CPU#2!
>> Sep 10 09:15:34 sulu kernel: [<c0447f3f>] softlockup_tick+0x98/0xa6
>> Sep 10 09:15:34 sulu kernel: [<c042d138>] update_process_times+0x39/0x5c
>> Sep 10 09:15:34 sulu kernel: [<c04176f0>] smp_apic_timer_interrupt+0x5c/0x64
>> Sep 10 09:15:34 sulu kernel: [<c04049bf>] apic_timer_interrupt+0x1f/0x24
>> Sep 10 09:15:34 sulu kernel: [<c041c774>] kmap_atomic+0xb5/0xbb
>> Sep 10 09:15:34 sulu kernel: [<c046cd92>] cont_prepare_write+0xd4/0x21d
>> Sep 10 09:15:34 sulu kernel: [<f8cb12cf>] ocfs2_prepare_write+0x150/0x19d [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<f8cb06da>] ocfs2_get_block+0x0/0xaa5 [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<c044ecae>] generic_file_buffered_write+0x23f/0x5f1
>> Sep 10 09:15:34 sulu kernel: [<f8856ac0>] do_get_write_access+0x43a/0x467 [jbd]
>> Sep 10 09:15:34 sulu kernel: [<c0427f65>] current_fs_time+0x4a/0x55
>> Sep 10 09:15:34 sulu kernel: [<c044f506>] __generic_file_aio_write_nolock+0x4a6/0x52a
>> Sep 10 09:15:34 sulu kernel: [<f8cbebb2>] ocfs2_extend_file+0xf0d/0xf95 [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<c044f831>] generic_file_aio_write_nolock+0x39/0x83
>> Sep 10 09:15:34 sulu kernel: [<c044fbb4>] generic_file_write_nolock+0x86/0x9a
>> Sep 10 09:15:34 sulu kernel: [<f8ccd226>] ocfs2_write_lock_maybe_extend+0xd39/0xe03 [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<c04352dd>] autoremove_wake_function+0x0/0x2d
>> Sep 10 09:15:34 sulu kernel: [<f8cbf190>] ocfs2_file_write+0x189/0x22c [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<f8cbf007>] ocfs2_file_write+0x0/0x22c [ocfs2]
>> Sep 10 09:15:34 sulu kernel: [<c0469af3>] vfs_write+0xa1/0x143
>> Sep 10 09:15:34 sulu kernel: [<c046a0e5>] sys_write+0x3c/0x63
>> Sep 10 09:15:34 sulu kernel: [<c0403eff>] syscall_call+0x7/0xb
>>
>> This happens on all nodes. The CPU# and timestamp change, but
>> the problem persists. The systems do not restart or panic;
>> instead, every process accessing the OCFS volume ends up in a D
>> (uninterruptible sleep) state.
>>
>> Would you still like me to log another bugzilla issue? I am
>> happy to do so if you wish.
>>
>>
>> Cheers,
>> Matthew
>>
>>
>> ---
>> Matthew E. Porter
>> Contegix
>> Beyond Managed Hosting(r) for Your Enterprise
>>
>>
>>
>> On Sep 7, 2007, at 12:49 PM, Sunil Mushran wrote:
>>
>>> A bugzilla with the oops stack trace will help.
>>>
>>> Matthew E. Porter wrote:
>>>> Greetings, I am looking for a good way to diagnose random
>>>> crashes that are occurring on one of our OCFS clusters. It is
>>>> a simple two-node cluster. debugfs does not seem to indicate any
>>>> issues.
>>>>
>>>> (Also, I would be happy to find a consultant/freelancer to work
>>>> through this.)
>>>>
>>>>
>>>> Cheers,
>>>> Matthew
>>>>
>>>> ---
>>>> Matthew E. Porter
>>>> Contegix
>>>> Beyond Managed Hosting(r) for Your Enterprise
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>
>>
>