[Ocfs2-users] Kernel Panic, Server not coming back up

kevin at utahsysadmin.com kevin at utahsysadmin.com
Mon Apr 5 13:58:33 PDT 2010


Sunil,

Thanks for the response.  Could this be triggered by both servers trying to
write to the same log file at the same time, or can OCFS2 handle that
situation?  Or is it simply that the VMware layer can't keep up with the
amount of I/O?  I don't think these servers are under much load.
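
(Side note: I double-checked what the -5 status codes below map to, assuming
the usual kernel header location, which may vary by distro:

    # grep -w EIO /usr/include/asm-generic/errno-base.h
    #define EIO              5      /* I/O error */

so they are indeed plain I/O errors, as you said.)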

I also forgot to mention that on nodes 1 & 2, I have the /data partitions
mounted read-only.  Will that cause any problems?
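
For reference, those read-only mounts are done with something along these
lines (just a sketch; /dev/sdc1 is the device name from the logs, and the
mount point and options may not match our boxes exactly):

    # on nodes 1 and 2 only
    mount -t ocfs2 -o ro /dev/sdc1 /data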

Thanks for your time.

Kevin

On Mon, 05 Apr 2010 13:45:59 -0700, Sunil Mushran
<sunil.mushran at oracle.com> wrote:
> It is having problems doing I/Os to the virtual devices. -5 is EIO.
> 
> kevin at utahsysadmin.com wrote:
>>
<snip>
>> end_request:  I/O error, dev sdc, sector 585159
>> Aborting journal on device sdc1
>> end_request:  I/O error, dev sdc, sector 528151
>> Buffer I/O error on device sdc1,  logical block 66011
>> lost page write due to I/O error on sdc1
>> (2848,1):ocfs2_start_trans:240 ERROR: status = -30
>> OCFS2: abort (device sdc1): ocfs2_start_trans: Detected aborted journal
>> Kernel panic - not syncing: OCFS2: (device sdc1): panic forced after error
>>
>>  <0>Rebooting in 30 seconds..BUG: warning at arch/i386/kernel/smp.c:492/smp_send_reschedule() (Tainted: G    )
>>
<snip>
>> This is the latest from one of the hosts that is still up:
>>
>> # dmesg | tail -50
>> (2869,0):ocfs2_lock_allocators:677 ERROR: status = -5
>> (2869,0):__ocfs2_extend_allocation:739 ERROR: status = -5
>> (2869,0):ocfs2_extend_no_holes:952 ERROR: status = -5
>> (2869,0):ocfs2_expand_nonsparse_inode:1678 ERROR: status = -5
>> (2869,0):ocfs2_write_begin_nolock:1722 ERROR: status = -5
>> (2869,0):ocfs2_write_begin:1860 ERROR: status = -5
>> (2869,0):ocfs2_file_buffered_write:2039 ERROR: status = -5
>> (2869,0):__ocfs2_file_aio_write:2194 ERROR: status = -5
>> OCFS2: ERROR (device sdc1): ocfs2_check_group_descriptor: Group descriptor # 1128960 has bit count 32256 but claims that 34300 are free
>> (2881,0):ocfs2_search_chain:1244 ERROR: status = -5
>> (2881,0):ocfs2_claim_suballoc_bits:1433 ERROR: status = -5
>> (2881,0):__ocfs2_claim_clusters:1715 ERROR: status = -5
>> (2881,0):ocfs2_local_alloc_new_window:1013 ERROR: status = -5
>> (2881,0):ocfs2_local_alloc_slide_window:1116 ERROR: status = -5
>> (2881,0):ocfs2_reserve_local_alloc_bits:537 ERROR: status = -5
>> (2881,0):__ocfs2_reserve_clusters:725 ERROR: status = -5
>> (2881,0):ocfs2_lock_allocators:677 ERROR: status = -5
>> (2881,0):__ocfs2_extend_allocation:739 ERROR: status = -5
>> (2881,0):ocfs2_extend_no_holes:952 ERROR: status = -5
>> (2881,0):ocfs2_expand_nonsparse_inode:1678 ERROR: status = -5
>> (2881,0):ocfs2_write_begin_nolock:1722 ERROR: status = -5
>> (2881,0):ocfs2_write_begin:1860 ERROR: status = -5
>> (2881,0):ocfs2_file_buffered_write:2039 ERROR: status = -5
>> (2881,0):__ocfs2_file_aio_write:2194 ERROR: status = -5
>> (2045,0):o2net_connect_expired:1664 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> OCFS2: ERROR (device sdc1): ocfs2_check_group_descriptor: Group descriptor # 1128960 has bit count 32256 but claims that 34300 are free
>> (2872,0):ocfs2_search_chain:1244 ERROR: status = -5
>> (2872,0):ocfs2_claim_suballoc_bits:1433 ERROR: status = -5
>> (2872,0):__ocfs2_claim_clusters:1715 ERROR: status = -5
>> (2872,0):ocfs2_local_alloc_new_window:1013 ERROR: status = -5
>> (2872,0):ocfs2_local_alloc_slide_window:1116 ERROR: status = -5
>> (2872,0):ocfs2_reserve_local_alloc_bits:537 ERROR: status = -5
>> (2872,0):__ocfs2_reserve_clusters:725 ERROR: status = -5
>> (2872,0):ocfs2_lock_allocators:677 ERROR: status = -5
>> (2872,0):__ocfs2_extend_allocation:739 ERROR: status = -5
>> (2872,0):ocfs2_extend_no_holes:952 ERROR: status = -5
>> (2872,0):ocfs2_expand_nonsparse_inode:1678 ERROR: status = -5
>> (2872,0):ocfs2_write_begin_nolock:1722 ERROR: status = -5
>> (2872,0):ocfs2_write_begin:1860 ERROR: status = -5
>> (2872,0):ocfs2_file_buffered_write:2039 ERROR: status = -5
>> (2872,0):__ocfs2_file_aio_write:2194 ERROR: status = -5
>> (2065,0):ocfs2_dlm_eviction_cb:98 device (8,33): dlm has evicted node 2
>> (12701,1):dlm_get_lock_resource:844 6A03E81A818641A68FD8DC23854E12D3:M00000000000000000000243568d3c5: at least one node (2) to recover before lock mastery can begin
>> (2045,0):ocfs2_dlm_eviction_cb:98 device (8,33): dlm has evicted node 2
>> (12701,1):dlm_get_lock_resource:898 6A03E81A818641A68FD8DC23854E12D3:M00000000000000000000243568d3c5: at least one node (2) to recover before lock mastery can begin
>> o2net: accepted connection from node qa-web2 (num 2) at 147.178.220.32:7777
>> ocfs2_dlm: Node 2 joins domain 6A03E81A818641A68FD8DC23854E12D3
>> ocfs2_dlm: Nodes in domain ("6A03E81A818641A68FD8DC23854E12D3"): 0 1 2 
>> (12701,1):dlm_restart_lock_mastery:1216 node 2 up while restarting
>> (12701,1):dlm_wait_for_lock_mastery:1040 ERROR: status = -11
>>



