[rds-devel] Re: trying to reproduce the crash

Or Gerlitz ogerlitz at voltaire.com
Mon Feb 4 05:26:05 PST 2008


Olaf Kirch wrote:
> On Monday 04 February 2008 10:19, Or Gerlitz wrote:

>> This does not really work for me, as after 2-3 iterations I get the 
>> below connect() error and then rmmod also fails, I guess that some of 
>> the worker processes are still using the RDS socket, etc, any idea what 
>> should I do to let this work?

> If that happens, insert (after the rds-stress) something like this:
> 	if killall -TERM rds-stress; then sleep 2; fi
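
[A minimal sketch of the cleanup step suggested above, assuming a procps-style `pgrep` is available. The helper name `wait_for_exit` and the 10-second timeout are hypothetical and not part of the original test script. Instead of a fixed sleep, it polls until no process with the given name remains, so that rmmod is only attempted once the sockets have been released.]

```shell
# Hypothetical helper: wait until no process named "$1" remains,
# giving up after "$2" seconds. Returns 0 when all are gone, 1 on timeout.
wait_for_exit() {
    name=$1
    remaining=$2
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Possible use inside the test loop, after rds-stress returns:
#   killall -TERM rds-stress
#   wait_for_exit rds-stress 10 || echo "rds-stress still running" >&2
```

[Processes stuck in the D state do not act on signals, so in that case the helper would time out rather than return success.]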

OK, this let the runs go on for a few more iterations before I got 
into a situation where many rds-stress processes are in the D state, so doing

$ echo t > /proc/sysrq-trigger

I see things like,
> rds-stress    D ffff810001065780     0 22128      1         22129 22127 (L-TLB)
> ffff81005d4dfb48 0000000000000046 00000000000000d0 0000000000000009
>        ffff810064f49308 ffff810064f490c0 ffff810037f87100 0009cbc64f7b84d1
>        00000000004f9854 000000070000000e
> Call Trace: <ffffffff802d9e81>{__mutex_lock_slowpath+93}
>        <ffffffff802d9ecb>{.text.lock.mutex+15} <ffffffff882be173>{:rds:rds_ib_flush_mr_pool+86}
>        <ffffffff882b9414>{:rds:rds_mr_put+20} <ffffffff882b981c>{:rds:rds_rdma_drop_keys+15}
>        <ffffffff882b568d>{:rds:rds_release+104} <ffffffff80275a2b>{sock_release+25}
>        <ffffffff802763e6>{sock_close+44} <ffffffff8018244e>{__fput+174}
>        <ffffffff8017fb8b>{filp_close+89} <ffffffff80134384>{put_files_struct+108}
>        <ffffffff8013542f>{do_exit+641} <ffffffff80135ade>{sys_exit_group+0}
>        <ffffffff8013e4bd>{get_signal_to_deliver+1374} <ffffffff8010a12f>{do_signal+109}
>        <ffffffff8013ad8f>{lock_timer_base+27} <ffffffff8013ae01>{try_to_del_timer_sync+81}
>        <ffffffff8013ae16>{del_timer_sync+12} <ffffffff80192eb7>{poll_freewait+64}
>        <ffffffff801931dd>{do_sys_poll+794} <ffffffff801939af>{__pollwait+0}
>        <ffffffff8010adc7>{sysret_signal+28} <ffffffff8010b04b>{ptregscall_common+103}
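
[Before dumping all task stacks through sysrq, the stuck tasks can be listed on their own. A rough sketch, assuming a procps-style `ps`; the function name `list_d_state` is made up for illustration. It keeps only entries whose state field starts with D (uninterruptible sleep).]

```shell
# Hypothetical filter: reads "PID STAT COMMAND" lines on stdin and
# prints "PID COMMAND" for every task whose STAT begins with D.
list_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Feed it the current process table (headers suppressed with "=").
ps -eo pid=,stat=,comm= | list_d_state
```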

Also, I now see plenty of the following messages on the client side:
> ib_mthca 0000:03:00.0: SW2HW_MPT returned status 0x0a
> RDS/IB: rds_ib_setup_qp failed (-22)
> rds_ib_conn_shutdown: failed to disconnect, cm: ffff8100626ac600 err -22

My system is SLES10 SP1, not RH5 like yours.

Or





