[rds-devel] Re: trying to reproduce the crash
Or Gerlitz
ogerlitz at voltaire.com
Mon Feb 4 05:26:05 PST 2008
Olaf Kirch wrote:
> On Monday 04 February 2008 10:19, Or Gerlitz wrote:
>> This does not really work for me: after 2-3 iterations I get the
>> connect() error below and then rmmod also fails. I guess some of
>> the worker processes are still using the RDS socket, etc. Any idea what
>> I should do to make this work?
> If that happens, insert (after the rds-stress) something like this:
> if killall -TERM rds-stress; then sleep 2; fi
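A minimal sketch of that cleanup step as a reusable shell function, assuming procps (pgrep) and psmisc (killall) are installed. The function name stop_stragglers and its 10-second default timeout are hypothetical, not part of rds-stress or the RDS tools. Instead of a fixed "sleep 2", it polls until no process with the given name is left:

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDS tools): send SIGTERM to every
# process named $1, then poll once per second until none remain or the
# timeout ($2 seconds, default 10) runs out.
stop_stragglers() {
    sc_name=$1
    sc_left=${2:-10}
    # killall exits non-zero when nothing matched (or if it is missing):
    # in that case there is nothing to wait for.
    killall -TERM "$sc_name" 2>/dev/null || return 0
    while [ "$sc_left" -gt 0 ]; do
        # pgrep -x matches the exact process name; non-zero means all gone.
        pgrep -x "$sc_name" >/dev/null || return 0
        sleep 1
        sc_left=$((sc_left - 1))
    done
    # Something survived SIGTERM, e.g. a task stuck in the D state.
    return 1
}
```

In the test loop one would call "stop_stragglers rds-stress" after each rds-stress run and only attempt rmmod once it returns 0; a non-zero return means a process ignored or could not act on SIGTERM, which is what a task blocked in uninterruptible sleep looks like.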
OK, this let the runs go on for a few more iterations, but then I get
into a situation where many rds-stress processes are in the D state. Doing
$ echo t > /proc/sysrq-trigger
I see things like:
> rds-stress D ffff810001065780 0 22128 1 22129 22127 (L-TLB)
> ffff81005d4dfb48 0000000000000046 00000000000000d0 0000000000000009
> ffff810064f49308 ffff810064f490c0 ffff810037f87100 0009cbc64f7b84d1
> 00000000004f9854 000000070000000e
> Call Trace: <ffffffff802d9e81>{__mutex_lock_slowpath+93}
> <ffffffff802d9ecb>{.text.lock.mutex+15} <ffffffff882be173>{:rds:rds_ib_flush_mr_pool+86}
> <ffffffff882b9414>{:rds:rds_mr_put+20} <ffffffff882b981c>{:rds:rds_rdma_drop_keys+15}
> <ffffffff882b568d>{:rds:rds_release+104} <ffffffff80275a2b>{sock_release+25}
> <ffffffff802763e6>{sock_close+44} <ffffffff8018244e>{__fput+174}
> <ffffffff8017fb8b>{filp_close+89} <ffffffff80134384>{put_files_struct+108}
> <ffffffff8013542f>{do_exit+641} <ffffffff80135ade>{sys_exit_group+0}
> <ffffffff8013e4bd>{get_signal_to_deliver+1374} <ffffffff8010a12f>{do_signal+109}
> <ffffffff8013ad8f>{lock_timer_base+27} <ffffffff8013ae01>{try_to_del_timer_sync+81}
> <ffffffff8013ae16>{del_timer_sync+12} <ffffffff80192eb7>{poll_freewait+64}
> <ffffffff801931dd>{do_sys_poll+794} <ffffffff801939af>{__pollwait+0}
> <ffffffff8010adc7>{sysret_signal+28} <ffffffff8010b04b>{ptregscall_common+103}
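As a lighter first pass than dumping every task with sysrq 't', the tasks in the D state can be listed with ps. This is a sketch assuming procps ps; the function names filter_d_state and list_d_state are hypothetical. The WCHAN column shows the kernel function each task is sleeping in, while the sysrq dump remains the way to get full stacks like the one above:

```shell
#!/bin/sh
# Hypothetical helpers for spotting tasks in uninterruptible sleep.

# Keep the ps header line plus every row whose STAT column (field 2)
# starts with D (uninterruptible sleep).
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List PID, state, wait channel and command name of all D-state tasks.
list_d_state() {
    ps -eo pid,stat,wchan,comm | filter_d_state
}
```

Running list_d_state a few times between iterations shows whether the same rds-stress PIDs stay blocked, without flooding the kernel log with every task's stack.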
Also, I now see plenty of the following messages on the client side:
> ib_mthca 0000:03:00.0: SW2HW_MPT returned status 0x0a
> RDS/IB: rds_ib_setup_qp failed (-22)
> rds_ib_conn_shutdown: failed to disconnect, cm: ffff8100626ac600 err -22
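The -22 in the last two messages is -EINVAL. To quantify how often each message shows up per iteration, a small counter over the kernel log can help. This sketch assumes the log is read from stdin (for example "dmesg | count_rds_ib_errors"); the function name is hypothetical and the patterns are simply the message texts quoted above:

```shell
#!/bin/sh
# Hypothetical helper: count the three client-side RDS/IB error messages
# in a kernel log read from stdin and print one total per message.
count_rds_ib_errors() {
    awk '
        /SW2HW_MPT returned status/                  { mpt++ }
        /rds_ib_setup_qp failed/                     { qp++ }
        /rds_ib_conn_shutdown: failed to disconnect/ { shut++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "SW2HW_MPT errors:      %d\n", mpt
            printf "setup_qp failures:     %d\n", qp
            printf "disconnect failures:   %d\n", shut
        }'
}
```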
My system is SLES10 SP1, not RH5 like yours.
Or