[rds-devel] Re: trying to reproduce the crash
Or Gerlitz
ogerlitz at voltaire.com
Sun Feb 3 23:56:10 PST 2008
Olaf Kirch wrote:
> Okay, so here's what I do. On one side (call it host_a), I do something like
> while true; do
> rds-stress -R -r $host_a -p 4000
> done
>
> On the other side (host_b) I do this:
>
> while sleep 1; do
> rmmod rds
> sleep 1
> insmod rds.ko
> rds-stress -R -r $host_b -s $host_a -p 4000 -c -d4 -t32 -T3 -D64k
> done
>
> This reproduces the crash in 5-20 minutes. It's possible that this requires SMP
> machines on both ends; this setup has a 4-node and a 2-node machine, both 64bit
> Xeons, and both with 1GB of RAM.
Okay, I will try the exact scripts. You also use -R and -c on the client
side, which I don't, so I will add them as well. My nodes have two CPUs,
each with four cores, and 2GB of RAM, so the configuration is somewhat
different in that respect.
>> and also reports on wrong sequence number in the client side
>>> An incoming message had a header which
>>> didn't contain the fields we expected:
>>> member expected eq got
>>> seq 14 != 15
>>> header from 192.168.10.85:4003 to id 4044 bogus
> That is a symptom of RDMA operations getting dropped on the floor, usually when
> the connection is dropped and re-established. Is there a message in syslog
> that coincides with this?
I'm not sure I follow here. Do you mean a message from the rds kernel module?
There are plenty of messages about QP error 3, recv completion with error
10, 5, 4, etc. Do you mean these messages?
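
To line such messages up with the sequence-number reports, something like the following could pull the candidate lines out of the kernel log. This is only a rough sketch: the function name filter_rds_msgs is made up for illustration, and the grep patterns are guesses based on the wording above (QP error, completion with error), not the module's exact log strings.

```shell
# Hypothetical helper: keep only kernel log lines that look RDS/IB related.
# The patterns are assumptions derived from the messages described above;
# adjust them to whatever the rds module actually prints on your system.
filter_rds_msgs() {
    grep -i -E 'rds|qp error|completion with error'
}

# Example use (not executed here):
#   dmesg | filter_rds_msgs
```

The timestamps of the matching lines can then be compared against the time rds-stress prints the bogus header report.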
Or.