[rds-devel] Re: trying to reproduce the crash
Or Gerlitz
ogerlitz at voltaire.com
Sun Feb 3 03:02:13 PST 2008
> Olaf Kirch wrote:
> a. when running with -d4 -t64 -T3 -D64k, I often get a crash
> in ib_fmr_map_phys - always in the same location in mthca_arbel_map_phys.
Hi Olaf,
I did some rds-stress runs as above over the device you are using,
MT25204 (but with the latest firmware, 1.2.0), both as client and server,
and I could not reproduce the crash. The code is ofed 1.3 rc3, and the
second node is connectx. So it would be best if you could share a script
that reproduces the crash when run.
Or.
Other than the absence of a crash, I did see some problems, specifically an
atomic order-zero page allocation failure in rds_ib_recv_refill:
> The following is only an harmless informational message.
> Unless you get a _continuous_flood_ of these messages it means
> everything is working fine. Allocations from irqs cannot be
> perfectly reliable and the kernel is designed to handle that.
> kswapd0: page allocation failure. order:0, mode:0x22
>
> Call Trace: <IRQ> <ffffffff8016334b>{__alloc_pages+727}
> <ffffffff8837c207>{:rds:rds_ib_recv_refill+232} <ffffffff8837cc08>{:rds:rds_ib_recv_cq_comp_handler+1647}
> <ffffffff88147a91>{:mlx4_core:mlx4_eq_int+44} <ffffffff88147c92>{:mlx4_core:mlx4_msi_x_interrupt+15}
> <ffffffff80137767>{__do_softirq+95} <ffffffff8015afbc>{handle_IRQ_event+41}
> <ffffffff8015b086>{__do_IRQ+153} <ffffffff8010d430>{do_IRQ+59}
> <ffffffff8010b25a>{ret_from_intr+0} <EOI> <ffffffff802da558>{_spin_lock+3}
> <ffffffff801716f8>{page_check_address+173} <ffffffff80171eda>{page_referenced_one+97}
> <ffffffff80172230>{page_referenced+135} <ffffffff80172217>{page_referenced+110}
> <ffffffff80166776>{shrink_zone+562} <ffffffff801677d0>{balance_pgdat+530}
> <ffffffff8016819e>{kswapd+308} <ffffffff80145aa2>{autoremove_wake_function+0}
> <ffffffff8010bdce>{child_rip+8} <ffffffff8016806a>{kswapd+0}
> <ffffffff8010bdc6>{child_rip+0}
and also reports of a wrong sequence number on the client side:
> An incoming message had a header which
> didn't contain the fields we expected:
> member expected eq got
> seq 14 != 15
> from_addr 192.168.10.85 = 192.168.10.85
> from_port 4003 = 4003
> to_addr 192.168.10.85 = 192.168.10.85
> to_port 4044 = 4044
> index 3 = 3
> op 1 = 1
> header from 192.168.10.85:4003 to id 4044 bogus
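For anyone reading the report above: it is a field-by-field comparison of the header the receiver expected against the header it actually got, and here only seq differs (14 expected, 15 received). Below is a minimal sketch of that kind of check. It is not the rds-stress source; the Header class and the compare_headers function are hypothetical names made up for illustration, and only the field names are taken from the output above.

```python
from dataclasses import dataclass, fields


@dataclass
class Header:
    # Field names mirror the "member" column of the report above.
    # The class itself is a hypothetical stand-in, not an rds-stress struct.
    seq: int
    from_addr: str
    from_port: int
    to_addr: str
    to_port: int
    index: int
    op: int


def compare_headers(expected, got):
    """Compare two headers field by field.

    Returns (ok, lines): ok is True only if every field matches, and
    lines holds one "member expected eq got" row per field.
    """
    ok = True
    lines = []
    for f in fields(Header):
        e = getattr(expected, f.name)
        g = getattr(got, f.name)
        same = e == g
        ok = ok and same
        eq = "=" if same else "!="
        lines.append(f"{f.name:>10} {e!s:>15} {eq:>2} {g!s:>15}")
    return ok, lines


# Values taken from the report above: only seq differs.
expected = Header(14, "192.168.10.85", 4003, "192.168.10.85", 4044, 3, 1)
got = Header(15, "192.168.10.85", 4003, "192.168.10.85", 4044, 3, 1)

ok, lines = compare_headers(expected, got)
for line in lines:
    print(line)
if not ok:
    print(f"header from {got.from_addr}:{got.from_port} to id {got.to_port} bogus")
```

With the values from the report, only the seq row carries "!=", so the header is flagged as bogus even though the addresses, ports, index and op all match.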