[rds-devel] Re: [PATCH] Report proper error code in [was: trying to reproduce the crash]
Or Gerlitz
ogerlitz at voltaire.com
Tue Feb 5 06:17:41 PST 2008
Olaf Kirch wrote:
> I fixed this, using the patch below. Now I'm making some progress:
> First, the kernel reports:
> RDS/IB: ib_alloc_fmr failed (err=-12)
> which is good - now we get a decent error code instead of a crash.
So are you somehow limiting the number of FMRs allocated by RDS? Is the
only missing piece support for the max_fmr device attribute, or is more
instrumentation of the rds code needed?
> A little later, it complains:
> ib_mthca 0000:05:00.0: SW2HW_MPT returned status 0x0a
Yes, I see that as well, in my case with more prints, I guess because I
have set the debug_level module param of ib_mthca.
> rds_ib_conn_shutdown: failed to disconnect, cm: ffff810052452c00 err -22
> ib_mthca 0000:03:00.0: Command 0d completed with status 0a
> ib_mthca 0000:03:00.0: SW2HW_MPT returned status 0x0a
> RDS/IB: rds_ib_setup_qp failed (-22)
> which doesn't sound quite as good... and things are very hosed
> from that moment on; reloading ib_mthca seems to fix things, however.
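For reference, the negative numbers in these messages are Linux errno values (-12 is ENOMEM, -22 is EINVAL, and -95, which shows up further down, is EOPNOTSUPP). A minimal sketch for pulling them out of the log lines follows; the table is hardcoded with the Linux values so it does not depend on the host platform, and the helper name decode_err is made up for illustration:

```python
import re

# Linux errno numbers for the codes seen in this thread.
# Hardcoded on purpose: errno numbering differs between platforms.
LINUX_ERRNO = {
    12: "ENOMEM",
    22: "EINVAL",
    95: "EOPNOTSUPP",
}

# Matches "(err=-12)", "err -22" and "(-95)" as they appear in the RDS messages.
ERR_RE = re.compile(r"(?:err[= ]|\()(-\d+)")


def decode_err(line):
    """Return the errno name for the error code in a log line, or None."""
    match = ERR_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))
    return LINUX_ERRNO.get(code, "unknown errno %d" % code)


if __name__ == "__main__":
    for line in (
        "RDS/IB: ib_alloc_fmr failed (err=-12)",
        "rds_ib_conn_shutdown: failed to disconnect, cm: ffff810052452c00 err -22",
        "RDS/IB: rds_ib_setup_qp failed (-95)",
    ):
        print(decode_err(line), "<-", line)
```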
I have applied the patch to mthca, and it lets the test run much longer
than before. However, at some point the client side (MT25204 / 1.2.0
firmware) crashes when the script attempts to unload the rds module:
> RDS/IB: No client_data for device mthca0
> RDS/IB: rds_ib_setup_qp failed (-95)
> RDS/IB: rds_ib_setup_qp failed (-95)
> rds_ib_conn_shutdown: failed to disconnect, cm: ffff810040c7e400 err -22
> RDS/IB: rds_ib_setup_qp failed (-95)
> RDS/IB: rds_ib_setup_qp failed (-95)
> NET: Unregistered protocol family 28
> Unregistered RDS/ib transport
> Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP:
> <ffffffff88343f2c>{:rds:rds_ib_remove_one+18}
> PGD 4bfe4067 PUD 3b913067 PMD 0
> Oops: 0000 [1] SMP
> last sysfs file: /class/net/ib0/flags
> CPU 6
> Modules linked in: rds ib_mthca ib_ipoib rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfs lockd nfs_acl sunrpc autofs4 edd ipv6 af_packet thermal processor fan button battery ac apparmor aamatch_pcre loop dm_mod usbhid shpchp pci_hotplug ehci_hcd uhci_hcd tg3 usbcore ext3 jbd sr_mod cdrom i2c_i801 i2c_core sg ata_piix libata sd_mod scsi_mod
> Pid: 24445, comm: rmmod Tainted: G U 2.6.16.46-0.12-smp #1
> RIP: 0010:[<ffffffff88343f2c>] <ffffffff88343f2c>{:rds:rds_ib_remove_one+18}
> RSP: 0000:ffff81003705de88 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffff810038025000 RCX: ffff8100380bab20
> RDX: ffff81003859e380 RSI: 0000000000000296 RDI: ffff810038025080
> RBP: ffffffff88356880 R08: ffff81003705dd78 R09: ffff810037e1a890
> R10: ffffffff8017f6e5 R11: ffffffff88356880 R12: ffffffff88356220
> R13: 0000000000000000 R14: 0000000000000880 R15: 00007fff020167d0
> FS: 00002b1ca8dda6d0(0000) GS:ffff810037f91e40(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000010 CR3: 000000005e2bd000 CR4: 00000000000006e0
> Process rmmod (pid: 24445, threadinfo ffff81003705c000, task ffff81004d9327d0)
> Stack: ffff81004194af40 ffff810038025000 ffffffff88356880 ffffffff881775b3
> ffffffff88356260 ffffffff88356880 0000000000000880 0000000000000000
> 0000000000000880 ffffffff88347d86
> Call Trace: <ffffffff881775b3>{:ib_core:ib_unregister_client+47}
> <ffffffff88347d86>{:rds:rds_trans_exit+57} <ffffffff88347b0b>{:rds:rds_exit+47}
> <ffffffff8014c984>{sys_delete_module+540} <ffffffff8016e7e7>{do_munmap+619}
> <ffffffff801f088b>{__up_write+33} <ffffffff8010ad3e>{system_call+126}
>
> Code: 48 8b 78 10 48 89 c3 48 8b 2f eb 29 48 8b 17 48 8b 47 08 48
> RIP <ffffffff88343f2c>{:rds:rds_ib_remove_one+18} RSP <ffff81003705de88>
> CR2: 0000000000000010
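A rough reading of the oops, assuming the Code: bytes start at the faulting instruction: the first four bytes decode to a load from offset 0x10 of RAX, and RAX is 0, which gives the faulting address in CR2. That would fit a NULL per-device pointer in rds_ib_remove_one, in line with the earlier "No client_data for device mthca0" message, though this is an inference from the dump and not something confirmed in the code. The sketch below decodes only this one instruction form by hand; the function name is made up for illustration:

```python
# First bytes of the "Code:" line from the oops above.
CODE = "48 8b 78 10 48 89 c3 48 8b 2f"

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B extension).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(code_hex):
    """Decode 'REX.W 8B /r disp8', i.e. mov disp8(%base),%dest.

    Returns (dest, base, displacement). Any other instruction form is rejected.
    """
    rex, opcode, modrm, disp8 = [int(x, 16) for x in code_hex.split()][:4]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp8 >= 0x80:  # the 8-bit displacement is signed
        disp8 -= 0x100
    return REGS64[reg], REGS64[rm], disp8


if __name__ == "__main__":
    dest, base, disp = decode_mov_load_disp8(CODE)
    print("mov 0x%x(%%%s),%%%s" % (disp, base, dest))
    # Register values taken from the oops: RAX is 0.
    registers = {"rax": 0x0}
    print("faulting address: 0x%016x" % (registers[base] + disp))
```

With RAX = 0 from the register dump, the computed address equals the CR2 value reported in the oops.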
Or