[rds-devel] Re: [PATCH] Report proper error code in [was: trying to reproduce the crash]

Olaf Kirch olaf.kirch at oracle.com
Wed Feb 6 08:37:49 PST 2008


On Tuesday 05 February 2008 15:17, Or Gerlitz wrote:
> So are you somehow limiting the amount of FMRs allocated by RDS? is the 
> only missing piece to do that is support for max_fmr device attribute, 
> or some more instrumentation of the rds code is needed?

I had to add some code to enforce a limit. The original plan was to just
call ib_alloc_fmr and handle the errors it returns (expecting it to give us
-ENOBUFS or similar once it hits some intrinsic FMR limit). Alas, it turns
out we confuse the driver when we do this, so I'm back to enforcing a limit
inside RDS.
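A limit of this kind can be modeled in userspace roughly as below. This is a minimal sketch, not the RDS code: struct fmr_budget, fmr_budget_get() and fmr_budget_put() are hypothetical names, and C11 atomics stand in for the kernel's atomic_t. The idea is to reserve a slot before calling ib_alloc_fmr and fail with -ENOBUFS locally, so the driver never sees an over-limit request.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-device FMR budget; not the actual RDS structure. */
struct fmr_budget {
	atomic_int in_use;	/* FMRs currently handed out */
	int max_fmrs;		/* limit enforced by RDS, not the driver */
};

/* Reserve one FMR slot before calling ib_alloc_fmr(). Returns 0 on
 * success, or -ENOBUFS if the budget is exhausted, so callers that go
 * through this function never ask the driver for more than max_fmrs. */
static int fmr_budget_get(struct fmr_budget *b)
{
	int cur = atomic_load(&b->in_use);

	do {
		if (cur >= b->max_fmrs)
			return -ENOBUFS;
		/* On CAS failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&b->in_use, &cur, cur + 1));
	return 0;
}

/* Return a slot after freeing an FMR, or if ib_alloc_fmr() failed. */
static void fmr_budget_put(struct fmr_budget *b)
{
	int prev = atomic_fetch_sub(&b->in_use, 1);

	assert(prev > 0);	/* catch unbalanced get/put */
}
```

A compare-and-swap loop is used rather than a plain increment so that a caller that would exceed max_fmrs never bumps the counter, not even transiently.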

> > RDS/IB: No client_data for device mthca0
> > RDS/IB: rds_ib_setup_qp failed (-95)
> > RDS/IB: rds_ib_setup_qp failed (-95)
> > rds_ib_conn_shutdown: failed to disconnect, cm: ffff810040c7e400 err -22
> > RDS/IB: rds_ib_setup_qp failed (-95)
> > RDS/IB: rds_ib_setup_qp failed (-95)
> > NET: Unregistered protocol family 28
> > Unregistered RDS/ib transport
> > Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP: 
> > <ffffffff88343f2c>{:rds:rds_ib_remove_one+18}

Ick! That's not good. "No client_data for device mthca0" means that we failed
to attach our own client data to the device. That happens in rds_ib_add_one,
and when it fails, it usually means there was an error allocating the PD
(protection domain) or MR (memory region).
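To illustrate how the client data can end up missing, here is a hypothetical userspace model of the add_one pattern: allocate the PD, then the MR, and attach per-device state only if both succeed. The fake_* types and fake_alloc() are invented for illustration; the real code allocates these through the IB verbs layer and presumably attaches the result with ib_set_client_data().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these are real RDS or verbs types. */
struct fake_device {
	const char *name;
	void *client_data;	/* what add_one attaches on success */
	int fail_pd;		/* failure injection: PD allocation fails */
	int fail_mr;		/* failure injection: MR allocation fails */
};

struct fake_rds_ib_dev {
	void *pd;
	void *mr;
};

/* Pretend resource allocator; returns NULL when told to fail. */
static void *fake_alloc(int fail)
{
	return fail ? NULL : malloc(1);
}

/* Allocate the PD, then the MR, and attach the per-device state only
 * once both succeed; on any error, unwind and leave client_data NULL. */
static int fake_add_one(struct fake_device *dev)
{
	struct fake_rds_ib_dev *rdev;

	assert(dev->client_data == NULL);	/* not attached yet */

	rdev = calloc(1, sizeof(*rdev));
	if (!rdev)
		return -ENOMEM;

	rdev->pd = fake_alloc(dev->fail_pd);
	if (!rdev->pd)
		goto err_free_dev;

	rdev->mr = fake_alloc(dev->fail_mr);
	if (!rdev->mr)
		goto err_free_pd;

	dev->client_data = rdev;
	return 0;

err_free_pd:
	free(rdev->pd);
err_free_dev:
	free(rdev);
	return -ENOMEM;
}
```

Either allocation failing leaves the device with no client data, which is exactly the state the remove path later has to cope with.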

> > PGD 4bfe4067 PUD 3b913067 PMD 0 
> > Oops: 0000 [1] SMP 
> > last sysfs file: /class/net/ib0/flags
> > CPU 6 
> > Modules linked in: rds ib_mthca ib_ipoib rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfs lockd nfs_acl sunrpc autofs4 edd ipv6 af_packet thermal processor fan button battery ac apparmor aamatch_pcre loop dm_mod usbhid shpchp pci_hotplug ehci_hcd uhci_hcd tg3 usbcore ext3 jbd sr_mod cdrom i2c_i801 i2c_core sg ata_piix libata sd_mod scsi_mod
> > Pid: 24445, comm: rmmod Tainted: G     U 2.6.16.46-0.12-smp #1
> > RIP: 0010:[<ffffffff88343f2c>] <ffffffff88343f2c>{:rds:rds_ib_remove_one+18}
[...]
> > Call Trace: <ffffffff881775b3>{:ib_core:ib_unregister_client+47}
> >        <ffffffff88347d86>{:rds:rds_trans_exit+57} <ffffffff88347b0b>{:rds:rds_exit+47}
> >        <ffffffff8014c984>{sys_delete_module+540} <ffffffff8016e7e7>{do_munmap+619}
> >        <ffffffff801f088b>{__up_write+33} <ffffffff8010ad3e>{system_call+126}

And that shouldn't happen: rds_ib_remove_one trips over the missing
client data instead of checking for it.
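A hedged sketch of the obvious guard, again as a userspace model with hypothetical fake_* names: the remove path checks for missing client data and returns early instead of dereferencing it. In the kernel this would presumably be a NULL check on the pointer returned by ib_get_client_data() at the top of rds_ib_remove_one; the actual fix may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a userspace model of the IB client. */
struct fake_device {
	const char *name;
	void *client_data;	/* NULL if add_one failed */
};

struct fake_rds_ib_dev {
	void *pd;
	void *mr;
};

/* Tear down per-device state. Returns 1 if something was freed, 0 if
 * add_one never attached client data for this device. */
static int fake_remove_one(struct fake_device *dev)
{
	struct fake_rds_ib_dev *rdev;

	assert(dev != NULL);	/* the device itself is always valid */

	rdev = dev->client_data;
	if (!rdev) {
		/* Without this check, rdev->mr below is a NULL dereference. */
		fprintf(stderr, "no client data for %s, nothing to remove\n",
			dev->name);
		return 0;
	}

	free(rdev->mr);
	free(rdev->pd);
	free(rdev);
	dev->client_data = NULL;
	return 1;
}
```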

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir at lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


