[rds-devel] [git pull] more teardown bug fixes

Chris Mason chris.mason at oracle.com
Wed Aug 4 05:55:57 PDT 2010


On Tue, Jul 27, 2010 at 11:00:55AM -0700, Zach Brown wrote:
> > Yep, look for rds_rdma module holding references and not unloading, even after all sockets are closed.
> 
> Yeah, sorry, I missed the other rds_trans_get_preferred() caller.  As discussed, just toss this into the patch and all's well!

With this change I'm able to rmmod after a big rds-stress load.  I've
gotten two crashes though: the first happened overnight and the console
wasn't logging (sigh), and the second was something I've never seen
before.  I hadn't done rmmod at all; it just boomed after I ran the
rds-stress load.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa00f5209>] mlx4_map_phys_fmr_fbo+0x71/0x143 [mlx4_core]
PGD 1fc1d8e067 PUD 1fc8288067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu127/cache/index2/shared_cpu_map
CPU 69
Modules linked in: ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) nf_conntrack_ipv4(U) nf_defrag_ipv4(U) xt_state(U) nf_conntrack(U) ipt_REJECT(U) xt_tcpudp(U) iptable_filter(U) ip_tables(U) x_tables(U) bridge(U) stp(U) llc(U) nfsd(U) exportfs(U) autofs4(U) hidp(U) nfs(U) fscache(U) nfs_acl(U) auth_rpcgss(U) rfcomm(U) l2cap(U) bluetooth(U) rfkill(U) lockd(U) sunrpc(U) bonding(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) cxgb3i(U) iw_cxgb3(U) cxgb3(U) libiscsi_tcp(U) ib_iser(U) libiscsi(U) scsi_transport_iscsi(U) ib_srp(U) scsi_transport_srp(U) scsi_tgt(U) rds_rdma(U) rds(U) ib_sdp(U) ib_ipoib(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ipv6(U) ib_sa(U) dm_multipath(U) video(U) output(U) sbs(U) sbshc(U) parport_pc(U) lp(U) parport(U) kvm_intel(U) kvm(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_en(U) joydev(U) ixgbe(U) mdio(U) igb(U) dca(U) snd_seq_dummy(U) mlx4_core(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) iTCO_wdt(U) snd_pcm(U) i2c_i801(U) iTCO_vendor_support(U) i2c_core(U) snd_timer(U) snd(U) soundcore(U) snd_page_alloc(U) pcspkr(U) ahci(U) shpchp(U) megaraid_sas(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) [last unloaded: microcode]
Pid: 15614, comm: rds-stress Not tainted 2.6.32-100.0.4.x86_64 #1 Sun Fire X4800
RIP: 0010:[<ffffffffa00f5209>]  [<ffffffffa00f5209>] mlx4_map_phys_fmr_fbo+0x71/0x143 [mlx4_core]
RSP: 0018:ffff881fd78699a8  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff881fda52aee8 RCX: 000000000000000c
RDX: 000000005011507a RSI: ffff881fda52aee8 RDI: ffff881ff1920000
RBP: ffff881fd78699c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000002 R11: ffff881fc1170520 R12: 000000007a501150
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
FS:  00007fa285a496e0(0000) GS:ffff8800283a0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000001fc82a6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rds-stress (pid: 15614, threadinfo ffff881fd7868000, task ffff881fdd13e200)
Stack:
 ffff881fdae68b40 ffff889febf8e000 ffff881fe74ba540 ffff881fc1170520
<0> ffff881fd78699f8 ffffffffa00f531d ffff881f00002000 ffff881fda52aee0
<0> ffff881fda52aee4 ffff881f00000000 ffff881fd7869a18 ffffffffa016b1bf
Call Trace:
 [<ffffffffa00f531d>] mlx4_map_phys_fmr+0x42/0x44 [mlx4_core]
 [<ffffffffa016b1bf>] mlx4_ib_map_phys_fmr+0x3a/0x3c [mlx4_ib]
 [<ffffffffa0365bce>] rds_ib_get_mr+0x3e0/0x4cc [rds_rdma]
 [<ffffffff8103ae2a>] ? get_user_pages_fast+0xc2/0x168
 [<ffffffffa0345e4b>] ? kcalloc+0x35/0x3d [rds]
 [<ffffffffa03462d7>] __rds_rdma_map+0x1f4/0x32c [rds]
 [<ffffffffa0346442>] rds_cmsg_rdma_map+0x33/0x3c [rds]
 [<ffffffffa034437e>] rds_sendmsg+0x2c5/0x60d [rds]   
 [<ffffffff813910ff>] __sock_sendmsg+0x5e/0x67
 [<ffffffff813919e4>] sock_sendmsg+0xcc/0xe5
 [<ffffffff813918ff>] ? sock_recvmsg+0xcf/0xe8
 [<ffffffff810752b8>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff811083e6>] ? virt_to_head_page+0x29/0x2b   
 [<ffffffff81108406>] ? virt_to_slab+0x1e/0x2e
 [<ffffffff81108cf6>] ? __cache_free+0x44/0x1bf
 [<ffffffff8139670c>] ? sock_kmalloc+0x39/0x50
 [<ffffffff8139670c>] ? sock_kmalloc+0x39/0x50
 [<ffffffff8139670c>] ? sock_kmalloc+0x39/0x50
 [<ffffffff81391c25>] sys_sendmsg+0x228/0x2b4
 [<ffffffff810eeb9a>] ? handle_mm_fault+0x148/0x6f1   
 [<ffffffff8111f66f>] ? path_put+0x22/0x27
 [<ffffffff810a6c9b>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b  
Code: 0f 8d cb 00 00 00 e9 cd 00 00 00 44 03 a7 14 01 00 00 48 8b 45 20 44 89 e2 c1 ca 18 89 53 20 89 10 48 8b 45 18 89 10 48 8b 43 30 <c6> 00 f0 0f ae f8 31 f6
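
For anyone reading the trace, the "Oops: 0002" value is the x86 page-fault
error code.  Below is a minimal user-space sketch of how its low bits decode;
the names (pf_decode, pf_print, struct pf_decoded) are made up for
illustration and are not kernel symbols.  The decoding appears consistent
with the rest of the dump: the marked bytes in the Code: line, c6 00 f0, look
like a one-byte store through %rax, and both RAX and CR2 are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "Oops:". */
enum {
	PF_PROT  = 1 << 0,	/* 0: page not present, 1: protection violation */
	PF_WRITE = 1 << 1,	/* 0: read access,      1: write access */
	PF_USER  = 1 << 2,	/* 0: kernel mode,      1: user mode */
	PF_RSVD  = 1 << 3,	/* reserved bit set in a page-table entry */
	PF_INSTR = 1 << 4,	/* fault was an instruction fetch */
};

/* Hypothetical helper type: one flag per error-code bit. */
struct pf_decoded {
	bool protection;
	bool write;
	bool user;
	bool reserved;
	bool ifetch;
};

/* Split the raw error code into its individual flags. */
struct pf_decoded pf_decode(unsigned long code)
{
	struct pf_decoded d = {
		.protection = (code & PF_PROT) != 0,
		.write      = (code & PF_WRITE) != 0,
		.user       = (code & PF_USER) != 0,
		.reserved   = (code & PF_RSVD) != 0,
		.ifetch     = (code & PF_INSTR) != 0,
	};
	return d;
}

/* Print a one-line summary of the mode, access type and page state. */
void pf_print(unsigned long code)
{
	struct pf_decoded d = pf_decode(code);

	printf("%04lx: %s-mode %s to a %s page\n", code,
	       d.user ? "user" : "kernel",
	       d.write ? "write" : "read",
	       d.protection ? "present" : "not-present");
}
```

Calling pf_print(0x0002) describes a kernel-mode write to a not-present
page, which is what a store through a NULL pointer would produce.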

> 
> - z
> 
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 87c4544..7a5398e 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -116,6 +116,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
>  {
>  	struct rds_connection *conn, *parent = NULL;
>  	struct hlist_head *head = rds_conn_bucket(laddr, faddr);
> +	struct rds_transport *loop_trans;
>  	unsigned long flags;
>  	int ret;
>  
> @@ -166,7 +167,9 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
>  	 * can bind to the destination address then we'd rather the messages
>  	 * flow through loopback rather than either transport.
>  	 */
> -	if (rds_trans_get_preferred(faddr)) {
> +	loop_trans = rds_trans_get_preferred(faddr);
> +	if (loop_trans) {
> +		rds_trans_put(loop_trans);
>  		conn->c_loopback = 1;
>  		if (is_outgoing && trans->t_prefer_loopback) {
>  			/* "outgoing" connection - and the transport
> 
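
The reference imbalance the patch above addresses can be sketched in user
space.  This is only a model, assuming rds_trans_get_preferred() returns the
transport with a module reference held, as the added rds_trans_put() implies.
Every name here (struct transport, trans_get_preferred, trans_put,
is_loopback_leaky, is_loopback_balanced, refs_after) is a hypothetical
stand-in, not the RDS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rds_transport; only the count is modeled. */
struct transport {
	int module_refs;	/* models the owning module's reference count */
	int can_bind;		/* nonzero if the transport can bind the address */
};

/* Models a "get" that hands back the transport with a reference held. */
struct transport *trans_get_preferred(struct transport *t)
{
	if (!t || !t->can_bind)
		return NULL;
	t->module_refs++;
	return t;
}

/* Models the matching "put" that drops one reference. */
void trans_put(struct transport *t)
{
	if (t) {
		assert(t->module_refs > 0);
		t->module_refs--;
	}
}

/* Pre-patch shape: the result is only tested, so the reference leaks. */
int is_loopback_leaky(struct transport *t)
{
	return trans_get_preferred(t) != NULL;
}

/* Post-patch shape: keep the pointer just long enough to drop the ref. */
int is_loopback_balanced(struct transport *t)
{
	struct transport *loop_trans = trans_get_preferred(t);

	if (!loop_trans)
		return 0;
	trans_put(loop_trans);
	return 1;
}

/* Run one of the checks 'calls' times and report the leftover references. */
int refs_after(int (*check)(struct transport *), int calls)
{
	struct transport t = { .module_refs = 0, .can_bind = 1 };

	for (int i = 0; i < calls; i++)
		check(&t);
	return t.module_refs;
}
```

In this model each call to the leaky variant leaves one reference behind, so
the count never returns to zero, which matches the symptom of a module that
cannot be unloaded after all sockets are closed.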


