[rds-devel] Re: one_use.tgz (rds-stress.c + one_use.patch for rds driver).

Richard Frank richard.frank at oracle.com
Wed Jan 9 19:25:39 PST 2008


Richard Frank wrote:
> One minor fix in __rds_free_mr():
>
> Move rds_mr_put(rs, mr) under the trans_private test; otherwise we always 
> auto-free the MR.
>
>        if (trans_private) {
>                mr->r_trans->free_mr(trans_private,
>                        invalidate, mr->r_sg, mr->r_nents);
>                rds_mr_put(rs, mr);
>        }
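
A minimal sketch of what that fix changes, using mock types rather than the driver's real definitions. Everything named `mock_*` is hypothetical, and the "unfixed" variant is only one reading of the pre-fix behaviour described above: the reference is dropped on every call, even when there is no transport state left to release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MR object; not the rds driver's struct. */
struct mock_mr {
    int   refcount;       /* references currently held on the MR */
    void *trans_private;  /* transport state, NULL once released */
    int   trans_frees;    /* how often the transport free hook ran */
};

/* Stand-in for the transport's free_mr hook. */
static void mock_trans_free(struct mock_mr *mr)
{
    mr->trans_frees++;
}

/* Stand-in for rds_mr_put(): drop one reference. */
static void mock_mr_put(struct mock_mr *mr)
{
    mr->refcount--;
}

/* Assumed pre-fix shape: the put runs unconditionally. */
static void mock_free_mr_unfixed(struct mock_mr *mr)
{
    void *trans_private;

    assert(mr != NULL);
    trans_private = mr->trans_private;
    mr->trans_private = NULL;
    if (trans_private)
        mock_trans_free(mr);
    mock_mr_put(mr);
}

/* Post-fix shape: the put is tied to actually releasing transport state. */
static void mock_free_mr(struct mock_mr *mr)
{
    void *trans_private;

    assert(mr != NULL);
    trans_private = mr->trans_private;
    mr->trans_private = NULL;
    if (trans_private) {
        mock_trans_free(mr);
        mock_mr_put(mr);
    }
}
```

In this model, calling `mock_free_mr()` twice on an MR that starts with two references leaves one reference behind, whereas `mock_free_mr_unfixed()` drops both.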
>
> Richard Frank wrote:
>> I reduced the FMR pool size to 2k (was 16k) and no longer get this 
>> crash.
>>
>> Perhaps there are limits on the pool size for different HCAs or 
>> firmware versions?
>>
>> Richard Frank wrote:
>>> Applying these patches to OFED-1.3-20080107-0600:
>>>
>>> git://git.openfabrics.org/ofed_1_3/linux-2.6.git ofed_kernel
>>> commit 5c2b6d5ee97ebb96362048935f0780a7d772274e
>>>
>>> The test crashes both nodes when running the basic rds-stress test 
>>> (not RDMA).
>>>
>>> patch complained once, about message.c. I applied that hunk manually, 
>>> so perhaps that's the problem.
>>>
>>> [root@vosib6 ofa_kernel-1.3]# more net/rds/message.c.rej
>>> ***************
>>> *** 37,42 ****
>>>
>>>  static unsigned int   rds_exthdr_size[__RDS_EXTHDR_MAX] = {
>>>  [RDS_EXTHDR_NONE]     = 0,
>>>  [RDS_EXTHDR_RDMA]     = sizeof(struct rds_ext_header_rdma),
>>>  };
>>>
>>> --- 37,43 ----
>>>
>>>  static unsigned int   rds_exthdr_size[__RDS_EXTHDR_MAX] = {
>>>  [RDS_EXTHDR_NONE]     = 0,
>>> + [RDS_EXTHDR_VERSION]  = sizeof(struct rds_ext_header_version),
>>>  [RDS_EXTHDR_RDMA]     = sizeof(struct rds_ext_header_rdma),
>>>  };
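
A small sketch of why a dropped hunk like this one matters. With C designated initializers, any array slot that is not named is zero-initialized, so a table missing the version entry would report a size of 0 for that header type. The enum values and struct layouts below are assumptions for illustration, not the definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numbering; the real RDS_EXTHDR_* values may differ. */
enum {
    EXTHDR_NONE = 0,
    EXTHDR_VERSION,
    EXTHDR_RDMA,
    EXTHDR_MAX_
};

/* Assumed payload layouts, for illustration only. */
struct ext_header_version { unsigned int version; };
struct ext_header_rdma    { unsigned int rkey; };

/* Table as it would look if the rejected hunk were never applied. */
static const unsigned int size_without_hunk[EXTHDR_MAX_] = {
    [EXTHDR_NONE] = 0,
    [EXTHDR_RDMA] = sizeof(struct ext_header_rdma),
    /* EXTHDR_VERSION is not named, so it is implicitly 0 */
};

/* Table with the hunk applied. */
static const unsigned int size_with_hunk[EXTHDR_MAX_] = {
    [EXTHDR_NONE]    = 0,
    [EXTHDR_VERSION] = sizeof(struct ext_header_version),
    [EXTHDR_RDMA]    = sizeof(struct ext_header_rdma),
};

/* Look up a payload size, treating out-of-range types as size 0. */
static unsigned int exthdr_size(const unsigned int *table, unsigned int type)
{
    assert(table != NULL);
    return type < EXTHDR_MAX_ ? table[type] : 0;
}
```

Whether a zero-sized version header could explain the crash reported here is not established by this sketch; it only shows what the missing line does to the table.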
>>>
>>> Jan  9 21:12:18 vosib6 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000014
>>> Jan  9 21:12:18 vosib6 kernel:  printing eip:
>>> Jan  9 21:12:18 vosib6 kernel: fab60541
>>> Jan  9 21:12:18 vosib6 kernel: *pde = 33fe6001
>>> Jan  9 21:12:18 vosib6 kernel: Oops: 0000 [#1]
>>> Jan  9 21:12:18 vosib6 kernel: SMP
>>> Jan  9 21:12:18 vosib6 kernel: Modules linked in: rds(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) nfsd exportfs parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd hw_random shpchp md5 ipv6 e1000 floppy ata_piix libata sg ext3 jbd aic79xx sd_mod scsi_mod
>>> Jan  9 21:12:18 vosib6 kernel: CPU:    3
>>> Jan  9 21:12:18 vosib6 kernel: EIP:    0060:[<fab60541>]    Not tainted VLI
>>> Jan  9 21:12:18 vosib6 kernel: EFLAGS: 00010246   (2.6.9-67.0.0.0.1.ELsmp)
>>> Jan  9 21:12:18 vosib6 kernel: EIP is at rds_ib_setup_qp+0x1d/0x219 [rds]
>>> Jan  9 21:12:18 vosib6 kernel: eax: 00000000   ebx: e1571efc   ecx: f6219064   edx: 00000200
>>> Jan  9 21:12:18 vosib6 kernel: esi: f7394400   edi: f4b3ce00   ebp: e1571d8c   esp: f04d3ec8
>>> Jan  9 21:12:18 vosib6 kernel: ds: 007b   es: 007b   ss: 0068
>>> Jan  9 21:12:18 vosib6 kernel: Process rdma_cm (pid: 22898, threadinfo=f04d3000 task=f65031b0)
>>> Jan  9 21:12:18 vosib6 kernel: Stack: 00000000 00000000 f04d3ef8 00000003 c3654760 c3654760 c3653d80 c3654760
>>> Jan  9 21:12:18 vosib6 kernel:        f04d3f08 f7f20800 f7f205b0 c3653d80 f6582080 f7f20720 c3831fbc c02d646d
>>> Jan  9 21:12:18 vosib6 kernel:        f04d3f68 e1571efc e1571d8c f4b3ce00 f4b3ce00 fab60888 2f085740 f0
>>>
>>> Olaf Kirch wrote:
>>>> Here we go - latest set of patches attached.
>>>>
>>>> New stuff:
>>>>  -    RDS extension headers. Rather than stuffing more and more
>>>>     things into the header, I decided to reserve 16 bytes
>>>>     for "extensions" and write the plumbing for it.
>>>>  -    RDMA extension header. Goes with every SEND following an
>>>>     RDMA operation, and contains the R_Key.
>>>>     This is used by the receiver to check for (and release)
>>>>     MRs marked as use_once
>>>>  -    Changed GET_MR interface - got rid of phys_addr, and
>>>>     added use_once.
>>>>     The phys_addr stuff needs more cleanup
>>>>  -    Version extension header. We now broadcast our supported
>>>>     RDS protocol version as part of the initial CONG_MAP update.
>>>>     This doesn't do much right now, but will be needed to
>>>>     do rolling updates in the future.
>>>>     I added this stuff now, so that we don't have to rely
>>>>     on advanced crystal balling later when the time comes
>>>>     where we break the protocol.
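
One way the extension plumbing described above could work, as a sketch only. The 16-byte reserved area comes from the text; the wire layout (one type byte followed by a fixed-size payload, with type 0 ending the list), the type numbers, the payload sizes, and the `ext_*` helper names are all assumptions and may not match the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXT_SPACE 16   /* the 16 bytes reserved for extensions */

/* Assumed type numbers; type 0 marks the end of the list. */
enum { EXT_NONE = 0, EXT_VERSION = 1, EXT_RDMA = 2, EXT_MAX_ };

/* Assumed payload sizes in bytes, indexed by type. */
static const unsigned int ext_size[EXT_MAX_] = {
    [EXT_NONE]    = 0,
    [EXT_VERSION] = 4,
    [EXT_RDMA]    = 4,
};

/* Append one extension to a zero-initialized area.
 * Returns 1 on success, 0 if the type is unknown or there is no room. */
static int ext_add(unsigned char *area, unsigned int type, const void *data)
{
    size_t pos = 0;

    assert(area != NULL && data != NULL);
    if (type == EXT_NONE || type >= EXT_MAX_)
        return 0;

    /* Skip over the extensions already present. */
    while (pos < EXT_SPACE && area[pos] != EXT_NONE) {
        if (area[pos] >= EXT_MAX_)
            return 0;                   /* corrupt area */
        pos += 1 + ext_size[area[pos]];
    }
    if (pos + 1 + ext_size[type] > EXT_SPACE)
        return 0;                       /* would overflow the area */

    area[pos] = (unsigned char)type;
    memcpy(area + pos + 1, data, ext_size[type]);
    return 1;
}

/* Copy the payload of the first extension of the given type into out.
 * Returns 1 if found, 0 otherwise. */
static int ext_find(const unsigned char *area, unsigned int type, void *out)
{
    size_t pos = 0;

    assert(area != NULL && out != NULL);
    while (pos < EXT_SPACE && area[pos] != EXT_NONE) {
        unsigned int t = area[pos];

        if (t >= EXT_MAX_ || pos + 1 + ext_size[t] > EXT_SPACE)
            return 0;                   /* unknown type or truncated */
        if (t == type) {
            memcpy(out, area + pos + 1, ext_size[t]);
            return 1;
        }
        pos += 1 + ext_size[t];
    }
    return 0;
}
```

In this model a sender would call `ext_add()` with the R_Key after an RDMA operation, and the receiver would call `ext_find()` to recover it and release the matching use_once MR.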
>>>>
>>>> Vlad, you mentioned that it's possible to crash the RDS stack by
>>>> telnetting to the RDS TCP port. I was unable to reproduce this -
>>>> what did you do to trigger the crash?
>>>>
>>>> Olaf
>>>>   
>>>
>>
>


