[rds-devel] Re: one_use.tgz (rds-stress.c + one_use.patch for rds driver).
Richard Frank
richard.frank at oracle.com
Wed Jan 9 19:25:02 PST 2008
Richard Frank wrote:
> Applying these patches to OFED-1.3-20080107-0600:
>
> git://git.openfabrics.org/ofed_1_3/linux-2.6.git ofed_kernel
> commit 5c2b6d5ee97ebb96362048935f0780a7d772274e
>
> The test crashes both nodes when running a basic rds-stress test (not RDMA).
>
> There was one rejected hunk in message.c during patching, which I
> applied manually; perhaps that's the problem.
>
> [root at vosib6 ofa_kernel-1.3]# more net/rds/message.c.rej
> ***************
> *** 37,42 ****
>
> static unsigned int rds_exthdr_size[__RDS_EXTHDR_MAX] = {
> [RDS_EXTHDR_NONE] = 0,
> [RDS_EXTHDR_RDMA] = sizeof(struct rds_ext_header_rdma),
> };
>
> --- 37,43 ----
>
> static unsigned int rds_exthdr_size[__RDS_EXTHDR_MAX] = {
> [RDS_EXTHDR_NONE] = 0,
> + [RDS_EXTHDR_VERSION] = sizeof(struct rds_ext_header_version),
> [RDS_EXTHDR_RDMA] = sizeof(struct rds_ext_header_rdma),
> };
>
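For context on what the rejected hunk does: rds_exthdr_size[] maps each extension type to its fixed payload size, and the patch adds the RDS_EXTHDR_VERSION entry. Below is a minimal user-space sketch of how such a table could drive encoding and parsing of the 16-byte extension area, assuming each extension is stored as a one-byte type followed by rds_exthdr_size[type] bytes of payload, and that a zero (RDS_EXTHDR_NONE) byte ends the list. The numeric type values, the payload structs, RDS_HEADER_EXT_SPACE, ext_len(), ext_add() and ext_next() are hypothetical and not taken from the patch; payloads stay in host byte order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type numbers; only the names come from the patch. */
enum {
	RDS_EXTHDR_NONE    = 0,
	RDS_EXTHDR_VERSION = 1,
	RDS_EXTHDR_RDMA    = 2,
	__RDS_EXTHDR_MAX   = 16,
};

#define RDS_HEADER_EXT_SPACE 16	/* bytes reserved in the RDS header */

/* Hypothetical payload layouts (host byte order in this sketch). */
struct rds_ext_header_version { uint32_t h_version; };
struct rds_ext_header_rdma    { uint32_t h_rdma_rkey; };

/* Payload size per type; types left out default to 0 ("unknown"). */
static const unsigned int rds_exthdr_size[__RDS_EXTHDR_MAX] = {
	[RDS_EXTHDR_NONE]    = 0,
	[RDS_EXTHDR_VERSION] = sizeof(struct rds_ext_header_version),
	[RDS_EXTHDR_RDMA]    = sizeof(struct rds_ext_header_rdma),
};

/* Payload size of a known, non-NONE type, or 0 if the type is unusable. */
static unsigned int ext_len(unsigned int type)
{
	if (type == RDS_EXTHDR_NONE || type >= __RDS_EXTHDR_MAX)
		return 0;
	return rds_exthdr_size[type];
}

/* Append one extension (type byte + payload) at *pos in a zeroed area.
 * Returns 0 on success, -1 for an unknown type or lack of space. */
static int ext_add(uint8_t *area, size_t *pos, unsigned int type,
		   const void *data)
{
	unsigned int len = ext_len(type);

	assert(area != NULL && pos != NULL);
	if (len == 0 || *pos + 1 + len > RDS_HEADER_EXT_SPACE)
		return -1;
	area[(*pos)++] = (uint8_t)type;
	memcpy(area + *pos, data, len);
	*pos += len;
	return 0;
}

/* Return the type of the extension at *pos and copy its payload into
 * buf; RDS_EXTHDR_NONE means end of list. An unknown type also ends
 * parsing, because its length cannot be determined. */
static unsigned int ext_next(const uint8_t *area, size_t *pos, void *buf)
{
	unsigned int type, len;

	if (*pos >= RDS_HEADER_EXT_SPACE)
		return RDS_EXTHDR_NONE;
	type = area[*pos];
	len = ext_len(type);
	if (len == 0 || *pos + 1 + len > RDS_HEADER_EXT_SPACE)
		return RDS_EXTHDR_NONE;
	memcpy(buf, area + *pos + 1, len);
	*pos += 1 + len;
	return type;
}
```

With this layout, leaving the RDS_EXTHDR_VERSION line out of the table would make its size 0, so ext_next() in this sketch would stop parsing at the first version extension instead of skipping over it.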
> Jan 9 21:12:18 vosib6 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000014
> Jan 9 21:12:18 vosib6 kernel: printing eip:
> Jan 9 21:12:18 vosib6 kernel: fab60541
> Jan 9 21:12:18 vosib6 kernel: *pde = 33fe6001
> Jan 9 21:12:18 vosib6 kernel: Oops: 0000 [#1]
> Jan 9 21:12:18 vosib6 kernel: SMP
> Jan 9 21:12:18 vosib6 kernel: Modules linked in: rds(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) nfsd exportfs parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd hw_random shpchp md5 ipv6 e1000 floppy ata_piix libata sg ext3 jbd aic79xx sd_mod scsi_mod
> Jan 9 21:12:18 vosib6 kernel: CPU: 3
> Jan 9 21:12:18 vosib6 kernel: EIP: 0060:[<fab60541>] Not tainted VLI
> Jan 9 21:12:18 vosib6 kernel: EFLAGS: 00010246 (2.6.9-67.0.0.0.1.ELsmp)
> Jan 9 21:12:18 vosib6 kernel: EIP is at rds_ib_setup_qp+0x1d/0x219 [rds]
> Jan 9 21:12:18 vosib6 kernel: eax: 00000000 ebx: e1571efc ecx: f6219064 edx: 00000200
> Jan 9 21:12:18 vosib6 kernel: esi: f7394400 edi: f4b3ce00 ebp: e1571d8c esp: f04d3ec8
> Jan 9 21:12:18 vosib6 kernel: ds: 007b es: 007b ss: 0068
> Jan 9 21:12:18 vosib6 kernel: Process rdma_cm (pid: 22898, threadinfo=f04d3000 task=f65031b0)
> Jan 9 21:12:18 vosib6 kernel: Stack: 00000000 00000000 f04d3ef8 00000003 c3654760 c3654760 c3653d80 c3654760
> Jan 9 21:12:18 vosib6 kernel: f04d3f08 f7f20800 f7f205b0 c3653d80 f6582080 f7f20720 c3831fbc c02d646d
> Jan 9 21:12:18 vosib6 kernel: f04d3f68 e1571efc e1571d8c f4b3ce00 f4b3ce00 fab60888 2f085740 f0
>
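A reading aid for the oops above: on i386, a NULL-pointer fault at a small virtual address such as 00000014, together with a zeroed register (eax is 00000000 here), is often the result of loading a structure member at offset 0x14 through a NULL pointer. The sketch below only demonstrates that address arithmetic; struct hypothetical_conn and member_address() are invented for illustration and say nothing about which pointer rds_ib_setup_qp actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: five 32-bit members (5 * 4 = 20 = 0x14 bytes)
 * place field_at_0x14 at offset 0x14 on common ABIs. */
struct hypothetical_conn {
	uint32_t a, b, c, d, e;
	uint32_t field_at_0x14;
};

static_assert(offsetof(struct hypothetical_conn, field_at_0x14) == 0x14,
	      "member expected at offset 0x14");

/* Address touched by reading p->field_at_0x14 when p == base, computed
 * arithmetically so nothing is actually dereferenced. */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct hypothetical_conn, field_at_0x14);
}
```

For the real crash, the offset would be matched against the disassembly of rds_ib_setup_qp around +0x1d to see which member lives at 0x14 of the structure being accessed.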
> Olaf Kirch wrote:
>> Here we go - latest set of patches attached.
>>
>> New stuff:
>> - RDS extension headers. Rather than stuffing more and more
>> things into the header, I decided to reserve 16 bytes
>> for "extensions" and write the plumbing for it.
>> - RDMA extension header. It goes with every SEND that follows an
>> RDMA operation, and contains the R_Key.
>> The receiver uses this to check for (and release)
>> MRs marked as use_once.
>> - Changed the GET_MR interface: got rid of phys_addr and
>> added use_once.
>> The phys_addr stuff needs more cleanup.
>> - Version extension header. We now broadcast our supported
>> RDS protocol version as part of the initial CONG_MAP update.
>> This doesn't do much right now, but will be needed for
>> rolling updates in the future.
>> I added it now so that we don't have to rely
>> on advanced crystal-balling later, when the time comes
>> to break the protocol.
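The use_once behaviour described above can be illustrated with a small user-space sketch: the side that registered an MR keeps track of it, and when a SEND arrives carrying the RDMA extension header, it looks up the R_Key and releases the MR if it was marked use_once. struct mr_entry, struct mr_table, MR_TABLE_SIZE and the mr_* functions are hypothetical; the fixed-size array with a linear search stands in for whatever lookup structure the RDS code actually uses, and "release" here only frees a slot rather than deregistering anything with the HCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MR_TABLE_SIZE 8	/* hypothetical capacity */

/* Hypothetical per-socket record of a registered memory region. */
struct mr_entry {
	uint32_t rkey;		/* R_Key handed to the peer */
	bool use_once;		/* release after the first RDMA using it */
	bool in_use;		/* slot occupied */
};

struct mr_table {
	struct mr_entry mrs[MR_TABLE_SIZE];
};

/* Record a newly registered MR; false if the table is full. */
static bool mr_register(struct mr_table *t, uint32_t rkey, bool use_once)
{
	assert(t != NULL);
	for (size_t i = 0; i < MR_TABLE_SIZE; i++) {
		if (!t->mrs[i].in_use) {
			t->mrs[i] = (struct mr_entry){ .rkey = rkey,
						       .use_once = use_once,
						       .in_use = true };
			return true;
		}
	}
	return false;
}

/* Look up a live MR by R_Key; NULL if none matches. */
static struct mr_entry *mr_find(struct mr_table *t, uint32_t rkey)
{
	assert(t != NULL);
	for (size_t i = 0; i < MR_TABLE_SIZE; i++)
		if (t->mrs[i].in_use && t->mrs[i].rkey == rkey)
			return &t->mrs[i];
	return NULL;
}

/* Handle the R_Key from an RDMA extension header on an incoming SEND:
 * free the MR if it was marked use_once. Returns true if released;
 * unknown keys and persistent MRs are left alone. */
static bool mr_release_on_send(struct mr_table *t, uint32_t rkey)
{
	struct mr_entry *mr = mr_find(t, rkey);

	if (mr == NULL || !mr->use_once)
		return false;
	mr->in_use = false;
	return true;
}
```

Keying the release on the R_Key carried in the SEND means the MR owner learns that the RDMA has completed from the ordinary message stream, without a separate notification.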
>>
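As an illustration of how an announced version could later be put to use, here is a sketch that packs major and minor numbers into one 32-bit value and has each connection settle on the lower of the local version and the peer's announced one, treating a peer that sent no version extension as speaking a baseline. The PROTO() encoding, the version numbers and negotiate_version() are assumptions for illustration; the thread only says the version is sent with the initial CONG_MAP update, not how it will be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: major in bits 15..8, minor in bits 7..0,
 * so plain integer comparison orders versions correctly. */
#define PROTO(major, minor)	(((uint32_t)(major) << 8) | (uint32_t)(minor))
#define PROTO_MAJOR(v)		(((v) >> 8) & 0xffu)
#define PROTO_MINOR(v)		((v) & 0xffu)

#define LOCAL_PROTO_VERSION	PROTO(3, 1)	/* hypothetical */
#define BASELINE_PROTO_VERSION	PROTO(3, 0)	/* hypothetical */

static_assert(PROTO(3, 1) > PROTO(3, 0), "minor bump must compare higher");
static_assert(PROTO(4, 0) > PROTO(3, 255), "major bump must dominate");

/* Version a connection speaks after seeing the peer's initial
 * CONG_MAP update; older peers send no version extension at all. */
static uint32_t negotiate_version(bool peer_announced, uint32_t peer_version)
{
	uint32_t peer = peer_announced ? peer_version : BASELINE_PROTO_VERSION;

	return peer < LOCAL_PROTO_VERSION ? peer : LOCAL_PROTO_VERSION;
}
```

Because both peers apply the same min() rule to the same pair of announcements, they end up on the same version, which is what would let old and new nodes coexist during a rolling update in this scheme.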
>> Vlad, you mentioned that it's possible to crash the RDS stack by
>> telnetting to the RDS TCP port. I was unable to reproduce this;
>> what did you do to trigger the crash?
>>
>> Olaf
>>
>