[rds-devel] [PATCH V2 net-next 5/7] rds: zerocopy Tx support.

Santosh Shilimkar santosh.shilimkar at oracle.com
Wed Feb 14 11:10:50 PST 2018


On 2/14/2018 2:28 AM, Sowmini Varadhan wrote:
> If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
> if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
> application pages sent down with rds_sendmsg() are pinned.
> 
> The pinning uses the accounting infrastructure added by
> Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")
> 
> The payload bytes in the message may not be modified for the
> duration that the message has been pinned. A multi-threaded
> application using this infrastructure may thus need to be notified
> about send-completion so that it can free/reuse the buffers
> passed to rds_sendmsg(). Notification of send-completion will
> identify each message-buffer by a cookie that the application
> must specify as ancillary data to rds_sendmsg().
> The ancillary data in this case has cmsg_level == SOL_RDS
> and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan at oracle.com>
> ---
> v2:
>    - remove unused data_len argument to rds_rm_size;
>    - unmap as necessary if we fail in the middle of zerocopy setup
> 
>   include/uapi/linux/rds.h |    1 +
>   net/rds/message.c        |   51 +++++++++++++++++++++++++++++++++++++++++++++-
>   net/rds/rds.h            |    3 +-
>   net/rds/send.c           |   44 ++++++++++++++++++++++++++++++++++-----
>   4 files changed, 91 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
> index e71d449..12e3bca 100644
> --- a/include/uapi/linux/rds.h
> +++ b/include/uapi/linux/rds.h
> @@ -103,6 +103,7 @@
>   #define RDS_CMSG_MASKED_ATOMIC_FADD	8
>   #define RDS_CMSG_MASKED_ATOMIC_CSWP	9
>   #define RDS_CMSG_RXPATH_LATENCY		11
> +#define	RDS_CMSG_ZCOPY_COOKIE		12
>
s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE	

>   #define RDS_INFO_FIRST			10000
>   #define RDS_INFO_COUNTERS		10000
> diff --git a/net/rds/message.c b/net/rds/message.c
> index d874b74..e499566 100644
> --- a/net/rds/message.c
> +++ b/net/rds/message.c
> @@ -341,12 +341,14 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in
>   	return rm;
>   }
>   
> -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from)
> +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from,
> +			       bool zcopy)
>   {
>   	unsigned long to_copy, nbytes;
>   	unsigned long sg_off;
>   	struct scatterlist *sg;
>   	int ret = 0;
> +	int length = iov_iter_count(from);
>   
>   	rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from));
>   
> @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from)
>   	sg = rm->data.op_sg;
>   	sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */
>   
> +	if (zcopy) {
> +		int total_copied = 0;
> +		struct sk_buff *skb;
> +
> +		skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32),
> +				GFP_KERNEL);
This can sleep so you might want to check if you want to use ATOMIC 
version here.

> +		if (!skb)
> +			return -ENOMEM;
> +		rm->data.op_mmp_znotifier = RDS_ZCOPY_SKB(skb);
> +		memset(rm->data.op_mmp_znotifier, 0,
> +		       sizeof(*rm->data.op_mmp_znotifier));
> +		if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp,
> +					    length)) {
> +			consume_skb(skb);
> +			rm->data.op_mmp_znotifier = NULL;
> +			return -ENOMEM;
> +		}
NOMEM new application visible change but probably the right one for this
particular case. Just need to make sure application can handle this
error.


> diff --git a/net/rds/rds.h b/net/rds/rds.h
> index 6e8fc4c..dfdc9b3 100644
> --- a/net/rds/rds.h
> +++ b/net/rds/rds.h
> @@ -784,7 +784,8 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
>   /* message.c */
>   struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp);
>   struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents);
> -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from);
> +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from,
> +			       bool zcopy);
>   struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len);
>   void rds_message_populate_header(struct rds_header *hdr, __be16 sport,
>   				 __be16 dport, u64 seq);
> diff --git a/net/rds/send.c b/net/rds/send.c
> index 5ac0925..80171cf 100644
> --- a/net/rds/send.c
> +++ b/net/rds/send.c

[...]

> @@ -1087,8 +1112,15 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
>   		goto out;
>   	}
>   
> +	if (zcopy) {
> +		if (rs->rs_transport->t_type != RDS_TRANS_TCP) {
> +			ret = -EOPNOTSUPP;
> +			goto out;
> +		}
> +		num_sgs = iov_iter_npages(&msg->msg_iter, INT_MAX);
> +	}

Instead of this transport check, lets move this under transport function
which then can be populated by TCP transport.

Rest of the changes looks good.

Regards,
Santosh



More information about the rds-devel mailing list