[DTrace-devel] [PATCH 2/2] Implement the pcap action

Kris Van Hees kris.van.hees at oracle.com
Tue Oct 10 04:35:52 UTC 2023


On Mon, Oct 09, 2023 at 11:54:32PM -0400, Eugene Loh via DTrace-devel wrote:
> I am not familiar with pcap, but have some comments below.
> 
> First of all, should there be some err.* tests to check too few args, too
> many args, etc., as we have for some other actions? Also...

Good point.  I can definitely add those.

> On 9/23/23 15:14, Kris Van Hees via DTrace-devel wrote:
> > diff --git a/libdtrace/dt_cg.c b/libdtrace/dt_cg.c
> > @@ -1882,8 +1905,91 @@ dt_cg_act_panic(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
> >   static void
> >   dt_cg_act_pcap(dt_pcb_t *pcb, dt_node_t *dnp, dtrace_actkind_t kind)
> >   {
> > -	dnerror(dnp, D_UNKNOWN, "pcap() is not implemented (yet)\n");
> > -	/* FIXME: Needs implementation */
> > +	dtrace_hdl_t	*dtp = pcb->pcb_hdl;
> > +	dt_regset_t	*drp = pcb->pcb_regs;
> > +	dt_irlist_t	*dlp = &pcb->pcb_ir;
> > +	dt_node_t	*addr = dnp->dn_args;
> > +	dt_node_t	*proto = addr->dn_list;
> > +	uint_t		time_off, size_off, data_off, off;
> > +	uint_t		lbl_lenok = dt_irlist_label(dlp);
> > +	uint64_t	pcapsz = dtp->dt_options[DTRACEOPT_PCAPSIZE];
> > +	int		lenreg;
> > +
> > +	TRACE_REGSET("pcap(): Begin ");
> > +
> > +	/*
> > +	 * Create records for timestamp, size, protocol id, and packet data.
> > +	 * The protocol id is stored immediately from the passed in argument.
> > +	 */
> > +	time_off = dt_rec_add(dtp, dt_cg_fill_gap, DTRACEACT_PCAP,
> > +			      sizeof(uint64_t), sizeof(uint64_t), NULL, 0);
> > +	size_off = dt_rec_add(dtp, dt_cg_fill_gap, DTRACEACT_PCAP,
> > +			      sizeof(uint32_t), sizeof(uint32_t), NULL, 0);
> > +	dt_cg_store_val(pcb, proto, DTRACEACT_PCAP, NULL, 0);
> 
> 
> Might the evaluation of addr and/or proto have a side effect that impacts
> the other?  If so, we have to evaluate them in the correct order.  So, I
> would think we cannot evaluate proto yet. Presumably, the test suite could
> have tests that check arg evaluation order for various subroutines and
> actions.

The evaluation order of arguments to actions and subroutines is actually not
defined for D.

>From the docs: "The D compiler provides no guarantee regarding the order of
evaluation of arguments to a function or keys to an associative array. Be
careful of using expressions with interacting side-effects, such as the pair
of expressions i and i++, in these contexts."

> > +	data_off = dt_rec_add(dtp, dt_cg_fill_gap, DTRACEACT_PCAP,
> > +			      dtp->dt_options[DTRACEOPT_PCAPSIZE], 1, NULL, 0);
> > +
> > +	/* Store timestamp (ns since boot time). */
> > +	if (dt_regset_xalloc_args(drp) == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +	dt_regset_xalloc(drp, BPF_REG_0);
> > +	emit(dlp,  BPF_CALL_HELPER(BPF_FUNC_ktime_get_ns));
> > +	dt_regset_free_args(drp);
> > +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_9, time_off, BPF_REG_0));
> > +	dt_regset_free(drp, BPF_REG_0);
> > +
> > +	/*
> > +	 * Determine and store the packet data size.
> > +	 * The size of the data to be copied (and reported as size of the
> > +	 * packet capture) is the lesser of the data size (skb->len) and the
> > +	 * capture size.
> > +	 */
> > +	dt_cg_node(addr, dlp, drp);
> > +	off = dt_cg_ctf_offsetof("struct sk_buff", "len", NULL);
> > +
> > +	if (dt_regset_xalloc_args(drp) == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, addr->dn_reg));
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, off));
> > +	emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint32_t)));
> > +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SP));
> > +	dt_regset_xalloc(drp, BPF_REG_0);
> > +	emit(dlp,  BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
> > +	dt_regset_free_args(drp);
> > +	dt_regset_free(drp, BPF_REG_0);
> > +
> > +	if ((lenreg = dt_regset_alloc(drp)) == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +
> > +	emit(dlp,  BPF_LOAD(BPF_DW, lenreg, BPF_REG_FP, DT_STK_SP));
> > +	emit(dlp,  BPF_LOAD(BPF_W, lenreg, lenreg, 0));
> 
> 
> That's "a lot" of instructions simply to read a word from an unverified
> address.  Not too many, but such a read is relatively common.  So, we should
> have a simple way of generating these instructions.  How about
> dt_cg_load_scalar()?  Bottom line is that we should look at the various
> sites that read words in this manner and put the associated code generation
> under a function that can be called from code like this.  Code reuse.

Yes, we do need to look into better (more) utility functions that help with
common code like this.  I am not sure whether we ought to do that right now,
but I'll look at it because it is becoming more common (here and also in
some provider trampolines).

 > +	emit(dlp,  BPF_BRANCH_IMM(BPF_JLE, lenreg, pcapsz, lbl_lenok));
> > +	emit(dlp,  BPF_MOV_IMM(lenreg, pcapsz));
> > +	emitl(dlp, lbl_lenok,
> > +		   BPF_STORE(BPF_W, BPF_REG_9, size_off, lenreg));
> > +
> > +	/* Copy the packet data to the output buffer. */
> > +	off = dt_cg_ctf_offsetof("struct sk_buff", "data", NULL);
> > +
> > +	if (dt_regset_xalloc_args(drp) == -1)
> > +		longjmp(yypcb->pcb_jmpbuf, EDT_NOREG);
> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_3, addr->dn_reg));
> > +	dt_regset_free(drp, addr->dn_reg);
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, off));
> > +	emit(dlp,  BPF_MOV_IMM(BPF_REG_2, sizeof(uint64_t)));
> > +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_1, BPF_REG_FP, DT_STK_SP));
> > +	dt_regset_xalloc(drp, BPF_REG_0);
> > +	emit(dlp,  BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
> > +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_FP, DT_STK_SP));
> > +	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_3, 0));
> 
> 
> For example, here already is another instance of such a load. Here is
> already an opportunity for code reuse.

Indeed.

> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_2, lenreg));
> 
> Should lenreg be freed now?  Otherwise, we have a register leak? Come to
> think of it, there could be a test to check for such reg leaks.  Like
> tst.pcap.file.sh but with the pcap action showing up twice, three times,
> four times, many times.  Different problems would be exposed.  E.g., ...

Yes, I forgot to free lenreg.  Thanks for catching that.

> What if lenreg were %r1, %r2, %r4, or %r5?  The BPF verifier would complain
> since the reg was just wiped clean in the last BPF helper function call. 
> The fix is to be disciplined about filling/spilling regs between BPF helper
> calls.
> 
> What if lenreg were %r3?  The BPF verifier would not complain in this
> case... but for a bad reason:  we just finished overwriting that reg.  So
> the verifier would be fine, but the value would be bad.  In this case, we
> have to fill %r2 first.  Then we can fill %r1 and %r3.
> 
> I know you've said that what we really need is robust reg management.  I
> suppose so, but we've had the current setup for a long time and so we should
> live with it for now.

Very valid point.

> > +	emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_9));
> > +	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, data_off));
> > +	emit(dlp,  BPF_CALL_HELPER(dtp->dt_bpfhelper[BPF_FUNC_probe_read_kernel]));
> > +	dt_regset_free_args(drp);
> > +	dt_regset_free(drp, BPF_REG_0);
> > +
> > +	TRACE_REGSET("pcap(): End   ");
> >   }
> > diff --git a/libdtrace/dt_consume.c b/libdtrace/dt_consume.c
> > @@ -1427,28 +1427,26 @@ dt_print_mod(dtrace_hdl_t *dtp, FILE *fp, const char *format, caddr_t addr)
> >   }
> >   int
> > -dt_print_pcap(dtrace_hdl_t *dtp, FILE *fp, dtrace_recdesc_t *rec,
> > +dt_print_pcap(dtrace_hdl_t *dtp, FILE *fp, dtrace_recdesc_t *rec, uint_t nrecs,
> >   	      const caddr_t buf)
> >   {
> > -	caddr_t	addr;
> > -	uint64_t time, proto, pktlen, maxlen;
> > -	const char *filename;
> > +	caddr_t			data;
> > +	uint64_t		time, proto, pktlen;
> > +	uint64_t		maxlen = dtp->dt_options[DTRACEOPT_PCAPSIZE];
> > +	const char		*filename;
> > -	addr = (caddr_t)buf + rec->dtrd_offset;
> > +	if (nrecs < 4)
> > +		return dt_set_errno(dtp, EDT_PCAP);
> >   	if (dt_read_scalar(buf, rec, &time) < 0 ||
> > -	    dt_read_scalar(buf + sizeof(uint64_t), rec, &pktlen) < 0)
> > +	    dt_read_scalar(buf, rec + 1, &pktlen) < 0)
> >   		return dt_set_errno(dtp, EDT_PCAP);
> > -	if (pktlen == 0) {
> > -		/*
> > -		 * skb must have been NULL, skip without capturing.
> > -		 */
> > +	/* If pktlen is 0, the skb must have been ULL, so don't capture it. */
> 
> 
> s/ULL/NULL/

Thanks.

> > +	if (pktlen == 0)
> >   		return 0;
> > -	}
> > -	maxlen = DT_PCAPSIZE(dtp->dt_options[DTRACEOPT_PCAPSIZE]);
> 
> 
> So we should remove the DT_PCAPSIZE definition in libdtrace/dt_pcap.h.

Indeed.

> > -	if (dt_read_scalar(buf, rec + 1, &proto) < 0)
> > +	if (dt_read_scalar(buf, rec + 2, &proto) < 0)
> >   		return dt_set_errno(dtp, EDT_PCAP);
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list