[DTrace-devel] [PATCH] examples: include a first set of scripts and a README file

Eugene Loh eugene.loh at oracle.com
Fri Aug 29 03:10:54 UTC 2025


Reviewed-by: Eugene Loh <eugene.loh at oracle.com>
A kind of quick review, but they're good examples and do not need to be 
held up.

PS

1.  examples/thread-ids.d

I see code that basically looks like this:

     pid$target:libc.so:pthread_create:entry
     {
       self->pthr_id_p = (int64_t *) arg0;
     }
     pid$target:libc.so:pthread_create:return
     {
       this->pthr_id = *(int64_t *) copyin(*self->pthr_id_p,8);
     }

Under some circumstances -- like -p $pid and the script starts while a 
pthread_create() is running -- one might miss the entry but catch the 
return.  In that case, we get a NULL dereference of self->pthr_id_p.  
Not a big deal here.  In general, though, one wants to code around these 
possibilities.

2.  examples/sched-stats.d
    @thr_cpu_id_on_cpu[tid,curcpu->cpu_id] = count();
    @cpu_id_thr_on_cpu[curcpu->cpu_id,tid] = count();

    @thr_cpu_id_off_cpu[tid,curcpu->cpu_id] = count();
    @cpu_id_thr_off_cpu[curcpu->cpu_id,tid] = count();
I suspect the answer is "no" but could you have only one on-cpu 
aggregation and only one off-cpu aggregation, simply printing each one 
two ways (inverting the order of the keys)?  Probably not, sigh.

On 8/28/25 22:07, eugene.loh at oracle.com wrote:

> From: Ruud van der Pas <ruud.vanderpas at oracle.com>
>
> Add 7 example scripts and a README file.
>
> 2025-08-28  Ruud van der Pas  <ruud.vanderpas at oracle.com>
>
> 	* examples/fcalls.d: List and count function calls.
> 	* examples/io-cast-net.d: Show how to use wildcards.
> 	* examples/io-stats.d: Display I/O characteristics.
> 	* examples/sched-simple.d: Count on/off the CPU.
> 	* examples/sched-stats.d: Extensive scheduler statistics.
> 	* examples/thread-ids.d: Mapping of thread IDs.
> 	* examples/var-scope.d: Demonstrate scoping rules.
> 	* examples/README.md: Brief description of all examples.
>
> Signed-off-by: Ruud van der Pas <ruud.vanderpas at oracle.com>
> ---
>   examples/README.md      |  88 +++++++++++++
>   examples/fcalls.d       |  90 +++++++++++++
>   examples/io-cast-net.d  |  78 +++++++++++
>   examples/io-stats.d     | 280 ++++++++++++++++++++++++++++++++++++++++
>   examples/sched-simple.d |  82 ++++++++++++
>   examples/sched-stats.d  | 191 +++++++++++++++++++++++++++
>   examples/thread-ids.d   | 168 ++++++++++++++++++++++++
>   examples/var-scope.d    | 121 +++++++++++++++++
>   8 files changed, 1098 insertions(+)
>   create mode 100644 examples/README.md
>   create mode 100755 examples/fcalls.d
>   create mode 100755 examples/io-cast-net.d
>   create mode 100755 examples/io-stats.d
>   create mode 100755 examples/sched-simple.d
>   create mode 100755 examples/sched-stats.d
>   create mode 100755 examples/thread-ids.d
>   create mode 100755 examples/var-scope.d
>
> diff --git a/examples/README.md b/examples/README.md
> new file mode 100644
> index 00000000..de4ba090
> --- /dev/null
> +++ b/examples/README.md
> @@ -0,0 +1,88 @@
> +## Synopsis
> +
> +This is the README file for the examples directory that is included in the
> +DTrace distribution directory.  These examples are meant to be learned
> +from.  They can also be a starting point to explore changes and enhancements
> +you'd like to make to these scripts.
> +
> +## Target audience
> +
> +The scripts target a wide range of users.  There are examples that are
> +ideally suited for the beginners, but there also scripts that are
> +probably easier to understand once you have some more practical experiences.
> +To make it easier to select a script suitable for your level of experience,
> +we have added a level indicator, but please do not let that withold you from
> +looking at other examples.
> +
> +| Level | Description |
> +| :---: | :----       |
> +| B     | Beginner - Only some basic knowledge of DTrace is assumed. |
> +| M	| Medium  - Some experience with DTrace will be helpful.     |
> +| A     | Advanced - Practical experiences writing scripts is recommended. |
> +
> +## Installation
> +
> +There is no need for any additional installation instructions.  Use the
> +examples as provided, or make changes as you see fit.  You can do so
> +in-place, or get a copy in your working directory.
> +
> +## Usage
> +
> +Each script has a usage line that you can find under SYNOPSIS in the
> +top part of the script.  There are also extensive additional comments.
> +
> +In the comments, we often refer to a target application, or command.
> +With the latter we mean a Linux command.
> +Most scripts can be used for either, and also work on a running process.
> +Please check the comments though, because some scripts assume there is
> +a program called main, for example.
> +
> +The command given in the SYNOPSIS section may need to be modified
> +according to what needs to be traced:
> +
> +| Option        | Description |
> +| :---          | :----       |
> +| `-c <command>`| Trace the command (or application).  Tracing ends when the      |
> +|               | command/application terminates.                                 |
> +| `-p <pid>`	| Trace the process with id \<pid>.  Stop the tracing with ctrl-C. |
> +
> +An example of a script that can be used for both is `sched-simple.d`:
> +
> +```
> +$ sudo ./sched-simple.d -c ls
> +$ sudo ./sched-simple.d -p <pid-of-target>
> +ctrl-C
> +```
> +
> +## The examples
> +
> +These are the scripts included, plus a brief description.
> +
> +| Script          | Description                | Level |
> +| --------------- | -------------------------- | :---: |
> +| fcalls.d        | List the functions executed by the target application.  |B|
> +|                 | In addition to this, the number of calls to each function |
> +|                 | is printed.  This information is given on a per-thread    |
> +|                 | basis, as well as aggregated over all threads.            |
> +| io-cast-net.d   | For the target application, show a list of all the libc |B|
> +|                 | functions that include one of the selected keywords       |
> +|                 | (open,close,read,write) in their name.
> +| io-stats.d      | Show statistics for the fopen(), fread(), fwrite(),     |A|
> +|                 | pwrite() and read() functions.  There are several probes  |
> +|                 | and you are encouraged to experiment with this script.    |
> +| sched-simple.d  | Use the sched provider to count how often the target    |B|
> +|                 | application, or command, was scheduled on and off the     |
> +|                 | CPU.  The information is given for all individual threads,|
> +|                 | as well as across all the threads.                        |
> +| sched-stats.d   | Display various scheduler statistics for the target     |A|
> +|                 | application, or command.  For each thread, a histogram of |
> +|                 | the time on the CPU is shown.  Plus global statistics,    |
> +|                 | like the total time on the CPU, and the minimum and       |
> +|                 | maximum.  Those threads with an on-CPU time below a user  |
> +|                 | controllable limit are listed.  Also the CPU IDs and the  |
> +|                 | number of times a thread was on/off CPU are given.        |
> +| thread-ids.d    | For a Pthreads application, show the mapping between the|M|
> +|                 | thread IDs as created by the Pthreads system and the      |
> +|                 | thread ID as returned in the DTrace tid variable.         |
> +| var-scope.d     | Demonstrate some of the scoping rules for global and    |B|
> +|                 | clause-local variables.                                   |
> diff --git a/examples/fcalls.d b/examples/fcalls.d
> new file mode 100755
> index 00000000..f4973a89
> --- /dev/null
> +++ b/examples/fcalls.d
> @@ -0,0 +1,90 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    fcalls.d - list the functions executed by an application
> + *
> + *  SYNOPSIS
> + *    sudo ./fcalls.d -c "<name-of-app> [app options]"
> + *
> + *  DESCRIPTION
> + *    This script lists the functions executed by the target
> + *    application.  In addition to this, the number of calls to
> + *    each function is printed.  This information is given on a
> + *    per-thread basis, as well as aggregated over all threads.
> + *
> + *  NOTES
> + *    - Since a.out is used in the probe definitions, library calls
> + *    are excluded.  If a library like libc should be included,
> + *    duplicate the probe definitions and in the copied lines,
> + *    replace a.out by libc.so.
> + *    For example:
> + *      pid$target:a.out::entry,
> + *      pid$target:libc.so::entry
> + *        { <statements> }
> + *
> + *    - It is assumed that a function called main is executed.
> + *    If this is not the case, this is not a critical error.
> + *    The first probe is used to capture the name of the executable,
> + *    but this is not essential.  The probe and printf() statement
> + *    can safely be removed, or replaced by a suitable alternative.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command and have
> + *  printa() print the aggregation data sorted by the first field.
> + */
> +#pragma D option quiet
> +#pragma D option aggsortkey=1
> +#pragma D option aggsortkeypos=0
> +
> +/*
> + *  Store the name of the target application.  The probe is restricted
> + *  to main only, because the exec name needs to be captured only once.
> + */
> +pid$target:a.out:main:entry
> +{
> +  executable_name = execname;
> +}
> +
> +/*
> + *  Use 4 aggregations to store the total number of function calls, the
> + *  counts per function and per thread, both seperately and differentiated
> + *  by thread and function.
> + */
> +pid$target:a.out::entry
> +{
> +  @total_call_counts                          = count();
> +  @call_counts_per_function[probefunc]        = count();
> +  @call_counts_per_thr[tid]                   = count();
> +  @counts_per_thr_and_function[tid,probefunc] = count();
> +}
> +
> +/*
> + *  Print the results.  Use format strings to create a table lay-out.
> + */
> +END {
> +  printf("===========================================\n");
> +  printf("      Function Call Count Statistics\n");
> +  printf("===========================================\n");
> +  printf("Name of the executable: %s\n" ,executable_name);
> +  printa("Total function calls  : %@d\n", at total_call_counts);
> +
> +  printf("\n===========================================\n");
> +  printf("      Aggregated Function Call Counts\n");
> +  printf("===========================================\n");
> +  printf("%-25s %12s\n\n","Function name","Count");
> +  printa("%-25s %@12d\n", at call_counts_per_function);
> +
> +  printf("\n===========================================\n");
> +  printf("      Function Call Counts Per Thread\n");
> +  printf("===========================================\n");
> +  printf("%-7s %12s\n\n", "TID","Count");
> +  printa("%-7d %@12d\n", at call_counts_per_thr);
> +
> +  printf("\n===========================================\n");
> +  printf("      Thread Level Function Call Counts\n");
> +  printf("===========================================\n");
> +  printf("%-7s  %-25s %8s\n\n","TID","Function name","Count");
> +  printa("%-7d  %-25s %@8d\n", at counts_per_thr_and_function);
> +}
> diff --git a/examples/io-cast-net.d b/examples/io-cast-net.d
> new file mode 100755
> index 00000000..9a933b78
> --- /dev/null
> +++ b/examples/io-cast-net.d
> @@ -0,0 +1,78 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    io-cast-net.d - get an overview of read and write calls
> + *
> + *  SYNOPSIS
> + *    sudo ./io-cast-net.d -c "<name-of-app> [app options]"
> + *
> + *  DESCRIPTION
> + *    This script lists all the functions in libc that include
> + *    one of the following keywords in their name:
> + *      read, write, open, close
> + *    This is a fairly wide net that is cast to find out which
> + *    libc functions are used to open, or close a file, and which
> + *    functions from libc are used to read from, or write to a
> + *    file.
> + *    The number of calls to each of these functions is counted,
> + *    plus the total of all such calls.
> + *
> + *  NOTES
> + *    - This is an example how you can find out more about a certain
> + *    area of interest.  Wildcards and empty fields in the probe
> + *    definition are ideally suited for this, but be aware not to
> + *    overask and create a huge number of probes.
> + *
> + *    - This script could be the first step to analyze the I/O
> + *    behaviour of an application, or command.
> + *
> + *    First, this script, io-cast-net.d, may be used on an application
> + *    to identify all functions with read, write, open, and close
> + *    in the name.  This list may then be examined to manually select
> + *    the particular functions to trace in this application.
> + *    An example of this approach is in script io-stats.d.
> + *
> + *    - Wildcards are used and as a result, more functions may be listed
> + *    than you might be interested in, but in this way you will see all
> + *    of them and can make a selection which one(s) to focus on.
> + *
> + *    - There are no print statements in this example, because we would
> + *    like to show the default output from DTrace.  In particular, the
> + *    feature that, when the tracing has finished, all aggregations are
> + *    printed by default.  Since this example only uses aggregations,
> + *    it means that all results are printed upon completion.
> + */
> +
> +/*
> + *  These are the probes used to query the system and see those function
> + *  calls.  The first aggregation has a key that consists of 2 fields:
> + *  the name of the function and the module name.  Thanks to this key,
> + *  the results are differentiated by the function name and the name
> + *  of the module.
> + *
> + *  The latter is always going to be libc.so, e.g. libc.so.6.  We
> + *  have included it in the key for the aggregation to show how
> + *  to make a script more flexible.  In case the module name is
> + *  left out in the probe definition, or changed, the script will
> + *  still work and print the module name(s) of the probe(s) that
> + *  fired.
> + *
> + *  The second aggregation has no key.  That means that the count is
> + *  incremented each time one of the probes fires.  That will give us
> + *  the total count across all the probes that fired.
> + *
> + *  The output shows a table with 3 columns: the name of the function,
> + *  the module (which should be libc), and the call count.  The results
> + *  are sorted by the count.  This table is followed by a single number.
> + *  It is the total of number of calls that have been traced.
> + */
> +
> +pid$target:libc.so:*read*:entry,
> +pid$target:libc.so:*write*:entry,
> +pid$target:libc.so:*open*:entry,
> +pid$target:libc.so:*close*:entry
> +{
> +  @target_calls[probefunc,probemod] = count();
> +  @total_count                      = count();
> +}
> diff --git a/examples/io-stats.d b/examples/io-stats.d
> new file mode 100755
> index 00000000..30e1e64a
> --- /dev/null
> +++ b/examples/io-stats.d
> @@ -0,0 +1,280 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    io-stats.d - show several I/O related statistics
> + *
> + *  SYNOPSIS
> + *    sudo ./io-stats.d -c "<name-of-app> [app options]"
> + *
> + *  DESCRIPTION
> + *    This script shows several I/O statistics for the target
> + *    application, or command.  Among others, the filename, the
> + *    file descriptor, or stream it is connected to, the number
> + *    of bytes read, written, or both, are printed.
> + *
> + *  NOTES
> + *    - All the probes but the END probe in this script are based
> + *      upon the pid provider.
> + *
> + *    - This script is quite elaborate and has several probes.
> + *    More specifically, in alphabetical order, these are the
> + *    functions from the libc library that are traced:
> + *     close()
> + *     fclose()
> + *     fopen()
> + *     fread()
> + *     fwrite()
> + *     open()
> + *     pwrite()
> + *     read()
> + *    In the probes, we rely on the information from the man pages
> + *    for these functions to identify arguments of interest and
> + *    the return values.
> + *
> + *    - Not all of the probes may be relevant to your case.  While
> + *    no harm is done if a probe does not fire, the associated
> + *    aggregation(s) will be empty, but still printed.  This is because
> + *    we use printa() to print the aggregation(s).  If printa() is not
> + *    used for an aggregation, nothing is printed if it is empty.  You
> + *    can of course always remove such irrelevant probes and related
> + *    printa() and printf()  statements.
> + *
> + *    - A related script is io-cast-net.d.  You may want to run this
> + *    first to cast a fairly wide net and explore which open, close,
> + *    read and write functions are called when executing the target.
> + *
> + *    - All print statements are in the END probe.  On purpose
> + *    several different format strings are used.  This is done to
> + *    demonstrate the flexibility in presenting the results.
> + *
> + *    Note that we do not need to include these print statements,
> + *    because aggregations that are not explictly printed, are
> + *    automatically printed when the script terminates.  The reason
> + *    we print them ourselves is to have control over the lay-out.
> + *    Another thing we do is to print more than one aggregation
> + *    with a single printa() statement.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command and have
> + *  printa() print the aggregation data sorted by the first field.
> + */
> +#pragma D option quiet
> +#pragma D option aggsortkey=1
> +#pragma D option aggsortkeypos=0
> +
> +/*
> + *  Capture the name of the file that needs to be opened, in the call
> + *  to any function that starts with fopen.  The first argument to
> + *  this function contains the name of the file to be opened.
> + *  As arg0 is of type integer, it is converted to a string, copied
> + *  from user space and stored in the thread-local variable fname_fopen.
> + */
> +pid$target:libc.so:fopen*:entry
> +{
> +  self->fname_fopen = copyinstr(arg0);
> +}
> +
> +/*
> + *  This probe is nearly identical to the one above, but note that
> + *  for this probe, the name of the thread-local variable is slightly
> + *  different and called fname_open.
> + */
> +pid$target:libc.so:open:entry
> +{
> +  self->fname_open = copyinstr(arg0);
> +}
> +
> +/*
> + *  This is the return probe for any of the fopen calls.  The predicate
> + *  checks that fname_fopen has been set and if so, in the clause the
> + *  count is updated and the new value is stored in an aggregation
> + *  with 3 fields: the name of the function, the name of the
> + *  file and the return value of the function, which is stored
> + *  in arg1.
> + *  Note that we can indeed access this file name.  Although set in the
> + *  corresponding entry probe, a thread-local variable is accessible in
> + *  the data space of the corresponding thread and so can be read, or
> + *  written, here.
> + *  The third field in the aggregation is arg1.  This is the return value
> + *  of the function and is the pointer to the stream.
> + *  Since variable fname_fopen is no longer needed, the storage is freed.
> + */
> +pid$target:libc.so:fopen*:return
> +/ self->fname_fopen != 0/
> +{
> +  @file_pointer[probefunc,self->fname_fopen,arg1] = count();
> +  self->fname_fopen = 0;
> +}
> +
> +/*
> + *  This probe is nearly identical to the previous probe.  Other than
> + *  the name of the thread-local variable, the difference is in the
> + *  name of the aggregation.  This is because the return value, the
> + *  file descriptor, is of type integer.  This means that we need
> + *  a different format string when printing this field.
> + */
> +pid$target:libc.so:open:return
> +/ self->fname_open != 0/
> +{
> +  @file_descriptor[probefunc,self->fname_open,arg1] = count();
> +  self->fname_open = 0;
> +}
> +
> +/*
> + *  This probe stores the name of the function, which is fclose(),
> + *  and the argument, the pointer to the stream.
> + */
> +pid$target:libc.so:fclose:entry
> +{
> +  @fclose_stream[probefunc,arg0] = count();
> +}
> +
> +/*
> + *  This probe is very similar to the one above.  Again, the difference
> + *  is in the name of the aggregation, and the argument which is the
> + *  file descriptor.  It is of type integer.  We need to know this when
> + *  printing the key.
> + */
> +pid$target:libc.so:close:entry
> +{
> +  @close_fd[probefunc,arg0] = count();
> +}
> +
> +/*
> + *  One of the functions traced, is pwrite().  The first argument, arg0,
> + *  is the file descriptor.  The third argument is arg2.  This contains
> + *  the number of bytes requested to be written.
> + *  The aggregation function sum() is used to sum up all those byte counts.
> + *  As a result, the aggregation contains the total number of bytes
> + *  requested to be written to the file connected to the file descriptor.
> + */
> +pid$target:libc.so:pwrite:entry
> +{
> +  @pwrite_bytes[probefunc,arg0] = sum(arg2);
> +}
> +
> +/*
> + *  The pwrite() function returns the number of bytes actually written.
> + *  This value is added to the existing contents of the aggregation.  The
> + *  name of the function, pwrite, is the only field in the key, so this
> + *  aggregation contains the total number of bytes written by pwrite().
> + */
> +pid$target:libc.so:pwrite:return
> +{
> +  @pwrite_total_bytes[probefunc] = sum(arg1);
> +}
> +
> +/*
> + *  This probe accumulates the number of bytes the read() function requests
> + *  to be read.  The aggregation is differentiated by the file descriptor.
> + *  This means that we get the totals on a per file descriptor basis.
> + *  We also store the file descriptor in a thread-local variable called fd.
> + *  A thread-lcoal variable is used because we would like to reference it in
> + *  the return probe for this function.
> + */
> +pid$target:libc.so:read:entry
> +{
> +  self->fd = arg0;
> +  @bytes_requested[probefunc,self->fd] = sum(arg2);
> +}
> +
> +/*
> + *  The read() function returns the actual number of bytes read.  These
> + *  values are accumulated in aggregation bytes_actually_read that includes
> + *  the file descriptor in the key.
> + *  We also accumulate the total number of bytes actually read across all
> + *  the file descriptors and store this in aggregation total_bytes_read.
> + *  Note that this aggregation has no key.
> + *  Since thread-local variable fd is no longer needed, the storage is
> + *  freed.
> + */
> +pid$target:libc.so:read:return
> +{
> +  @bytes_actually_read[probefunc,self->fd] = sum(arg1);
> +  @total_bytes_read                        = sum(arg1);
> +  self->fd = 0;
> +}
> +
> +/*
> + *  This is a shared clause for the probes for the fread() and fwrite()
> + *  functions.  We can share the clause, because the arguments and return
> + *  value are the same for both functions.
> + */
> +pid$target:libc.so:fread:entry,
> +pid$target:libc.so:fwrite:entry
> +{
> +/*
> + *  This probe traces the entry to functions fread() and fwrite().
> + *  A clause-local variable called comment is used to store a dynamically
> + *  generated string.  This string is then concatenated with the name of the
> + *  function, resulting in a string, stored in clause-local variable rw,
> + *  that depends on the function name in the probe.
> + */
> +  this->comment = (probefunc == "fread") ?
> +                   "read by " : "written by ";
> +  this->rw = strjoin(this->comment,probefunc);
> +
> +/*
> + *  The aggregation has a key that consists of the pointer to the stream and
> + *  the generated string.  This aggregation accumulates the product of the
> + *  size in bytes and the number of elements, which is the total number of
> + *  bytes read, or written.
> + */
> +  @total_byte_count_frw[arg3,this->rw] = sum(arg1*arg2);
> +}
> +
> +/*
> + *  This probe accumulates the return value of the fread() function, which
> + *  is the number of elements read.  This number is accumulated across all
> + *  pointers to streams.
> + */
> +pid$target:libc.so:fread:return
> +{
> +  @fread_total_elements[probefunc] = sum(arg1);
> +}
> +
> +/*
> + *  The output section where all the results are printed.  In one case,
> + *  we use 2 aggregations in the call to printa().  This is supported if
> + *  the key is the same.
> + *
> + *  Note that we do not need to include these print statements, because
> + *  aggregations that are not explictly printed, are automatically printed
> + *  when the script terminates.  The reason we print them ourselves is
> + *  to have control over the lay-out.
> + *  Another thing we do is to print more than one aggregation with a
> + *  single printa() statement.
> + */
> +END
> +{
> +  printf("%8s %20s   %-15s %6s\n",
> +               "Function","Filename","File pointer","Count");
> +  printa("%8s %20s   0x%-13p %@6d\n", at file_pointer);
> +
> +  printf("\n%8s %20s   %-15s %6s\n",
> +               "Function","Filename","File descriptor","Count");
> +  printa("%8s %20s   %-15d %@6d\n", at file_descriptor);
> +
> +  printf("\n%8s   %-16s %7s\n","Function","Stream/FD","Count");
> +  printa("%8s   0x%-14p %@7d\n", at fclose_stream);
> +  printa("%8s   %-16d %@7d\n", at close_fd);
> +
> +  printf("\n");
> +  printf("%8s %5s %14s\n","Function","FD","Bytes written");
> +  printa("%8s %5d %@14d\n", at pwrite_bytes);
> +  printa("Total bytes written by %s: %@ld\n", at pwrite_total_bytes);
> +
> +  printa("\nOn stream %p - Total bytes %s = %@ld\n",
> +                                  @total_byte_count_frw);
> +
> +  printa("Total elements read by %s = %@ld\n",
> +                                  @fread_total_elements);
> +
> +  printf("\n%33s\n","Bytes read");
> +  printf("%8s %6s %10s %10s\n","Function","FD","Requested","Actual");
> +  printa("%8s %6d %@10d %@10d\n", at bytes_requested,
> +                                 @bytes_actually_read);
> +  printa("\nTotal number of bytes read: %@d\n", at total_bytes_read);
> +}
> diff --git a/examples/sched-simple.d b/examples/sched-simple.d
> new file mode 100755
> index 00000000..d653276c
> --- /dev/null
> +++ b/examples/sched-simple.d
> @@ -0,0 +1,82 @@
> +#!/usr/sbin/dtrace -s
> +/*
> + *  NAME
> + *    sched-simple.d - count how often the target was on/off the CPU
> + *
> + *  SYNOPSIS
> + *    sudo ./sched-simple.d -c <target>
> + *
> + *  DESCRIPTION
> + *    This script uses the sched provider to count how often the
> + *    target proces was scheduled on and off the CPU.  The target
> + *    can be an application, or any command.
> + *    The information is given for all individual threads used by
> + *    the target, as well as across all the threads.
> + *
> + *  NOTES
> + *    There are 2 aggregations.  The one with 2 fields in the
> + *    key, thr_count_states, uses the thread ID as the first field.
> + *    By configuring printa() through pragmas to print the data
> + *    sorted by the first field, the results for this aggregation
> + *    are shown on a per thread basis.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command and have
> + *  printa() print the aggregation data sorted by the first field.
> + */
> +#pragma D option quiet
> +#pragma D option aggsortkey=1
> +#pragma D option aggsortkeypos=0
> +
> +/*
> + *  These are 2 probes from the sched provider that share a clause.
> + *  Without a predicate, these probes will fire as soon as any
> + *  process goes on/off the CPU.  That will work fine, but in this
> + *  case we use a predicate to restrict the probing to the target
> + *  application, or command.
> + *
> + */
> +sched:::on-cpu,
> +sched:::off-cpu
> +/ pid == $target /
> +{
> +/*
> + *  We know that the probe name is either on-cpu or off-cpu.  This
> + *  is used to assign a different string to clause-local variable
> + *  called state.  This variable is used in the key for the aggregations
> + *  below.
> + */
> +  this->state = (probename == "on-cpu") ?
> +                  "scheduled on" : "taken off";
> +
> +/*
> + *  The 2 aggregations below are used to count how often the probes fired.
> + *  The difference is in the key.  The first aggregation, count_states,
> + *  does not use the thread ID in the key.  It counts across all
> + *  threads.  In contrast with this, the second aggregation,
> + *  thr_count_states, differentiates the counts by the thread ID.
> + */
> +  @count_states[this->state]         = count();
> +  @thr_count_states[tid,this->state] = count();
> +}
> +
> +/*
> + *  Print the results.  We use printf() to print a header.  For both
> + *  aggregations, a formatted printa() statement is used.  The
> + *  printf() statement in between the 2 printa() statements is used
> + *  to separate the 2 blocks with results.
> + *
> + *  Note that we do not need to include these print statements,
> + *  because aggregations that are not explictly printed, are
> + *  automatically printed when the script terminates.  The reason
> + *  we print them ourselves is to have control over the lay-out.
> + */
> +END
> +{
> +  printf("%8s %-20s %6s\n","TID","State","Count");
> +  printf("====================================\n");
> +  printa("%8d %-12s the CPU %@6d\n", at thr_count_states);
> +  printf("\n");
> +  printa("Total count %12s: %@d\n", at count_states);
> +}
> diff --git a/examples/sched-stats.d b/examples/sched-stats.d
> new file mode 100755
> index 00000000..c932b010
> --- /dev/null
> +++ b/examples/sched-stats.d
> @@ -0,0 +1,191 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    sched-stats.d - display various scheduler statistics
> + *
> + *  SYNOPSIS
> + *    sudo ./sched-stats.d -c "<name-of-app> [app options]" <threshold>
> + *
> + *  DESCRIPTION
> + *    This script shows several thread-scheduling statistics that
> + *    were collected while the target application, or command, was
> + *    running.
> + *
> + *    For good performance, a thread should run on the same CPU,
> + *    or at least in the same NUMA node for as long as possible.
> + *    In case it is taken off the CPU, it should be scheduled
> + *    back on the same CPU, or at least in the same NUMA node.
> + *
> + *    The information printed, shows the scheduling statistics
> + *    for any (multithreaded) application:
> + *
> + *    - For each thread, a histogram of the on-cpu times is shown.
> + *
> + *    - This is followed by a table with the overall timing statistics.
> + *    For each thread, the total on-cpu time is given, plus the
> + *    minimum and maximum on-cpu time.
> + *
> + *    - The script takes one parameter.  This is a threshold value.
> + *    It is a lower limit (in microseconds) on the on-cpu time.
> + *    For each thread it is reported how often the on-cpu time was
> + *    at, or below, this value.  This table might point at threads
> + *    that are falling behind on a (heavily loaded) system.
> + *
> + *    - The last 2 tables give information on the thread affinity.
> + *    The first table of these 2 is sorted by thread ID and shows the
> + *    CPU(s) used by each thread, as well how often the thread was
> + *    scheduled on the CPU, and taken off again.
> + *    The second table of these 2 shows the same information, but now
> +*     from a CPU centric view.
> + *
> + *  NOTES
> + *    This script works with any target application or command.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command and have
> + *  printa() print the aggregation data sorted by the first field.
> + */
> +
> +#pragma D option quiet
> +#pragma D option aggsortkey=1
> +#pragma D option aggsortkeypos=0
> +
> +/*
> + *  Place the argument from the command line in a global variable
> + *  for better readability.
> + */
> +BEGIN
> +{
> +  time_threshold = $1;
> +}
> +
> +/*
> + *  An on-cpu probe with a predicate to ensure that the probe only
> + *  fires when the target is scheduled on the CPU.
> + */
> +sched:::on-cpu
> +/ pid == $target /
> +{
> +/*
> + *  Set the start value of the timer.  Since each thread needs
> + *  to have its own timer, we use a thread-local variable.
> + */
> +  self->ts_start = timestamp;
> +
> +/*
> + *  There are 2 aggregations.  Both increment their counter each
> + *  time that this probe fires.  The thread ID (in tid) and CPU ID
> + *  (in curcpu->cpu_id) are used to define the key.
> + *
> + *  The difference between the 2 aggregations is that the fields in
> + *  the key have been swapped.  Given that printa() has been configured
> + *  to sort the data by the first column, the first aggregation shows a
> + *  thread centric view, while the second one has a CPU ID centric view.
> + */
> +  @thr_cpu_id_on_cpu[tid,curcpu->cpu_id] = count();
> +  @cpu_id_thr_on_cpu[curcpu->cpu_id,tid] = count();
> +}
> +
> +/*
> + *  This is the first of 2 probes using off-cpu.  Again a predicate is
> + *  used to restrict the data to the target command.
> + */
> +sched:::off-cpu
> +/ pid == $target /
> +{
> +/*
> + *  The process/thread is about to go off CPU.  Now is the time to
> + *  record how long the thread has been running on the CPU.
> + *  The value is converted from nanoseconds to microseconds (us) and
> + *  stored in a clause-local variable.
> + */
> +  this->time_on_cpu = (timestamp - self->ts_start)/1000;
> +
> +/*
> + *  This variable is used to update 4 aggregations: the histogram of
> + *  the on-CPU times, the total on CPU time, plus the minimum and
> + *  maximum time on the CPU.
> + */
> +  @timings[tid]    = quantize(this->time_on_cpu);
> +  @total_time[tid] = sum(this->time_on_cpu);
> +  @min_time[tid]   = min(this->time_on_cpu);
> +  @max_time[tid]   = max(this->time_on_cpu);
> +
> +/*
> + *  These 2 aggregations are very similar to those used in the
> + *  on-cpu probe.
> + */
> +  @thr_cpu_id_off_cpu[tid,curcpu->cpu_id] = count();
> +  @cpu_id_thr_off_cpu[curcpu->cpu_id,tid] = count();
> +
> +/*
> + *  Free the storage for the thread-local variable since it is no
> + *  longer needed.
> + */
> +  self->ts_start = 0;
> +}
> +
> +/*
> + *  This is the second off-cpu probe.  It has a different and somewhat
> + *  more complicated predicate.
> + *  The clause is only executed for the target command and if the time
> + *  on the CPU is below the user controllable threshold value.
> + *  We basically record and count the event, but of course any other
> + *  action can be taken here.
> + *  An aggregation is used to store the data.  The thread ID (tid)
> + *  is used as a key and we increment the counter.  This produces
> + *  a count for each thread.
> + */
> +sched:::off-cpu
> +/ pid == $target && this->time_on_cpu <= time_threshold /
> +{
> +  @thr_below[tid] = count();
> +}
> +
> +/*
> + *  This is where all results are printed.  It is all quite
> + *  straightforward, but note that in three cases, multiple
> + *  aggregations are printed with a single printa().  As long as the
> + *  keys are the same, this is supported and makes it possible to
> + *  print multiple columns with values.
> + *
> + *  Note that we do not need to include these print statements,
> + *  because aggregations that are not explictly printed, are
> + *  automatically printed when the script terminates.  The reason
> + *  we print them ourselves is to have control over the lay-out.
> + *  Another thing we do is to print more than one aggregation
> + *  with a single printa() statement.
> + */
> +END
> +{
> +  printf("Thread times on the CPU (us)\n");
> +  printa(@timings);
> +
> +  printf("\n=====================================\n");
> +  printf("%10s %26s\n","TID", "Times on the CPU (us)");
> +  printf("%21s %6s %8s\n","Total","Min","Max");
> +  printf("=====================================\n");
> +  printa("%10d %@10d %@6d %@8d\n", at total_time, at min_time, at max_time);
> +
> +  printf("\n=====================================\n");
> +  printf("Thread ON CPU time threshold: %d us\n",time_threshold);
> +  printf("=====================================\n");
> +  printf("%8s %26s\n","TID","Count below threshold");
> +  printa("%8d %@14d\n", at thr_below);
> +
> +  printf("\n=====================================\n");
> +  printf("%10s  %8s %11s\n","TID","CPU ID","Count");
> +  printf("%37s\n","On CPU  Off CPU");
> +  printf("=====================================\n");
> +  printa("%10d  %8d %@7d %@8d\n", at thr_cpu_id_on_cpu,
> +                                 @thr_cpu_id_off_cpu);
> +
> +  printf("\n=====================================\n");
> +  printf("%8s  %10s %11s\n","CPU ID","TID","Count");
> +  printf("%37s\n","On CPU  Off CPU");
> +  printf("=====================================\n");
> +  printa("%8d  %10d %@7d %@8d\n", at cpu_id_thr_on_cpu,
> +                                 @cpu_id_thr_off_cpu);
> +}
> diff --git a/examples/thread-ids.d b/examples/thread-ids.d
> new file mode 100755
> index 00000000..eafde492
> --- /dev/null
> +++ b/examples/thread-ids.d
> @@ -0,0 +1,168 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    thread-ids.d - show the mapping between Pthread IDs and tid values
> + *
> + *  SYNOPSIS
> + *    sudo ./thread-ids.d -c "<name-of-app> [app options]"
> + *
> + *  DESCRIPTION
> + *    This script assumes that the target uses the Pthreads library
> + *    to create one or more threads.  It shows the mapping of the
> + *    Pthread thread IDs and the thread ID, as returned in the tid
> + *    built-in variable.
> + *
> + *  NOTES
> + *    - In addition to showing how to uncover this mapping, this
> + *    script also shows a technique how to retrieve a value from
> + *    a pointer argument in a function call.
> + *
> + *    In this case, this is the thread ID that pthread_create()
> + *    returns in its first argument.
> + *
> + *    This is from the man page for pthread_create():
> + *
> + *    int pthread_create(pthread_t *restrict thread,
> + *                        const pthread_attr_t *restrict attr,
> + *                        void *(*start_routine)(void *),
> + *                        void *restrict arg);
> + *
> + *    We need to capture the contents of *thread.
> + *
> + *    In this case, we cannot use the built-in tid variable
> + *    within pthread_create(), because typically, this function
> + *    is executed by one thread, the main thread.  This means
> + *    that the value in tid is the thread ID of this main thread,
> + *    but we need to have the value of the thread that is created
> + *    as a result of calling pthread_create().  As shown below,
> + *    this can be done by tracing clone3(), which is called by
> + *    pthread_create().
> +
> + *    - It is assumed that a function called main is executed.
> + *    If this is not the case, this is not a critical error.
> + *    The first probe is used to capture the tid value for the main
> + *    program.  This thread is however not created by function
> + *    pthread_create() and therefore this part of the script is
> + *    not essential.
> + *    This is why this probe and corresponding printf() statement
> + *    in the END probe can safely be removed, or replaced by a
> + *    suitable alternative.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command and have
> + *  printa() print the aggregation data sorted by the first field.
> + */
> +#pragma D option quiet
> +#pragma D option aggsortkey=1
> +#pragma D option aggsortkeypos=0
> +
> +/*
> + *  Declare a thread-local variable.  This is to ensure that the
> + *  compiler sees it before it is referenced.
> + */
> +self int clone_tid;
> +
> +/*
> + *  Store the thread ID of the main thread.
> + */
> +pid$target:a.out:main:entry
> +{
> +  tid_main = tid;
> +}
> +
> +/*
> + *  Variable pthr_id_p captures the first argument of the call to
> + *  pthread_create().  This is a pointer, but we can't dereference
> + *  it here, because the contents this pointer points to, are only
> + *  available upon return.
> + *  This is solved by storing the pointer here.  In the return probe
> + *  for this function, we can then dereference the pointer.
> + */
> +pid$target:libc.so:pthread_create:entry
> +{
> +  self->pthr_id_p = (int64_t *) arg0;
> +}
> +
> +/*
> + *  We actually know that clone3() is called.  By using the wildcard
> + *  here, the script continues to work in case this number changes in
> + *  the future.
> + */
> +pid$target:libc.so:clone*:return
> +/ self->pthr_id_p !=NULL /
> +{
> +/*
> + *  We know that one of the clone functions is called from within
> + *  pthread_create() and it returns the thread ID that DTrace stores
> + *  in the tid variable.
> + *  This is why the return value, which is stored in arg1, is copied
> + *  into a thread-local variable called clone_tid.  This variable is
> + *  then referenced in the return probe for pthread_create().
> + */
> +  self->clone_tid = arg1;
> +}
> +
> +/*
> + *  This is where things come together.
> + *
> + *  We already have the value for the thread ID as used by DTrace.
> + *  It is stored in thread-local variable clone_tid.
> + *
> + *  Now we can capture the Pthreads thread ID.
> + *
> + *  There is one more thing to this though.  Below we use an
> + *  aggregation to store the results (and ignore the count when
> + *  printing the results), but this is not strictly necessary.
> + *
> + *  The approach chosen here allows us to control the sorting of
> + *  the results.  In this case the sort field has been set to the
> + *  Pthreads thread ID, but this can easily be changed.
> + *
> + *  If there is no need to print the data sorted, a simple printf()
> + *  will do.  All that needs to be done then is to print the two
> + *  variables this->pthr_id and self->clone_tid.
> + *
> + */
> +pid$target:libc.so:pthread_create:return
> +{
> +/*
> + *  We are about to return from pthread_create() and can dereference
> + *  the pointer.
> + *  Before we do so, the data needs to be copied from user space into
> + *  the kernel.  Since this is a 64 bit address, 8 bytes are copied.
> + *  The value is what pthread_create() returns in its *thread first
> + *  argument, which is the Pthreads thread ID.
> + *  This gives us both thread IDs and they are used in the key for the
> + *  aggregation called thread_mapping.
> + */
> +  this->pthr_id = *(int64_t *) copyin(*self->pthr_id_p,8);
> +  @thread_mapping[this->pthr_id,self->clone_tid] = count();
> +
> +/*
> + *  Free the storage for the thread-local variables.
> + */
> +  self->pthr_id_p      = 0;
> +  self->clone_tid      = 0;
> +}
> +
> +/*
> + *  The aggregation is printed in the END probe.  We use printf()
> + *  statements to print the thread ID of the main program and the
> + *  table header.
> + *  Note that there is no format field for the value for the
> + *  aggregation.  As explained above, the value is not relevant
> + *  in this case.
> + *
> + *  Note that we do not need to include these print statements,
> + *  because aggregations that are not explictly printed, are
> + *  automatically printed when the script terminates.  The reason
> + *  we print them ourselves is to have control over the lay-out.
> + */
> +END
> +{
> +  printf("Thread ID of main program: %d\n\n",tid_main);
> +  printf("%16s <=> %-9s\n\n","Pthreads ID","Thread ID");
> +  printa("%16d <=> %-9d\n", at thread_mapping);
> +}
> diff --git a/examples/var-scope.d b/examples/var-scope.d
> new file mode 100755
> index 00000000..8a99bcc3
> --- /dev/null
> +++ b/examples/var-scope.d
> @@ -0,0 +1,121 @@
> +#!/usr/sbin/dtrace -s
> +
> +/*
> + *  NAME
> + *    var-scope.d - initialize, update and print variables
> + *
> + *  SYNOPSIS
> + *    sudo ./var-scope.d -c <target>
> + *
> + *  DESCRIPTION
> + *    This script demonstrates some of the scoping rules.  It
> + *    shows the behaviour of global and clause-local variables.
> + *    The target can be an application, or any command.
> + *
> + *  NOTES
> + *    - Other than the BEGIN and END probe used here, there are
> + *    2 more probes for the fopen() function, but you can use any
> + *    probe you like, or no additional probes at all.  You are
> + *    encouraged to experiment with this example and make changes.
> + *
> + *    - In case this example is used with the date command, you will
> + *    see something like this:
> + *
> + *    $ sudo ./var-scope.d -c date
> + *    In BEGIN
> + *        A = 10 B = 20
> + *        this->C = 60
> + *    In fopen - return
> + *        A = 11 B = 21
> + *    <output from the date command>
> + *    In END
> + *        A = 11 B = 21
> + *    $
> +
> + *    - Thread-local variables are not demonstrated here, but many
> + *    of the other examples, like io-stats.d, make extensive use
> + *    of them.  Please check one of those scripts if you would like
> + *    to learn more how to use thread-local variables.
> + *
> + *    - Below, the workings of each probe is explained.  Since the
> + *    results depend on the command that is traced, we base the
> + *    explanations on the output from the example above.
> + */
> +
> +/*
> + *  Suppress the default output from the dtrace command.
> + */
> +#pragma D option quiet
> +
> +/*
> + *  This demonstrates how to declare a variable outside of a clause.
> + *  Variable B is declared to be of type int64_t.  This is a global
> + *  variable.  It is initialized to zero by default.
> + */
> +int64_t B;
> +
> +/*
> + *  In this clause for the BEGIN probe, A is initialized to 10 and B
> + *  is given a value of 20.  Both are global variables.  Variable C is
> + *  a clause-local variable initialized to 30.
> + */
> +BEGIN
> +{
> +  A = 10;
> +  B = 20;
> +  this->C = 30;
> +}
> +
> +/*
> + *  In this clause for the BEGIN probe, clause-local variable C is updated
> + *  to a value of 60.  This demonstrates that clause-local variables are
> + *  local to the set of clauses executed for a particular probe firing.
> + *  This set is executed in the order they appear in the program.
> + *
> + *  Here, there are 2 clauses for the BEGIN probe.  The printed values
> + *  in the example above confirm this update for C, as well as the initial
> + *  values for A and B.
> + */
> +BEGIN
> +{
> +  this->C += 30;
> +  printf("In %s\n",probename);
> +  printf("\tA = %d B = %d\n",A,B);
> +  printf("\tthis->C = %d\n",this->C);
> +}
> +
> +/*
> + *  The clause in this probe updates A.  Since this is a global variable,
> + *  this change is visible to all probes that fire after this probe has fired.
> + */
> +pid$target::fopen:entry
> +{
> +  A += 1;
> +}
> +
> +/*
> + *  The clause in this probe updates B and prints both A and B.  The
> + *  probefunc (fopen) and probename (return) are printed first, followed by
> + *  the values for A and B.
> + *  We are in the return probe here, so we know that the entry probe has
> + *  fired too.  This means that A has been updated as well.
> + *  This is exactly what we see in the output above: A is 11 and B is 21,
> + *  as it has just been updated.
> + *  Note that C is not visible here, because it is local to the BEGIN probe.
> + */
> +pid$target::fopen:return
> +{
> +  B += 1;
> +  printf("In %s - %s\n",probefunc,probename);
> +  printf("\tA = %d B = %d\n",A,B);
> +}
> +
> +/*
> + *  Both A and B are global variables and are visible in the END probe.  The
> + *  final values are printed here.
> + */
> +END
> +{
> +  printf("In %s\n",probename);
> +  printf("\tA = %d B = %d\n",A,B);
> +}



More information about the DTrace-devel mailing list