[DTrace-devel] kallmodsyms removal: reducing our kernel footprint some more

Tue Apr 6 09:09:47 PDT 2021

So I'd like to drop /proc/kallmodsyms, since upstream responded to my
attempt to upstream it by saying no and removing the infrastructure I
was using to build it.

In the old days, /proc/kallmodsyms was minor stuff compared to the
amount of code we had in the kernel for other reasons, but nowadays it's
about half of what remains. It would be nice if we could get rid of it.

So... let's go back to basics. What does /proc/kallmodsyms give us? What
does DTrace need to know?

 - Given an address, is it in an external module or not, and if so which
   one is it in? This is mostly so we can look up its CTF types in the
   right child dict, and for backtraces. This is easy: /proc/kallsyms
   provides everything needed.  (These days, even the code in DTrace to
   deal with external modules is distinct from the code used to deal
   with in-tree stuff, because we can rely on e.g. the fact that a given
   external module has all its symbols in a single contiguous lump.)

 - What size is a symbol? This matters only to determine whether an
   address is inside a symbol or between symbols, and to identify which
   symbol in a set of overlapping symbols is the "innermost". We can
   lose this for external modules without too much trouble, since nested
   symbols only really appear in the core kernel and most symbols really
   *do* more or less tile the address space (and the instruction pointer
   shouldn't really turn up outside the space covered by a symbol
   anyway).

 - Given a symbol that /proc/kallsyms does not report is in a module,
   might it be in a module if the kernel was built suitably? This lets
   us stabilize D scripts using the module-qualified ` operator against
   changes in the kernel .config (and, to a lesser extent, do the same
   with CTF content). These "modules" might be scattered all over the
   kernel's text segment(s), intermingled or even nested, so
   symbol-by-symbol treatment is necessary, and we do want sizing info.
   This is not provided by the upstream kernel and was the original
   justification for kallmodsyms, so it's this we need to replace.
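To make the first two points concrete, here is a rough sketch (not
DTrace's actual code) of address -> module lookup using only what
/proc/kallsyms provides. The sample text, the symbol and module names,
and the parse_kallsyms() and lookup() helpers are all made up for
illustration. With no size info, it just assumes symbols tile the
address space and picks the nearest symbol at or below the address.

```python
import bisect

# Hypothetical sample in /proc/kallsyms format: address, type, name and,
# for symbols in loaded modules, a tab followed by [module].
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81001000 T start_kernel
ffffffffc0200000 t ext_init\t[extmod]
ffffffffc0200040 t ext_helper\t[extmod]
"""

def parse_kallsyms(text):
    """Return a list of (address, name, module-or-None), sorted by address."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        addr = int(fields[0], 16)
        name = fields[2]
        module = fields[3].strip("[]") if len(fields) > 3 else None
        syms.append((addr, name, module))
    syms.sort(key=lambda s: s[0])
    return syms

def lookup(syms, addr):
    """Nearest symbol at or below addr; None if addr precedes every symbol."""
    addrs = [s[0] for s in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    return syms[i] if i >= 0 else None

syms = parse_kallsyms(SAMPLE)
print(lookup(syms, 0xffffffffc0200010))
```

Because there are no sizes, an address past the end of the last symbol
in a module would still be attributed to that symbol, which is the
"without too much trouble" tradeoff described above.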

The implementation of this third component has three pieces, most of
which we can completely replace with out-of-kernel stuff quite easily
(well, it involves some annoying rewriting but none of it is rocket
science).

 - a build-time component in the toplevel makefile and in
   scripts/Makefile.modbuiltin and scripts/kconfig/confdata.c that
   produces a modules_thick.builtin file at every level of the build
   tree (cascading to a final version at the root) that tracks the
   mapping between module names and the object files that make them up.

   This is a variant of modules.builtin that the kernel has long
   produced, but the old mechanism to produce this was torn out by
   Yamada, I'm sure *entirely* coincidentally, right after we tried to
   upstream kallmodsyms last time (ostensibly to save a few milliseconds
   of build time), so much of this is carrying that old kernel code
   forward. There is, as far as I can see, no other way to determine
   which object files might make up a module when the module is built
   in: only the makefiles have access to this information. This is small
   and easy to maintain, but will probably never be upstreamable.

   (In CTFv4, we'll be able to extract object file -> function
   information from the CTF, but even that won't help us tell which
   module a given object file might be part of, if it were built as
   one.)

   I'm ignoring this for now, but in future we might possibly be able to
   replace this with a separate makefile, in a separate source tree,
   that invokes the upstream main kernel makefile and re-traverses the
   tree (but in a more complex and fragile fashion than what we have
   now).

   CTF generation needs this stuff too to figure out what child dict to
   put each type in, so we can at least move it into the CTF commit and
   deal with it as part of the inevitable "no, go away forever" we'll
   get as a response to trying to upstream *that*.

 - changes to scripts/kallsyms.c and scripts/link-vmlinux.sh which use a
   linker map file and the modules_thick.builtin stuff to add new
   sections to vmlinux which contain the symbol -> module mapping and
   size info.

   We can drop all of this in favour of an out-of-kernel-tree analysis
   program that scans the object files in the build tree (all of which
   are named in modules_thick.builtin) and constructs a separate file
   that maps symbol name -> module for all in-kernel objects (probably
   via a string table and a sorted-by-symname array of (name, module)
   pairs). We can localize this code entirely in the dtrace source tree
   by adding this tool as a new (really obscure, long-name-only)
   dtrace(8) option: since we're no longer planning to make the thing
   useful outside dtrace, we can name the output dtrace_syms or
   something. Of course, having the kernel build tree around at dtrace
   build time is... likely to be annoying, so rather than this being
   done when dtrace itself is built, the kernel makefile grows a tiny
   new target (not run by default) that just runs this tool and writes
   the output to the usual /lib/modules/$(uname -r)/kernel directory.
   Then DTrace can pick that file up and use it: distributors can have
   the kernel build process depend on DTrace and write out a dtrace_syms
   easily enough (they already need to patch it to add CTF support in
   any case, and this is a much smaller change since all the actual work
   is now being done by dtrace(8)).

   Doing things this way will be slower than using a linker map file,
   because we have to reread a pile of object files, but it shouldn't be
   too bad. (This is close to how kallmodsyms used to work, only
   simplified. A shame: I liked the linker map file approach, but that
   only works if you can *generate* the linker map file, and that means
   more attempts to cooperate with upstream and no thank you.)

 - code in kernel/kallsyms.c which looks at those new sections and emits
   /proc/kallmodsyms on demand. With the name->module mapping now
   external, we can do all of this in DTrace itself: if a symbol isn't
   module-qualified in /proc/kallsyms and isn't in this new file, we
   know it can't be part of a built-in module and must be truly
   built-in, in vmlinux`.

   So this is actually *better structured* than the previous approach,
   in that all the code to handle what used to be /proc/kallmodsyms is
   now part of DTrace: all the kernel has is modules_thick.builtin and
   one new tiny target that asks DTrace to generate what used to be the
   kallmodsyms data itself.
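The decision rule in that last piece can be sketched roughly like this.
It is only an illustration: owning_module() and the two dicts are
hypothetical stand-ins for whatever DTrace ends up building from
/proc/kallsyms and the new file, and keying on the symbol name alone is
a simplification (it ignores duplicate static symbol names).

```python
def owning_module(name, kallsyms_modules, dtrace_syms):
    """Return the module name to qualify `name` with.

    kallsyms_modules: name -> module, for symbols that /proc/kallsyms
                      reports as belonging to a loaded module.
    dtrace_syms:      name -> module, for built-in symbols whose object
                      files could have been built as a module.
    """
    if name in kallsyms_modules:
        return kallsyms_modules[name]      # really in a loaded module
    if name in dtrace_syms:
        return dtrace_syms[name]           # built in, but module-capable
    return "vmlinux"                       # truly built-in

# Hypothetical data for illustration only.
kallsyms_modules = {"ext_init": "extmod"}
dtrace_syms = {"ext4_mount": "ext4"}

for sym in ("ext_init", "ext4_mount", "start_kernel"):
    print(sym, "->", owning_module(sym, kallsyms_modules, dtrace_syms))
```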

The result will be a bit slower because we have to read one more file at
DTrace startup time, but it's only one file and it can be read in one
gulp so it shouldn't cost much.
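For a feel of what "one gulp" could look like, here is a minimal sketch
of a string table plus a sorted-by-symname array of (name, module)
pairs. The layout (a little-endian 32-bit count, then pairs of 32-bit
string-table offsets, then NUL-terminated strings) is purely an
assumption for illustration, as are build(), load(), find_module() and
the sample symbol names. Nothing here is a settled format.

```python
import struct

def build(pairs):
    """Serialize (symbol, module) pairs: count, sorted offset pairs, strtab."""
    strtab = bytearray(b"\0")
    offsets = {}

    def intern(s):
        # Store each distinct string once, so module names are shared.
        if s not in offsets:
            offsets[s] = len(strtab)
            strtab.extend(s.encode() + b"\0")
        return offsets[s]

    records = [(intern(sym), intern(mod)) for sym, mod in sorted(pairs)]
    out = bytearray(struct.pack("<I", len(records)))
    for rec in records:
        out += struct.pack("<II", *rec)
    return bytes(out + strtab)

def load(blob):
    """Split a blob that was read in one go into (records, strtab)."""
    (count,) = struct.unpack_from("<I", blob, 0)
    records = [struct.unpack_from("<II", blob, 4 + 8 * i) for i in range(count)]
    return records, blob[4 + 8 * count:]

def getstr(strtab, off):
    """Fetch the NUL-terminated string starting at offset off."""
    return strtab[off:strtab.index(b"\0", off)].decode()

def find_module(records, strtab, symbol):
    """Binary search the sorted-by-name records; None if symbol is absent."""
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        name = getstr(strtab, records[mid][0])
        if name == symbol:
            return getstr(strtab, records[mid][1])
        if name < symbol:
            lo = mid + 1
        else:
            hi = mid
    return None

# Hypothetical contents for illustration only.
blob = build([("ext4_mount", "ext4"), ("xfs_init", "xfs"), ("ext4_sync", "ext4")])
records, strtab = load(blob)
print(find_module(records, strtab, "ext4_sync"))
```

Since lookups only need the offset array and the string table, nothing
has to be parsed symbol by symbol at startup beyond reading the file.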

I'll work on this unless people can see a better approach.