[DTrace-devel] kallmodsyms removal: reducing our kernel footprint some more
Nick Alcock
nick.alcock at oracle.com
Tue Apr 6 09:09:47 PDT 2021
So I'd like to drop /proc/kallmodsyms, since upstream responded to my
attempt to upstream it by saying no and removing the infrastructure I
was using to build it.
In the old days, /proc/kallmodsyms was minor stuff compared to the
amount of code we had in the kernel for other reasons, but nowadays it's
about half of what remains. It would be nice if we could get rid of it.
So... let's go back to basics. What does /proc/kallmodsyms give us? What
does DTrace need to know?
- Given an address, is it in an external module or not, and if so which
one is it in? This is mostly so we can look up its CTF types in the
right child dict, and for backtraces. This is easy: /proc/kallsyms
provides everything needed. (These days, even the code in DTrace to
deal with external modules is distinct from the code used to deal
with in-tree stuff, because we can rely on e.g. the fact that a given
external module has all its symbols in a single contiguous lump.)
- What size is a symbol? This matters only to determine whether an
address is inside a symbol or between symbols, and to identify which
symbol in a set of overlapping symbols is the "innermost". We can
lose this for external modules without too much trouble, since nested
symbols only really appear in the core kernel and most symbols really
*do* more or less tile the address space (and the instruction pointer
shouldn't really turn up outside the space covered by a symbol
anyway).
- Given a symbol that /proc/kallsyms does not report is in a module,
might it be in a module if the kernel was built suitably? This lets
us stabilize D scripts using the module-qualified ` operator against
changes in the kernel .config (and, to a lesser extent, do the same
with CTF content). These "modules" might be scattered all over the
kernel's text segment(s), intermingled or even nested, so
symbol-by-symbol treatment is necessary, and we do want sizing info.
This is not provided by the upstream kernel and was the original
justification for kallmodsyms, so it's this we need to replace.
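
To make the first point concrete, here is a rough sketch (in Python, purely
for illustration: DTrace itself is C) of an address -> external module lookup
using nothing but /proc/kallsyms-style text. It assumes the usual
"address type name [module]" line layout and leans on the "symbols more or
less tile the address space" observation above: with no size info, an address
is attributed to the nearest preceding symbol. The sample symbols and module
names are made up, as are the function names.

```python
import bisect

# Made-up sample in /proc/kallsyms layout; real input would be read from
# the file itself (and needs privileges to see non-zero addresses).
SAMPLE_KALLSYMS = (
    "ffffffff81000000 T _text\n"
    "ffffffff81001000 T start_kernel\n"
    "ffffffffc0200000 t ext_init\t[extmod]\n"
    "ffffffffc0200040 T ext_ioctl\t[extmod]\n"
    "ffffffffc0300000 t other_probe\t[othermod]\n"
)


def parse_kallsyms(text):
    """Return a list of (address, name, module-or-None), sorted by address."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        addr = int(fields[0], 16)
        name = fields[2]
        # A fourth field, if present, is the module name in brackets.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        syms.append((addr, name, module))
    syms.sort(key=lambda s: s[0])
    return syms


def module_of(syms, addr):
    """Return the external module containing addr, or None if there is none.

    Assumption: no size info, so addr belongs to the nearest symbol at or
    below it.
    """
    addrs = [s[0] for s in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    return syms[i][2]


syms = parse_kallsyms(SAMPLE_KALLSYMS)
```

With the sample above, module_of(syms, 0xffffffffc0200050) gives "extmod",
while an address just after start_kernel gives None, i.e. not in any external
module. The third point is exactly what this cannot answer.
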
The implementation of this third component has three pieces, most of
which we can completely replace with out-of-kernel stuff quite easily
(well, it involves some annoying rewriting but none of it is rocket
science).
- a build-time component in the toplevel makefile and in
scripts/Makefile.modbuiltin and scripts/kconfig/confdata.c that
produces a modules_thick.builtin file at every level of the build
tree (cascading to a final version at the root) that tracks the
mapping between module names and the object files that make them up.
This is a variant of modules.builtin that the kernel has long
produced, but the old mechanism to produce this was torn out by
Yamada, I'm sure *entirely* coincidentally, right after we tried to
upstream kallmodsyms last time (ostensibly to save a few milliseconds
of build time), so much of this is carrying that old kernel code
forward. There is, as far as I can see, no other way to determine
which object files might make up a module when the module is built
in: only the makefiles have access to this information. This is small
and easy to maintain, but will probably never be upstreamable.
(In CTFv4, we'll be able to extract object file -> function
information from the CTF, but even that won't help us tell which
module a given object file might be part of, if it were built as
one.)
I'm ignoring this for now, but in future we might possibly be able to
replace this with a separate makefile, in a separate source tree,
that invokes the upstream main kernel makefile and re-traverses the
tree (but in a more complex and fragile fashion than what we have
now).
CTF generation needs this stuff too to figure out what child dict to
put each type in, so we can at least move it into the CTF commit and
deal with it as part of the inevitable "no, go away forever" we'll
get as a response to trying to upstream *that*.
- changes to scripts/kallsyms.c and scripts/link-vmlinux.sh which use a
linker map file and the modules_thick.builtin stuff to add new
sections to vmlinux which contain the symbol -> module mapping and
size info.
We can drop all of this in favour of an out-of-kernel-tree analysis
program that scans the object files in the build tree (all of which
are named in modules_thick.builtin) and constructs a separate file
that maps symbol name -> module for all in-kernel objects (probably
via a string table and a sorted-by-symname array of (name, module)
pairs). We can localize this code entirely in the dtrace source tree
by adding this tool as a new (really obscure, long-name-only)
dtrace(8) option: since we're no longer planning to make the thing
useful outside dtrace, we can name the output dtrace_syms or
something. Of course, having the kernel build tree around at dtrace
build time is... likely to be annoying, so rather than this being
done when dtrace itself is built, the kernel makefile grows a tiny
new target (not run by default) that just runs this tool and writes
the output to the usual /lib/modules/$(uname -r)/kernel directory.
Then DTrace can pick that file up and use it: distributors can have
the kernel build process depend on DTrace and write out a dtrace_syms
easily enough (they already need to patch it to add CTF support in
any case, and this is a much smaller change since all the actual work
is now being done by dtrace(8)).
Doing things this way will be slower than using a linker map file,
because we have to reread a pile of object files, but it shouldn't be
too bad. (This is close to how kallmodsyms used to work, only
simplified. A shame: I liked the linker map file approach, but that
only works if you can *generate* the linker map file, and that means
more attempts to cooperate with upstream and no thank you.)
- code in kernel/kallsyms.c which looks at those new sections and emits
/proc/kallmodsyms on demand. With the name->module mapping now
external, we can do all of this in DTrace itself: if a symbol isn't
module-qualified in /proc/kallsyms and isn't in this new file, we
know it can't be in a builtin module and must be truly built-in, in
vmlinux`.
So this is actually *better structured* than the previous approach,
in that all the code to handle what used to be /proc/kallmodsyms is
now part of DTrace: all the kernel has is modules_thick.builtin and
one new tiny target that asks DTrace to generate what used to be the
kallmodsyms data itself.
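
As a rough illustration of the last two pieces, here is a sketch (again in
Python, not the C it would really be written in) of a string table plus a
sorted-by-symname array of (name, module) pairs, and of the fallback rule
DTrace would apply on top of /proc/kallsyms. The layout, the function names
and the sample symbols are all assumptions made for this example, not a
proposed file format. It also assumes symbol names are unique, which static
symbols would break.

```python
def build_table(pairs):
    """Build a string table and an array of (name_off, module_off) entries.

    Entries are sorted by symbol name; each distinct string is stored once.
    """
    strtab = bytearray(b"\0")
    offsets = {}

    def intern(s):
        if s not in offsets:
            offsets[s] = len(strtab)
            strtab.extend(s.encode() + b"\0")
        return offsets[s]

    entries = [(intern(name), intern(module))
               for name, module in sorted(pairs)]
    return bytes(strtab), entries


def get_str(strtab, off):
    """Fetch the NUL-terminated string at offset off."""
    end = strtab.index(b"\0", off)
    return strtab[off:end].decode()


def lookup(strtab, entries, symname):
    """Binary-search the sorted array; return the module name or None."""
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if get_str(strtab, entries[mid][0]) < symname:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(entries) and get_str(strtab, entries[lo][0]) == symname:
        return get_str(strtab, entries[lo][1])
    return None


def classify(kallsyms_module, strtab, entries, symname):
    """Pick the module for a symbol.

    A module named by /proc/kallsyms wins; otherwise consult the external
    table; otherwise the symbol is truly built in, i.e. in vmlinux.
    """
    if kallsyms_module is not None:
        return kallsyms_module
    return lookup(strtab, entries, symname) or "vmlinux"


# Made-up contents for the symbol -> module file.
strtab, entries = build_table([
    ("ext4_mount", "ext4"),
    ("e1000_probe", "e1000"),
    ("ext4_readdir", "ext4"),
])
```

Here classify(None, strtab, entries, "ext4_mount") yields "ext4" even though
/proc/kallsyms reports no module for it, and a symbol that appears in neither
place falls through to "vmlinux".
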
The result will be a bit slower because we have to read one more file at
DTrace startup time, but it's only one file and it can be read in one
gulp so it shouldn't be too bad.
I'll work on this unless people can see a better approach.