[DTrace-devel] [PATCH v3] Add support for cleanpath() subroutine

Kris Van Hees kris.van.hees at oracle.com
Thu Jan 25 05:46:26 UTC 2024


On Wed, Jan 24, 2024 at 09:06:44PM -0500, Eugene Loh via DTrace-devel wrote:
> On 1/24/24 16:04, Kris Van Hees wrote:
> 
> > On Thu, Dec 21, 2023 at 01:11:25PM -0500, eugene.loh at oracle.com wrote:
> > > Replace all "//" with "/".
> > > Replace all "/./" with "/".
> > > Replace all "/foo/.." with "/".

This is not really an accurate description of what the patch does because
these three transformations would never result in a "." path.

> > > 
> > > Notice that the results differ in some cases from what was
> > > done in legacy DTrace:
> > Some comments on this (after reviewing the documentation again):
> > 
> > >      input string         current patch     legacy DTrace
> > > 
> > >      "/."                 "/"               "/."
> > >      "a/."                "a"               "a/."
> > >      "./a"                "a"               "./a"
> > These three seem correct to me in how the current patch handles it.
> > 
> > >      "../.."              "../.."           "."
> > >      "a/.."               "."               "a"
> > >      "a/../.."            ".."              "a"
> > A case could be made for all of them being "",
> 
> I do not see how "../.." and "" are equivalent.

It depends on the context.  Given that a relative path presented to cleanpath
does not carry any information about its root, and in fact could very well be
relative to / itself, remaining .. elements are meaningless.  In the absence
of parent elements, .. (backing up to the parent) has no meaning.  So, as I
said, a case could be made based on that to say that cleanpath("..") could
validly yield "" as result.

> > given that cleanpath() is
> > supposed to remove redundant elements (. and ..)
> 
> ".." by itself is not redundant, but the combination "/foo/.." can be
> collapsed.

Again, by the reasoning I present above, .. by itself can be considered
redundant because it represents backing up a level, and within the context of
a cleanpath() invocation, the meaning of ".." becomes pointless.

But as I mention below, I agree that a valid case can be made for what you
propose in this patch, albeit with the one exception I noted.

> > and no context is known
> > for relative paths.  But I do agree that there is also a case to be made
> > to not get rid of .. elements that would apply to the root of the relative
> > path, becausae discarding them effectively causes information loss.
> > 
> > Except for the case of "a/..".  That one should be "" because the .. should
> > be considered redundant in view of having a present path element, and thus
> > it is possible to back up one path element, which would yield "".  I do not
> > see how the result of "." would be considered valid because that in itself
> > is a redundant element (and thus should still be removed).
> 
> I guess I just don't know what standard or use case to consider. Anyhow, the
> documentation I saw speaks of removing "/./" without talking specifically
> about "." in isolation.  There are a lot of cases when someone would not
> consider "." and "" to be the same... e.g., "cd .", "pushd .", etc.  So
> "a/.." is the same as ".".  Are they also the same as ""?  It depends on the
> situation, and it's unclear to me where cleanpath() is used.  The
> documentation I saw (Oracle Linux DTrace Guide) didn't say anything was
> wrong with a bare ".".

The documentation talks about /./ and /../ but not in a literal sense.  If
it was in a literal sense, then you would need to leave "a/.." alone because
it does not contain /../ or /./.  Clearly, the meaning is in terms of path
elements and the meaning of . and .. as path elements (which is almost certainly
why they are shown as /./ and /../, i.e. path elements that are literal .
and ..).

While it is not clear where cleanpath() is meant to be used (I do not find any
real examples of its use cases), it is unlikely it is meant to be used to
sanitize paths that are to be used in commands, or rather, it is certainly not
guaranteed to be sensible to do so.  After all, the dpcumentation points out
that "it is possible that cleanpath could take a valid path and return a
shorter, invalid path."

More importantly though, the documentation is clear that cleanpath() should be
removing redundant path elements, specifically, /./ and /../ (and again, the /
are here to denote that these are path elements and not portions of path
elements).  And /./ is certainly redundant in paths.

Also, the documentation does not talk about adding anything, and the case of
"a/.." listed above shows "." as output for the current patch, yet the input
string does not contain a literal . element, so rather than removing redundant
elements, this ends up introducing an element, and one that is actually one of
the aforementioned redundant elements that are meant to be removed.

Now, I fully admit that I am baffled by the mere existance of this cleanpath()
subroutine, and I really wonder what the use case for it was and why it was
implemented the way it was.  The legacy output clearly shows that it is not
even correct in somme cases, and its documented behaviour is vague.  From what
I can gather from some searches, it seems that cleanpath() was used in some
translators on other OSes, most likely to present paths in a nicer form.
No such use exists on Linux right now.

And (bear with me), then there is the issue that cleanpath() as implemented in
the legacy implementation does not actually follow the documented behaviour
either!

Personally, I would prefer to just get rid of cleanpath() given all of the
above, because there clearly isn't a good sense of what it is meant to do in
certainly situations and the documentation is no help.

So...

Thinking this through some more, I first apologize for the lengthy discourse
above but I think it is important to establish that this cleanpath() routine
and its documentation is a mess.  An ugly mess.

But I do acknowledge that you have done a fine job trying to make sense of the
mess, and it provides a consistent behaviour for the subroutine albeit not in
line with the documentation.  So, under the presumption that you can document
the behaviour in the commit message (which I usually would object to, but I
think we need to so that the documentation can be updated based on that), I
think going with this implementation is the best course of action.

Maybe in terms of a sequence of search-and-replace regular expressions, in
order?

> > >      "a/../b"             "b"               "/b"
> > I agree.
> > 
> > > In the first set, legacy DTrace has unnecessary trailing "." or "/."
> > > or unnecessary leading "./".  They would seem to be wrong.
> > > 
> > > In the next cases, the legacy results are clearly wrong, not
> > > handling ".." correctly.
> > > 
> > > In the final case, a relative path is converted into an absolute
> > > path, which is also clearly incorrect.
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list