[DTrace-devel] [PATCH v3] Add support for cleanpath() subroutine

Kris Van Hees kris.van.hees at oracle.com
Thu Jan 25 18:52:37 UTC 2024


On Thu, Jan 25, 2024 at 01:23:14PM -0500, Eugene Loh via DTrace-devel wrote:
> On 1/25/24 02:38, Kris Van Hees wrote:
> 
> > Actually, there is one change that I think does make sense...  I would get rid
> > of trailing / *unless* that is the only character (i.e. then it is really a
> > leading /).  If the purpose of cleanpath() is to create a more minimal form of
> > the given path, then stripping off the trailing / seems to make sense, even if
> > the documentation doesn't address that case.
> > 
> > And in view of my commentary quoted below, I think it makes sense to implement
> > cleanpath() to be consistent.  Trailing / is certainly redundant because it is
> > usually only used as a way to represent that a path is a directory.  But the
> > more canonical path would not have that trailing /.
> > 
> > Makes sense?
> 
> It certainly makes sense to make cleanpath() "consistent," but we're left
> debating rationales for what that means.

Sure, but unless we implement exactly what the legacy version did, and in
the absence of detailed documentation, we're left with determining consistent
semantics ourselves and going with that while still adhering to what the
documented intended behaviour is.

> FWIW, Alan seemed to think that the output should have the trailing '/'
> depending on whether the input did.  That's why I changed to that behavior.
> https://oss.oracle.com/pipermail/dtrace-devel/2023-November/004061.html He
> also argues that "" and "." are different.  He makes a case for "" meaning
> "no file."

I disagree with that notion because this routine specifically exists to remove
redundant information from the path to make it a shorter representation.  In
a functiomal sense, a trailing / has no purpose other than visibly presenting
that string is the path to a directory.  I understand Alan's point but I think
that goes against what cleanpath() is meant to do and how it seems to be used.
Also, dirname() does not output with a trailing /, and if you call it with a
path like /a/b/ the output is /a/b and not /a/b/.

Given that Alan does not feel strongly about this, I would propose not to keep
the trailing /.  If that is significant, I would expect a user could very
easily add it themselves.  I frankly cannot really imagine a case where this is
an issue - if there is some use case where you want to differentiate based on
whether something is a path to a directory or a file, you either know that the
paths will have trailing / for directories and not for files (and thus you know
already if it is a directory), or you do not, and if directories are presented
is mixed form (some with trailing /, some not) then this is not useful anyway.

> > On Thu, Jan 25, 2024 at 12:46:26AM -0500, Kris Van Hees via DTrace-devel wrote:
> > > On Wed, Jan 24, 2024 at 09:06:44PM -0500, Eugene Loh via DTrace-devel wrote:
> > > > On 1/24/24 16:04, Kris Van Hees wrote:
> > > > 
> > > > > On Thu, Dec 21, 2023 at 01:11:25PM -0500, eugene.loh at oracle.com wrote:
> > > > > > Replace all "//" with "/".
> > > > > > Replace all "/./" with "/".
> > > > > > Replace all "/foo/.." with "/".
> > > This is not really an accurate description of what the patch does because
> > > these three transformations would never result in a "." path.
> > > 
> > > > > > Notice that the results differ in some cases from what was
> > > > > > done in legacy DTrace:
> > > > > Some comments on this (after reviewing the documentation again):
> > > > > 
> > > > > >       input string         current patch     legacy DTrace
> > > > > > 
> > > > > >       "/."                 "/"               "/."
> > > > > >       "a/."                "a"               "a/."
> > > > > >       "./a"                "a"               "./a"
> > > > > These three seem correct to me in how the current patch handles it.
> > > > > 
> > > > > >       "../.."              "../.."           "."
> > > > > >       "a/.."               "."               "a"
> > > > > >       "a/../.."            ".."              "a"
> > > > > A case could be made for all of them being "",
> > > > I do not see how "../.." and "" are equivalent.
> > > It depends on the context.  Given that a relative path presented to cleanpath
> > > does not carry any information about its root, and in fact could very well be
> > > relative to / itself, remaining .. elements are meaningless.  In the absence
> > > of parent elements, .. (backing up to the parent) has no meaning.  So, as I
> > > said, a case could be made based on that to say that cleanpath("..") could
> > > validly yield "" as result.
> > > 
> > > > > given that cleanpath() is
> > > > > supposed to remove redundant elements (. and ..)
> > > > ".." by itself is not redundant, but the combination "/foo/.." can be
> > > > collapsed.
> > > Again, by the reasoning I present above, .. by itself can be considered
> > > redundant because it represents backing up a level, and within the context of
> > > a cleanpath() invocation, the meaning of ".." becomes pointless.
> > > 
> > > But as I mention below, I agree that a valid case can be made for what you
> > > propose in this patch, albeit with the one exception I noted.
> > > 
> > > > > and no context is known
> > > > > for relative paths.  But I do agree that there is also a case to be made
> > > > > to not get rid of .. elements that would apply to the root of the relative
> > > > > path, becausae discarding them effectively causes information loss.
> > > > > 
> > > > > Except for the case of "a/..".  That one should be "" because the .. should
> > > > > be considered redundant in view of having a present path element, and thus
> > > > > it is possible to back up one path element, which would yield "".  I do not
> > > > > see how the result of "." would be considered valid because that in itself
> > > > > is a redundant element (and thus should still be removed).
> > > > I guess I just don't know what standard or use case to consider. Anyhow, the
> > > > documentation I saw speaks of removing "/./" without talking specifically
> > > > about "." in isolation.  There are a lot of cases when someone would not
> > > > consider "." and "" to be the same... e.g., "cd .", "pushd .", etc.  So
> > > > "a/.." is the same as ".".  Are they also the same as ""?  It depends on the
> > > > situation, and it's unclear to me where cleanpath() is used.  The
> > > > documentation I saw (Oracle Linux DTrace Guide) didn't say anything was
> > > > wrong with a bare ".".
> > > The documentation talks about /./ and /../ but not in a literal sense.  If
> > > it was in a literal sense, then you would need to leave "a/.." alone because
> > > it does not contain /../ or /./.  Clearly, the meaning is in terms of path
> > > elements and the meaning of . and .. as path elements (which is almost certainly
> > > why they are shown as /./ and /../, i.e. path elements that are literal .
> > > and ..).
> > > 
> > > While it is not clear where cleanpath() is meant to be used (I do not find any
> > > real examples of its use cases), it is unlikely it is meant to be used to
> > > sanitize paths that are to be used in commands, or rather, it is certainly not
> > > guaranteed to be sensible to do so.  After all, the dpcumentation points out
> > > that "it is possible that cleanpath could take a valid path and return a
> > > shorter, invalid path."
> > > 
> > > More importantly though, the documentation is clear that cleanpath() should be
> > > removing redundant path elements, specifically, /./ and /../ (and again, the /
> > > are here to denote that these are path elements and not portions of path
> > > elements).  And /./ is certainly redundant in paths.
> > > 
> > > Also, the documentation does not talk about adding anything, and the case of
> > > "a/.." listed above shows "." as output for the current patch, yet the input
> > > string does not contain a literal . element, so rather than removing redundant
> > > elements, this ends up introducing an element, and one that is actually one of
> > > the aforementioned redundant elements that are meant to be removed.
> > > 
> > > Now, I fully admit that I am baffled by the mere existance of this cleanpath()
> > > subroutine, and I really wonder what the use case for it was and why it was
> > > implemented the way it was.  The legacy output clearly shows that it is not
> > > even correct in somme cases, and its documented behaviour is vague.  From what
> > > I can gather from some searches, it seems that cleanpath() was used in some
> > > translators on other OSes, most likely to present paths in a nicer form.
> > > No such use exists on Linux right now.
> > > 
> > > And (bear with me), then there is the issue that cleanpath() as implemented in
> > > the legacy implementation does not actually follow the documented behaviour
> > > either!
> > > 
> > > Personally, I would prefer to just get rid of cleanpath() given all of the
> > > above, because there clearly isn't a good sense of what it is meant to do in
> > > certainly situations and the documentation is no help.
> > > 
> > > So...
> > > 
> > > Thinking this through some more, I first apologize for the lengthy discourse
> > > above but I think it is important to establish that this cleanpath() routine
> > > and its documentation is a mess.  An ugly mess.
> > > 
> > > But I do acknowledge that you have done a fine job trying to make sense of the
> > > mess, and it provides a consistent behaviour for the subroutine albeit not in
> > > line with the documentation.  So, under the presumption that you can document
> > > the behaviour in the commit message (which I usually would object to, but I
> > > think we need to so that the documentation can be updated based on that), I
> > > think going with this implementation is the best course of action.
> > > 
> > > Maybe in terms of a sequence of search-and-replace regular expressions, in
> > > order?
> > > 
> > > > > >       "a/../b"             "b"               "/b"
> > > > > I agree.
> > > > > 
> > > > > > In the first set, legacy DTrace has unnecessary trailing "." or "/."
> > > > > > or unnecessary leading "./".  They would seem to be wrong.
> > > > > > 
> > > > > > In the next cases, the legacy results are clearly wrong, not
> > > > > > handling ".." correctly.
> > > > > > 
> > > > > > In the final case, a relative path is converted into an absolute
> > > > > > path, which is also clearly incorrect.
> > > > _______________________________________________
> > > > DTrace-devel mailing list
> > > > DTrace-devel at oss.oracle.com
> > > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel at oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel



More information about the DTrace-devel mailing list