HADOOP-19815. Document Path URI vs String constructor and trailing sl…#8307
HADOOP-19815. Document Path URI vs String constructor and trailing sl…#8307deepujain wants to merge 6 commits intoapache:trunkfrom
Conversation
…ash. Clarify that Path(URI) preserves the URI including trailing slash while Path(String) normalizes and strips trailing slashes; document impact on URI.resolve() and when to use Path(URI) or path concatenation.
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
My JIRA id is deepujain |
ctubbsii
left a comment
There was a problem hiding this comment.
+1 to the changes to the docs, and the addition of the test. However, it does not fully resolve the issue described in the JIRA issue. To fully resolve the issue, the FileSystem methods need to be fixed, because their trailing slashes do matter.
…rectory, getHomeDirectory.
|
@ctubbsii thanks for the review. I have made additional changes. Please take another look when you have a moment. |
ctubbsii
left a comment
There was a problem hiding this comment.
I think this is an improvement, but I had a question about how API changes should be marked in my below comments.
| * @param uri a filesystem or path URI | ||
| * @return a URI with path ending in "/" (or "/" when path is empty) | ||
| */ | ||
| public static URI ensureDirectoryUri(URI uri) { |
There was a problem hiding this comment.
Does this need any API annotations? It seems to be a new public method in an otherwise stable API. Is there a better non-public place for this utility method? I'm asking because I'm genuinely not sure what Hadoop norms are for API changes.
There was a problem hiding this comment.
Good point.
In hadoop-common, new public methods on an otherwise stable class are usually marked with @InterfaceStability.Evolving so the method can evolve without changing the class’s stability. Path is already @InterfaceAudience.Public and @InterfaceStability.Stable, so I added @InterfaceStability.Evolving on both ensureDirectoryUri(URI) and asDirectory().
There was a problem hiding this comment.
Keeping them on public Path gives callers a single, clear API for “directory URI for this path” (e.g. getWorkingDirectory().asDirectory().toUri().resolve("x")) instead of string hacks. If the project prefers these as package-private or on an internal helper, I can move them.
| * | ||
| * @return a Path with the same location and a directory-style URI | ||
| */ | ||
| public Path asDirectory() { |
There was a problem hiding this comment.
Same question about API changes.
Summary
Documents how Path handles trailing slashes and how that affects
URI.resolve(), and fixes the FileSystem side of the issue. The String constructor (and other constructors that use string path components) normalize the path and remove trailing slashes; the URI constructor preserves the provided URI afterURI.normalize(), including any trailing slash. In addition, FileSystem now returns directory-form URIs/Paths:getUri(),getWorkingDirectory(), andgetHomeDirectory()have a trailing slash where appropriate, sofs.getWorkingDirectory().toUri().resolve("mytempdir")correctly yields.../user/me/mytempdir(per HADOOP-19815 and reviewer feedback).Change
toUri().resolve()).Path(String): noted that trailing slashes are removed; usePath(URI)when resolution must preserve a trailing slash.Path(URI): noted that the URI is preserved including trailing slash.Path.ensureDirectoryUri(URI)andPath.asDirectory()for directory-form URIs/Paths.getUri(),getHomeDirectory(),getWorkingDirectory(): they return directory form (trailing slash). DefaultgetHomeDirectory()returns...asDirectory().getUri()usesPath.ensureDirectoryUri(...);getWorkingDirectory()/getHomeDirectory()return directory form (e.g..asDirectory()).testTrailingSlashAndUriResolve(Path String vs URI),testEnsureDirectoryUri,testAsDirectoryAndResolve.testFileSystemDirectoryUriForResolve.JIRA
Fixes HADOOP-19815