hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
Date Wed, 14 Feb 2018 11:54:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363850#comment-16363850 ]

Steve Loughran commented on HDFS-13108:
---------------------------------------

Where is the documentation of the URI format?

h3. OzoneFileSystem

L95. Can the pattern be made static? If it is only used in initialize(), it can be a local variable.
L308. Use Preconditions.checkArgument, and include the URL in the error message that is built up.
L433. What if the path doesn't have a parent?
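A minimal sketch of the L95 and L308 points, assuming the pattern parses the authority of an o3://bucket.volume/ URL. The regex, class and method names are hypothetical, and checkArgument below is a local stand-in with the same shape as Guava's Preconditions.checkArgument(boolean, String, Object...). The pattern is compiled once into a static final field, and the failure message includes the offending URL.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OzoneUriCheck {
    // Compiled once when the class is loaded, not on every initialize() call.
    // Hypothetical format: "<bucket>.<volume>", where the volume contains no dot.
    private static final Pattern AUTHORITY = Pattern.compile("^(.+)\\.([^.]+)$");

    private OzoneUriCheck() {}

    // Local stand-in with the same shape as Guava's Preconditions.checkArgument;
    // Guava substitutes only %s placeholders, which is all this sketch uses.
    static void checkArgument(boolean expression, String template, Object... args) {
        if (!expression) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    /** Returns {bucket, volume} parsed from an o3://bucket.volume/ URI. */
    static String[] bucketAndVolume(URI uri) {
        String authority = uri.getAuthority();
        Matcher m = AUTHORITY.matcher(authority == null ? "" : authority);
        // The URL itself goes into the message, so a bad defaultFS is easy to spot.
        checkArgument(m.matches(),
            "Ozone file system URL should be o3://bucket.volume/ but was %s", uri);
        return new String[] {m.group(1), m.group(2)};
    }
}
```

For example, o3://bucket1.test/ yields bucket "bucket1" and volume "test", while o3://nodot/ is rejected with a message that contains the URL.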

h3. TestOzoneFileInterfaces

L43. The imports are out of order with respect to the Hadoop rules. Can this be fixed now, before any merge?
L44. If you do a static import of Assert, there is no need to put \{{Assert.}} in front of every assertion.

L98. Have init() declare that it throws Exception; then there is no need to catch the URISyntaxException and lossily rethrow it.
L127. The \{{this.}} prefix is only needed in the constructor.
L141. Use IOUtils to close all of these; if for some reason you can't, check each one for null first.
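An illustration of the L98 point, using a hypothetical init(String) helper rather than the actual test code. The lossy version replaces the URISyntaxException with an exception that has neither the original message nor a cause; declaring throws Exception (which JUnit allows on setup methods) lets the original propagate with its message, which names the bad input.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class InitStyles {
    private InitStyles() {}

    // Lossy: the original message, input and stack trace are discarded.
    static URI initLossy(String uri) {
        try {
            return new URI(uri);
        } catch (URISyntaxException e) {
            throw new RuntimeException("Invalid URI");
        }
    }

    // Preferred for test setup: declare the exception and let it propagate unchanged.
    static URI init(String uri) throws Exception {
        return new URI(uri);
    }
}
```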

L150. Do a cast, not an assert, so that something meaningful is thrown. I have a strict "veto all patches where assertTrue/assertFalse don't include an error message" policy, and as I've been invited to comment, you've just encountered it. Sorry. As ClassCastException is meaningful, avoid the problem by not bothering with the assert.
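A minimal sketch of the L150 point, with hypothetical BaseFs/OzoneFs classes standing in for FileSystem and OzoneFileSystem. A bare assertTrue(fs instanceof OzoneFs) fails with an AssertionError that carries no message, whereas the plain cast fails with a ClassCastException whose message names the actual class.

```java
final class CastInsteadOfAssert {
    // Hypothetical stand-ins for FileSystem and OzoneFileSystem.
    static class BaseFs {}
    static final class OzoneFs extends BaseFs {}

    private CastInsteadOfAssert() {}

    // No assertTrue(fs instanceof OzoneFs) needed: if fs has the wrong type,
    // the cast itself throws a ClassCastException naming the classes involved.
    static OzoneFs asOzone(BaseFs fs) {
        return (OzoneFs) fs;
    }
}
```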

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> -------------------------------------------------------------------
>
>                 Key: HDFS-13108
>                 URL: https://issues.apache.org/jira/browse/HDFS-13108
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: HDFS-13108-HDFS-7240.001.patch, HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch
>
>
> A. Current state
>  
> 1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. o3://datanode:9864/test/bucket1).
> 2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the keys in bucket1).
> This works very well, but there are some limitations.
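To make the current layout concrete, here is a sketch of how a defaultFS such as o3://datanode:9864/test/bucket1 splits into the datanode address, the volume and the bucket. It uses plain java.net.URI rather than the Ozone code, and the class and method names are hypothetical.

```java
import java.net.URI;

final class CurrentLayout {
    private CurrentLayout() {}

    /** Splits o3://host:port/volume/bucket into {host:port, volume, bucket}. */
    static String[] split(URI defaultFs) {
        String path = defaultFs.getPath();
        // The path is "/volume/bucket"; drop the leading slash before splitting.
        String[] parts = (path == null || path.isEmpty())
            ? new String[0]
            : path.substring(1).split("/");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "Expected o3://host:port/volume/bucket but was " + defaultFs);
        }
        return new String[] {defaultFs.getAuthority(), parts[0], parts[1]};
    }
}
```

In this layout the volume and bucket live in the path component of the URL, which is the part that qualification discards (Problem two).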
> B. Problem one
> The current code doesn't support fully qualified locations. For example, 'dfs -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest obstacle is the Path.makeQualified call, which transforms an unqualified URL into a qualified one. It is part of Path.java, so it is shared by all Hadoop file systems.
> The current implementation qualifies a URL by keeping the scheme (e.g. o3://) and authority (e.g. datanode:9864) of the defaultFS and appending the relative path. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) returns o3://datanode:9864/dir1/file, which is wrong (the correct result would be o3://datanode:9864/test/bucket1/dir1/file).
> I tried a workaround using a custom makeQualified in the Ozone code. It worked from the command line, but not with Spark, which uses the Hadoop API and therefore the original makeQualified.
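A simplified model of the qualification behaviour described above, built on java.net.URI rather than Hadoop's Path (the class and method names are hypothetical, and the real makeQualified also takes a working directory into account). Because only the scheme and authority of the defaultFS are kept, the /test/bucket1 prefix is lost. With the o3://bucket.volume/ form proposed in section D, the bucket and volume sit in the authority and survive the same operation.

```java
import java.net.URI;

final class QualifyModel {
    private QualifyModel() {}

    // Models the described behaviour: scheme and authority come from the default
    // FS, its path is ignored, and the relative path is placed under the root.
    static URI qualify(URI defaultUri, String relativePath) {
        return URI.create(defaultUri.getScheme() + "://" + defaultUri.getAuthority()
            + "/" + relativePath);
    }
}
```

With defaultUri o3://datanode:9864/test/bucket1 and path dir1/file this gives o3://datanode:9864/dir1/file; with defaultUri o3://bucket1.test/ it gives o3://bucket1.test/dir1/file, which still identifies the bucket.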
> D. Solution
> We should support makeQualified calls, so that any path can be used with the defaultFS.
>
> I propose a simplified scheme: o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> The hostname of the datanode (or of the KSM, in case of service discovery) no longer needs to be part of the URL, but it would be configurable with additional Hadoop configuration values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (as far as I know, this is how s3a works today).
> We also need to define restrictions for volume names (in our case they should no longer contain dots).
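A sketch of how the per-bucket address lookup and the volume-name restriction could fit together. It assumes the configuration key pattern from the proposal, uses a plain Map in place of Hadoop's Configuration, and returns an empty Optional where a real implementation would presumably fall back to service discovery. All class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

final class OzoneAddressLookup {
    private OzoneAddressLookup() {}

    // Key pattern taken from the proposal: fs.o3.bucket.<bucket>.<volume>.address
    static String addressKey(String bucket, String volume) {
        return "fs.o3.bucket." + bucket + "." + volume + ".address";
    }

    // Volume names may not contain '.', so "bucket.volume" splits
    // unambiguously at the last dot.
    static void validateVolumeName(String volume) {
        if (volume.isEmpty() || volume.indexOf('.') >= 0) {
            throw new IllegalArgumentException(
                "Invalid volume name (must be non-empty, without '.'): " + volume);
        }
    }

    // Empty result means no explicit address is configured for this bucket.
    static Optional<String> lookupAddress(Map<String, String> conf, String bucket, String volume) {
        validateVolumeName(volume);
        return Optional.ofNullable(conf.get(addressKey(bucket, volume)));
    }
}
```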
> PS: some Spark output:
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip -> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip
> My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the home directory against the scheme and authority only, so the staging directory became o3://datanode:9864/user/hadoop/.sparkStaging, outside /test/bucket1.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


