hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
Date Fri, 16 Feb 2018 13:52:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367319#comment-16367319 ]

Steve Loughran commented on HDFS-13108:

bq. L43: I tried to find a written description of the right order of imports (in the Hadoop wiki and with Google). I couldn't find any reference (please send me an RTFM link if there is one). I improved the import order according to the existing classes. The only rule I found is to group the java/hadoop/other classes together (the order of the groups differs even between Namenode.java and Datanode.java). Please let me know what the main rule is and I would be happy to improve it further.

I tried to find it too... there's a convention, which is under-followed, of grouping imports as:

1. java.* (and javax.*)
2. non-org.apache.* third-party imports
3. org.apache.*
4. all static imports, in order

Once things are in, it's generally safest to leave them alone, because import changes are where patch merges conflict so much.
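For illustration only, a file following that grouping might look like the sketch below; this is my reading of the convention, not a documented style rule, and the class is a made-up stand-in:

```java
// Illustration of the import grouping sketched above.
// Group 1: java.* / javax.*
import java.util.ArrayList;
import java.util.List;
// Group 2: non-org.apache third-party imports would go here
// Group 3: org.apache.* imports would go here
// Group 4: static imports, in order
import static java.util.Collections.unmodifiableList;

public class ImportOrderExample {
    public static void main(String[] args) {
        List<String> groups = new ArrayList<>();
        groups.add("java.*");
        groups.add("non org.apache.*");
        groups.add("org.apache.*");
        groups.add("static imports");
        // Print the group order the imports above follow.
        System.out.println(String.join(", ", unmodifiableList(groups)));
    }
}
```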

I've added some comments on GitHub; mostly suggestions about improving the error message text. Otherwise, LGTM.

What do others think?

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> -------------------------------------------------------------------
>                 Key: HDFS-13108
>                 URL: https://issues.apache.org/jira/browse/HDFS-13108
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: HDFS-13108-HDFS-7240.001.patch, HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, HDFS-13108-HDFS-7240.005.patch
> A. Current state
> 1. The datanode host / bucket / volume should be defined in the defaultFS (e.g. o3://datanode:9864/test/bucket1)
> 2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example, 'dfs -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest problem is that there is a Path.makeQualified call which can transform an unqualified URL into a qualified one. This is part of Path.java, so it's common to all Hadoop file systems.
> In the current implementation it qualifies a URL by keeping the scheme (e.g. o3://) and the authority (e.g. datanode:9864) from the defaultFS, and using the relative path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) returns o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a workaround using a custom makeQualified in the Ozone code; it worked from the command line, but it couldn't work with Spark, which uses the Hadoop API and the original makeQualified.
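The qualification behaviour described above can be sketched in isolation. This is a toy re-implementation using java.net.URI, not Hadoop's actual Path code, just to show where the bucket path gets lost:

```java
import java.net.URI;

// Toy re-implementation (NOT Hadoop's actual code) of the described
// qualification behaviour: scheme and authority are taken from the
// default FS, and the relative path is resolved against the FS root
// rather than against the bucket path.
public class MakeQualifiedSketch {
    static String makeQualified(URI defaultUri, String relativePath) {
        // Keep scheme + authority from defaultUri, drop its path,
        // then append the relative path -- this is where the
        // /test/bucket1 part of the default FS is lost.
        return defaultUri.getScheme() + "://" + defaultUri.getAuthority()
                + "/" + relativePath;
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("o3://datanode:9864/test/bucket1");
        // Prints o3://datanode:9864/dir1/file -- the bucket path has
        // been dropped, which is the bug described above.
        System.out.println(makeQualified(defaultFs, "dir1/file"));
    }
}
```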
> D. Solution
> We should support makeQualified calls, so we can use any path in the defaultFS.
> I propose to use a simplified schema such as o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> We don't need to set the hostname of the datanode (or of the KSM, in case of service discovery), but it would be configurable with additional Hadoop configuration values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (this is how s3a works today, as far as I know).
> We also need to define restrictions for volume names (in our case they should no longer be allowed to include dots).
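The proposed parsing could be sketched as follows. This is hypothetical code, not part of Ozone; the config key name simply follows the fs.o3.bucket.<bucket>.<volume>.address pattern suggested above and is not an existing Hadoop property:

```java
import java.net.URI;

// Hypothetical sketch of how the proposed o3://bucket.volume/ schema
// could be parsed into a bucket name, a volume name, and the config
// key that would map them to a datanode address.
public class OzoneUriSketch {
    static String[] parseAuthority(String authority) {
        // Volume names may not contain '.', so the last dot separates
        // the bucket part from the volume part.
        int dot = authority.lastIndexOf('.');
        return new String[] {
            authority.substring(0, dot),    // bucket
            authority.substring(dot + 1)    // volume
        };
    }

    public static void main(String[] args) {
        URI uri = URI.create("o3://bucket1.volume1/dir1/file");
        String[] parts = parseAuthority(uri.getAuthority());
        String configKey = "fs.o3.bucket." + parts[0] + "." + parts[1] + ".address";
        System.out.println(parts[0] + " / " + parts[1] + " / " + configKey);
    }
}
```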
> ps: some Spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip -> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip
> My default FS was o3://datanode:9864/test/bucket1, but Spark qualified the name of the home directory.

This message was sent by Atlassian JIRA
