hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13037) Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
Date Wed, 02 Nov 2016 06:06:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627882#comment-15627882
] 

Chris Douglas commented on HADOOP-13037:
----------------------------------------

Thanks for working through the build issues, and following up on the review feedback [~vishwajeet.dusane].

This may not require a branch. While the patch is large, much of it is removing code (to sever
the WebHDFS dependency) and adding new tests. The former can be committed directly to trunk,
and the latter we can commit in one or more followup issues (as subtasks of HADOOP-13257?).
Any objections to this approach?

A few minor comments from skimming the latest patch:
* {{AccessTokenProviderCache}} uses an unusual singleton pattern. Does it need to create a
private instance of itself, rather than initializing a static map directly?
* {{AdlPermission#equals}} and {{#hashcode}} implementations are identical to what they override.
Instead of reading from the config to report {{getAclBit}} (which is expensive, and unstable)
it could either store the boolean or return basic {{FsPermission}} instances when disabled
(and conversely, {{AdlPermission}} can always return true when the config knob is set). That
this is a configurable boolean is surprising, and could be explained in this class's (empty)
javadoc.
* {{TokenProviderType::fromString}} is unused
* In initialize, {{accountFQDN}} init can move up, so the {{accessTokenProvider}} init can
be contiguous
* Both the subclasses of {{AzureADTokenProvider}} throw exceptions for all abstract methods
save init, and appear not to use the private instance they configure. The init pattern here
is pretty complicated, with {{initialize}} creating an instance and adding it to a global
cache, and not using the reference it retains. Can this be simplified? If the instance were
returned, {{AccessTokenProviderCache}} additions can be moved to {{AdlFileSystem}}, so all
the get/add calls are in one class.
* The {{TokenProviderCache}} type seems unnecessary, if {{AzureADTP}} instances can return
null to prevent caching (instead of the instanceof/cast)
* It's a matter of taste, but importing all the static members of {{AdlConfKeys}} in {{AdlFileSystem}}
(or none of them) is easier to read.
* The {{<P>}} tags are unnecessary in javadocs. Some are also oddly formatted.
* Some of the test javadocs still refer to {{BufferManager.java}} and related classes that
have been removed

> Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-13037
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13037
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/azure, tools
>            Reporter: Shrikant Naidu
>            Assignee: Vishwajeet Dusane
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch, HADOOP-13037-002.patch,
HADOOP-13037-003.patch
>
>
> The jira proposes an improvement over HADOOP-12666 to remove webhdfs dependencies from
the ADL file system client and build out a standalone client. At a high level, this approach
would extend the Hadoop file system class to provide an implementation for accessing Azure
Data Lake. The scheme used for accessing the file system will continue to be 
> adl://<accountname>.azuredatalake.net/path/to/file. 
> The Azure Data Lake Cloud Store will continue to provide a webHDFS rest interface. The
client will  access the ADLS store using WebHDFS Rest APIs provided by the ADLS store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message