hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop
Date Fri, 05 Feb 2016 00:27:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133405#comment-15133405 ]

Chris Nauroth commented on HADOOP-12666:

bq. Why is there a separate checkstyle.xml for this module?

Judging from the content, I am guessing it was copied from hadoop-tools/hadoop-azure/src/config/checkstyle.xml.
However, the checkstyle.xml in hadoop-azure is really a historical artifact that we intend
to clean up, not a pattern meant to propagate into new code.

HADOOP-11899 tracks the cleanup for hadoop-azure.  New code contributions like this one can
stick to the standard Hadoop style from the beginning.

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch,
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
> h2. Description
> This JIRA describes a new file system implementation for accessing Windows Azure Data
Lake Store (ADL) from within Hadoop. This would enable existing Hadoop applications such as
MR, Hive, HBase, etc., to use the ADL store as input or output.
> ADL is an ultra-high-capacity store, optimized for massive throughput, with rich management and
security features. More details are available at https://azure.microsoft.com/en-us/services/data-lake-store/
> h2. High level design
> The ADL file system exposes RESTful interfaces compatible with the WebHDFS specification 2.7.1.
> At a high level, the code here extends the SWebHdfsFileSystem class to provide an implementation
for accessing ADL storage; the scheme ADL is used for accessing it over HTTPS. We use the
URI scheme:
> {code}adl://<URI to account>/path/to/file{code} 
> to address individual files and folders. Tests are implemented mostly using a Contract implementation
for the ADL functionality, with an option to test against real ADL storage if configured.
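
To make the addressing scheme above concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how an adl:// URI might be translated into an HTTPS WebHDFS-style REST URL. The class name AdlUriSketch, the helper toWebHdfsUrl, and the host myaccount.example.net are hypothetical and exist only for illustration. The /webhdfs/v1 prefix and the op query parameter come from the WebHDFS REST convention; whether the patch builds its URLs exactly this way is an assumption, not something stated in this issue.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AdlUriSketch {

    /**
     * Hypothetical helper: maps adl://host/path to an HTTPS URL that follows
     * the WebHDFS REST layout (/webhdfs/v1/<path>?op=<OPERATION>).
     * The real ADL client may construct its requests differently.
     */
    static URI toWebHdfsUrl(URI adlUri, String op) throws URISyntaxException {
        if (!"adl".equalsIgnoreCase(adlUri.getScheme())) {
            throw new IllegalArgumentException("Expected adl scheme: " + adlUri);
        }
        String host = adlUri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("Missing account host: " + adlUri);
        }
        // An empty path addresses the root of the store.
        String path = adlUri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // Port -1 means "use the default port for the scheme".
        return new URI("https", null, host, -1, "/webhdfs/v1" + path, "op=" + op, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical account host, for illustration only.
        URI adl = new URI("adl://myaccount.example.net/path/to/file");
        // prints https://myaccount.example.net/webhdfs/v1/path/to/file?op=GETFILESTATUS
        System.out.println(toWebHdfsUrl(adl, "GETFILESTATUS"));
    }
}
```

Because the design extends SWebHdfsFileSystem, application code would normally never build such URLs by hand; it would go through the Hadoop FileSystem API with an adl:// path, and the translation would happen inside the file system implementation.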
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work can be seen
in. Credit for this work goes to the team: [~vishwajeet.dusane], [~snayak], [~srevanka], [~kiranch],
[~chakrab], [~omkarksa], [~snvijaya], [~ansaiprasanna], [~jsangwan].
> h2. Test
> Besides Contract tests, we have used ADL as the additional file system in the current
public preview release. Various customer and test workloads have been run against
clusters with such configurations for quite some time. The current version reflects the
version of the code tested and used in our production environment.
