hadoop-common-dev mailing list archives

From "vishwajeet dusane (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-12666) Support Windows Azure Data Lake - as a file system in Hadoop
Date Tue, 22 Dec 2015 10:47:46 GMT
vishwajeet dusane created HADOOP-12666:

             Summary: Support Windows Azure Data Lake - as a file system in Hadoop
                 Key: HADOOP-12666
                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools
            Reporter: vishwajeet dusane

h2. Description
This JIRA describes a new file system implementation for accessing Windows Azure Data Lake
Store (ADL) from within Hadoop. This would enable existing Hadoop applications such as MR,
Hive, HBase, etc. to use the ADL store as input or output.
ADL is ultra-high capacity and optimized for massive throughput, with rich management and security
features. More details are available at https://azure.microsoft.com/en-us/services/data-lake-store/

h2. High level design
The ADL file system exposes RESTful interfaces compatible with the WebHDFS specification 2.7.1.
At a high level, the code here extends the SWebHdfsFileSystem class to provide an implementation
for accessing ADL storage; the adl scheme is used for accessing it over HTTPS. We use the
URI scheme:
{code}adl://<URI to account>/path/to/file{code} 
to address individual files and folders. Tests are implemented mostly using a Contract implementation
for the ADL functionality, with an option to test against real ADL storage if configured.
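
To illustrate the addressing scheme above, here is a minimal sketch of how an adl:// URI could be
translated into an HTTPS WebHDFS REST URL. It is not taken from the patch: the class name
AdlUriSketch, the method toWebHdfsUrl and the host myaccount.example.net are hypothetical, and
the sketch only assumes the standard WebHDFS URL layout (/webhdfs/v1/<path>?op=<OPERATION>).

```java
import java.net.URI;

public class AdlUriSketch {
    // Path prefix used by the WebHDFS REST API.
    private static final String WEBHDFS_PREFIX = "/webhdfs/v1";

    /**
     * Hypothetical helper: maps adl://<account host>/path to
     * https://<account host>/webhdfs/v1/path?op=<op>.
     */
    public static String toWebHdfsUrl(String adlUri, String op) {
        URI uri = URI.create(adlUri);
        if (!"adl".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Expected an adl:// URI: " + adlUri);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("Missing account host: " + adlUri);
        }
        // An empty path refers to the root of the store.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // The adl scheme always goes over HTTPS, as stated in the design.
        return "https://" + uri.getAuthority() + WEBHDFS_PREFIX + path + "?op=" + op;
    }

    public static void main(String[] args) {
        // Assumed example account host; a real deployment would use its own.
        System.out.println(
            toWebHdfsUrl("adl://myaccount.example.net/path/to/file", "GETFILESTATUS"));
    }
}
```

In the actual implementation this mapping would be inherited from SWebHdfsFileSystem rather than
written by hand; the sketch only shows the shape of the resulting request URL.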

h2. Credits and history
This has been ongoing work for a while, and the early version of this work can be seen in.
Credit for this work goes to the team: [~vishwajeet.dusane], [~snayak], [~srevanka], [~kiranch],
[~chakrab], [~omkarksa], [~snvijaya], [~ansaiprasanna], [~jsangwan]

h2. Test
Besides Contract tests, we have used ADL as the additional file system in the current public
preview release. Various customer and test workloads have been run against clusters
with such configurations for quite some time. The current version reflects the version
of the code tested and used in our production environment.

This message was sent by Atlassian JIRA