hadoop-common-issues mailing list archives

From "Vishwajeet Dusane (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop
Date Thu, 10 Mar 2016 16:43:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189518#comment-15189518 ]

Vishwajeet Dusane commented on HADOOP-12666:

*Notes From Mar 9, 2016 Call w/ MSFT*
Who: Cloudera (Aaron Fabbri, Tony Wu); MSFT (Vishwajeet, Cathy, Chris Douglas, Shrikant)
1. Packaging / Code Structure
 - In general, an ADL extension of WebHDFS would not be acceptable as a long-term solution.
 - The WebHDFS client is not designed for extension.
 - [Available options as of today|https://issues.apache.org/jira/browse/HADOOP-12666?focusedCommentId=15186380&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15186380]
 - Option 1 vs. Option 2 (refactor WebHDFS) vs. Option 3 (copy-paste code; bad)
 - Option 2 (MSFT): requires changes to WebHDFS so that it can accept the ADL extensions; the change may be significant.
 - Raise a separate JIRA for the WebHDFS extension.

2. WebHDFS and ADL cannot coexist if both use the OAuth2 authentication protocol
 - Near term: document the limitation that only one WebHDFS client at a time can use OAuth2. It is OK to have non-OAuth2 WebHDFS and ADL configured on the same cluster. - AP: Vishwajeet to document this as a known limitation.
 - Long term: a v2 of the ADL connector that better factors out the code it has in common with the WebHDFS client.
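The coexistence problem in item 2 comes down to two clients reading one flat configuration namespace. The Java sketch below models that with a plain Map. The key `dfs.webhdfs.oauth2.access.token.provider` is WebHDFS's OAuth2 token-provider setting; that the v1 ADL client reads the same key is an assumption made for illustration, and the per-scheme key pattern and provider class names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not code from the ADL patch.
final class OAuthKeyCollision {
    // WebHDFS OAuth2 token-provider key. That the ADL v1 client reads this
    // same key is an assumption for the purpose of the illustration.
    static final String SHARED_PROVIDER_KEY = "dfs.webhdfs.oauth2.access.token.provider";

    /** Provider resolved by any client reading the shared key: the last value set wins. */
    static String sharedProvider(Map<String, String> conf) {
        return conf.get(SHARED_PROVIDER_KEY);
    }

    /** Hypothetical per-scheme key, so "webhdfs" and "adl" can each name their own provider. */
    static String perSchemeProvider(Map<String, String> conf, String scheme) {
        return conf.get("fs." + scheme + ".oauth2.access.token.provider");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SHARED_PROVIDER_KEY, "example.WebHdfsTokenProvider"); // hypothetical class name
        conf.put(SHARED_PROVIDER_KEY, "example.AdlTokenProvider");     // silently replaces the first
        // Both the vanilla WebHDFS client and the ADL client now see the same provider.
        System.out.println(sharedProvider(conf));
    }
}
```

Separating the settings per scheme is one way a v2 connector could avoid the clash; the notes only say the shared WebHDFS client code should be factored out better, not how.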

3. Integrity / Semantics
 - Single-writer semantics?
 - See leaseId in PrivateAzureDataLakeFileSystem::createNonRecursive()
 - Append semantics do not close the connection, hence the leaseId is not required.
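A minimal sketch of the single-writer rule that a leaseId on create is meant to enforce: one active lease per path, and writes accepted only from the lease holder. `LeaseTable` and its methods are hypothetical; they model the idea only and are not how the ADL service or `PrivateAzureDataLakeFileSystem` actually track leases.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of single-writer semantics.
final class LeaseTable {
    // path -> leaseId of the single writer currently allowed to write it
    private final Map<String, String> activeLeases = new HashMap<>();

    /** Grants the lease if the path is free or already held by the same leaseId. */
    synchronized boolean acquire(String path, String leaseId) {
        String holder = activeLeases.putIfAbsent(path, leaseId);
        return holder == null || holder.equals(leaseId);
    }

    /** A write is accepted only from the current lease holder. */
    synchronized boolean canWrite(String path, String leaseId) {
        return leaseId.equals(activeLeases.get(path));
    }

    /** Closing the stream releases the lease; a no-op unless leaseId is the holder. */
    synchronized void release(String path, String leaseId) {
        activeLeases.remove(path, leaseId);
    }

    public static void main(String[] args) {
        LeaseTable leases = new LeaseTable();
        System.out.println(leases.acquire("/data/part-0", "lease-A")); // true
        System.out.println(leases.acquire("/data/part-0", "lease-B")); // false: single writer
        leases.release("/data/part-0", "lease-A");
        System.out.println(leases.acquire("/data/part-0", "lease-B")); // true
    }
}
```

Since append in the v1 client goes without a leaseId, a model like this says nothing about two concurrent appenders to the same path; how that differs from HDFS is exactly the open append() question.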

4. Action Items
 - [msft] Put the WebHDFS extension issue into a separate JIRA so folks from the community can comment. Do they prefer that hadoop-azure-datalake mix packages, relaxing the visibility of some methods, or another approach? - Raised HDFS-9938
 - [msft] volatile is not needed in addition to synchronized in BatchByteArrayInputStream - AP:
 - [msft] Add to documentation: caveat for v1 where you can only have one WebHDFS client (ADL or vanilla) with OAuth2, not both. - AP: Vishwajeet
 - [cloudera] Go over the latest patches.
 - [cloudera] Reach out to other Hadoop committers to see what else needs addressing before the patch can be committed.
 - [msft/cloudera] Start a document on adl:// semantics and deltas versus HDFS, with and without FileStatusCache.
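The reasoning behind the volatile action item can be shown with a small Java sketch. `SynchronizedCursor` is a hypothetical class, not BatchByteArrayInputStream itself; the assumption, as in the review comment, is that every access to the shared field goes through synchronized methods on the same object.

```java
// Hypothetical stand-in for a class whose mutable state is only touched
// inside synchronized methods.
final class SynchronizedCursor {
    // Not volatile: every read and write happens while holding this object's
    // monitor, and releasing/acquiring the same monitor already guarantees
    // visibility and ordering between threads.
    private int position;

    synchronized void advance(int n) {
        position += n; // atomic because of the lock; volatile alone would not make += atomic
    }

    synchronized int position() {
        return position;
    }

    /** Runs several threads that each advance the cursor, then returns the final position. */
    static int countConcurrently(int threads, int incrementsPerThread) {
        SynchronizedCursor cursor = new SynchronizedCursor();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    cursor.advance(1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return cursor.position();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // 40000
    }
}
```

The argument only holds while no method reads or writes the field outside the lock; an unsynchronized accessor would reintroduce the need for volatile (or for taking the lock).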

5. Follow-Up Topics (homework / next meeting)
 - Follow up on append(): no leaseId. What is the delta from HDFS semantics?
 - BufferManager purpose, coherency
   - For readahead, so multiple FSInputStreams can see the same buffer that was fetched with
 - Follow up on flushAsync() in the write path (why / how)
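One way to picture the readahead role described for BufferManager is a shared block cache keyed by path and offset, so that a second input stream reading the same range reuses the buffer the first stream fetched. `ReadaheadCache`, its LRU eviction, and the fetcher callback are assumptions for illustration, not the patch's BufferManager.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical shared readahead cache: streams on the same file share fetched blocks.
final class ReadaheadCache {
    private final int capacity;
    private final Map<String, byte[]> blocks;
    private int fetches = 0;

    ReadaheadCache(int capacity) {
        this.capacity = capacity;
        // Access-ordered LinkedHashMap gives simple least-recently-used eviction.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > ReadaheadCache.this.capacity;
            }
        };
    }

    /** Returns the block at (path, blockOffset), calling fetcher only on a cache miss. */
    synchronized byte[] get(String path, long blockOffset, LongFunction<byte[]> fetcher) {
        String key = path + "@" + blockOffset;
        byte[] block = blocks.get(key);
        if (block == null) {
            block = fetcher.apply(blockOffset); // stands in for a remote read
            fetches++;
            blocks.put(key, block);
        }
        return block;
    }

    /** Number of remote fetches performed so far. */
    synchronized int fetchCount() {
        return fetches;
    }
}
```

The sketch also shows why coherency is on the follow-up list: nothing here invalidates a cached block when the file is rewritten, so a stream could be served stale bytes unless the real BufferManager handles that.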

6. Future Plans for the ADL Client Implementation
 - Share future plans with the community
 - Versioning

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, HADOOP-12666-008.patch,
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft Azure Data Lake Store (ADL) from within Hadoop. This would enable existing Hadoop applications such as MR, Hive, HBase, etc., to use the ADL store as input or output.
> ADL is an ultra-high-capacity store, optimized for massive throughput, with rich management and security features. More details are available at https://azure.microsoft.com/en-us/services/data-lake-store/
