hadoop-common-issues mailing list archives

From "Steve Loughran (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8079) Proposal for enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments
Date Mon, 12 Mar 2012 12:03:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227445#comment-13227445 ]

Steve Loughran commented on HADOOP-8079:
----------------------------------------

Regarding the {{org.apache.hadoop.fs.azurenative}} classes:

* Keys like {{"fs.azure.buffer.dir"}} need to be pulled out and made constants; embedding
string literals is something the main codebase is slowly moving away from. Some of the
azurenative code already uses constants, but not all of it.
* The code depends on microsoft-windowsazure-api 0.1.2, which is in the Maven repository.
There's also a 0.2.0 version in there. Any particular reason for not using the latest release?
* Testing? How is anyone working with this code going to use the FS? Is there S3-style remote
access, or do you have to bring up a VM in the cluster?
* Where the code catches {{Exception}} and wraps it in {{AzureException}}, it is best set up so
that {{IOException}} instances are rethrown as-is rather than caught and wrapped, as they already
match the method signatures. I don't know if the native API throws these, but adding an extra
layer of nesting never helps with troubleshooting live systems.
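
To illustrate the first and last points, here is a minimal sketch, not the code in the attached patch. It assumes {{AzureException}} extends {{IOException}}; the names {{AzureConfigKeys}}, {{AzureCalls}}, {{StoreCall}} and {{invoke}} are hypothetical and exist only for this example.

```java
import java.io.IOException;

// Assumption: AzureException is an IOException subclass, so it fits
// FileSystem-style method signatures that declare "throws IOException".
class AzureException extends IOException {
    AzureException(Throwable cause) {
        super(cause);
    }
}

// Hypothetical holder for configuration keys, replacing inline string literals.
final class AzureConfigKeys {
    static final String BUFFER_DIR_KEY = "fs.azure.buffer.dir";

    private AzureConfigKeys() {
    }
}

// Hypothetical helper showing the catch ordering.
final class AzureCalls {
    // A call into the storage client that may throw any exception.
    interface StoreCall<T> {
        T call() throws Exception;
    }

    static <T> T invoke(StoreCall<T> op) throws IOException {
        try {
            return op.call();
        } catch (IOException e) {
            // Already matches the signature: rethrow without another layer.
            throw e;
        } catch (Exception e) {
            // Everything else is wrapped once, with the cause preserved.
            throw new AzureException(e);
        }
    }

    private AzureCalls() {
    }
}
```

With this ordering, an {{IOException}} raised inside the call reaches the caller unchanged, so stack traces from a live system show the original failure at the top level instead of one nesting layer down.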

It may be cleaner to keep the Azure FS source tree outside the main Hadoop code, and host
it in a parallel hadoop-azurefs project with the extra dependency and the extra output artifacts.
Anyone who added a Maven or Ivy dependency on hadoop-azurefs would get the -api JAR, and testing
could be isolated. This could also be a good opportunity to do the same for KFS, which is
under-tested in the current release process, and for any other DFS clients that people want
in the codebase. Maybe the policy should be: if it is testable by anyone, put it in the Hadoop
source tree, but if not, the FS vendor has to host and test it. (I'm thinking of things like
GPFS here, and others, not just AzureFS.)
                
> Proposal for enhancements to Hadoop for Windows Server and Windows Azure development
and runtime environments
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 1.0.0
>            Reporter: Alexander Stojanovic
>              Labels: hadoop
>         Attachments: azurenative.zip, general-utils-windows.patch, hadoop-8079.AzureBlobStore.patch,
hadoop-8079.patch, hadoopcmdscripts.zip, mapred-tasks.patch, microsoft-windowsazure-api-0.1.2.jar,
security.patch, windows-cmd-scripts.patch
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> This JIRA is intended to capture discussion around proposed work to enhance Apache Hadoop
to run well on Windows.  Apache Hadoop has worked on Microsoft Windows since its inception,
but Windows support has never been a priority. Currently Windows works as a development and
testing platform for Hadoop, but Hadoop is not natively integrated, full-featured or performance
and scalability tuned for Windows Server or Windows Azure.  We would like to change this and
engage in a dialog with the broader community on the architectural design points for making
Windows (enterprise and cloud) an excellent runtime and deployment environment for Hadoop.
 
>  
> The Isotope team at Microsoft (names below) has developed an Apache Hadoop 1.0 patch
set that addresses these performance, integration and feature gaps, allowing Apache Hadoop
to be used with Azure and Windows Server without recourse to virtualization technologies such
as Cygwin. We have significant interest in the deployment of Hadoop across many multi-tenant,
PaaS and IaaS environments - which bring their own unique requirements. 
> Microsoft has recently completed a CCLA with Apache and would like to contribute these
enhancements back to the Apache Hadoop community.
> In the interest of improving Apache Hadoop so that it runs more smoothly on all platforms,
including Windows, we propose first contributing this work to the Apache community by attaching
it to this JIRA.  From there we would like to work with the community to refine the patch
set until it is ready to be merged into the Apache trunk.
> Your feedback solicited,
>  
> Alexander Stojanovic
> Min Wei
> David Lao
> Lengning Liu
> David Zhang
> Asad Khan

