hadoop-common-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10741) A lightweight WebHDFS client library
Date Tue, 24 Jun 2014 15:46:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042265#comment-14042265 ]

Alejandro Abdelnur commented on HADOOP-10741:

[~kamrul], by hadoop-core I assume you mean hadoop-common. I would say that the right way
of doing this would be to have a hadoop-common-api module which contains ONLY the Hadoop public
API and its required dependencies, plus the client filesystem implementations shipped with Hadoop
(local, hdfs, har, webhdfs, hftp, ...). The problem is that doing this requires significant
shuffling of code and massive refactoring of testcases; I believe this is why it was
never done. The hadoop-client POM is the best we have at the moment for a Hadoop client
that is as lightweight as possible. You could create a hadoop-webhdfs-client module which excludes
yarn & mapreduce from hadoop-client, making it lighter. Still, with WebHDFS you have the
problem that it currently lives in hadoop-hdfs; if you move WebHdfsFileSystem to hadoop-common,
you could also exclude hadoop-hdfs from the hadoop-webhdfs-client module, making it lighter still.
I would rather go along these lines than write new code in Hadoop that duplicates
functionality just to avoid using Hadoop code.
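
For illustration, a rough sketch of what the dependency section of such a hadoop-webhdfs-client POM might look like. This is not an existing module: the hadoop.version property and the exact list of excluded artifacts are assumptions based on what hadoop-client pulled in on the 2.x line, and should be checked against the output of "mvn dependency:tree" before relying on them.

```xml
<!-- Hypothetical hadoop-webhdfs-client/pom.xml fragment (a sketch, not shipped code). -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Assumption: hadoop.version is defined as a property in the parent POM. -->
    <version>${hadoop.version}</version>
    <exclusions>
      <!-- Assumed YARN and MapReduce artifacts brought in by hadoop-client. -->
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-app</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      </exclusion>
      <!-- hadoop-hdfs is deliberately NOT excluded here: WebHdfsFileSystem
           still lives in it until it is moved to hadoop-common. -->
    </exclusions>
  </dependency>
</dependencies>
```

Once WebHdfsFileSystem has been moved, one more exclusion for hadoop-hdfs would complete the picture described above.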

> A lightweight WebHDFS client library
> ------------------------------------
>                 Key: HADOOP-10741
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10741
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Mohammad Kamrul Islam
> One of the motivations for creating WebHDFS is for applications connecting to HDFS from
> outside the cluster.  In order to do so, users have to either
> # install Hadoop and use WebHdfsFileSystem, or
> # develop their own client using the WebHDFS REST API.
> For #1, it is very difficult to manage and unnecessarily complicated for other applications
> since Hadoop is not a lightweight library.  For #2, it is not easy to deal with security and
> handle transient errors.
> Therefore, we propose adding a lightweight WebHDFS client as a separate library which
> does not depend on Common and HDFS.  The client can be packaged as a standalone jar.  Other
> applications simply add the jar to their classpath to use it.
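
To make option #2 in the description above concrete, here is a minimal sketch of the first step a hand-written client has to get right: building a WebHDFS REST URL of the form http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>. The class name WebHdfsUrl, the host, the port and the user are made up for illustration, and the sketch assumes simple (user.name) authentication. It deliberately leaves out the parts the description calls hard, namely Kerberos/delegation-token security and retrying on transient errors.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop: builds WebHDFS REST URLs only.
final class WebHdfsUrl {
    private static final String PREFIX = "/webhdfs/v1";

    // hdfsPath must be absolute (start with '/'); op is a WebHDFS operation
    // name such as OPEN or GETFILESTATUS.
    static URI build(String host, int port, String hdfsPath, String op, String user) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        String query = "op=" + op + "&user.name=" + user;
        try {
            // The multi-argument URI constructor quotes characters that are
            // illegal in a path or query (for example spaces).
            return new URI("http", null, host, port, PREFIX + hdfsPath, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build WebHDFS URL", e);
        }
    }

    public static void main(String[] args) {
        // Example values only; substitute a real NameNode address.
        System.out.println(build("namenode.example.com", 50070,
                "/user/alice/data.txt", "OPEN", "alice"));
    }
}
```

Everything beyond this point (following the redirect to a DataNode, authenticating, and retrying failed requests) is what a shared client library would need to provide.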

This message was sent by Atlassian JIRA
