hadoop-hdfs-issues mailing list archives

From "Alejandro Abdelnur (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2178) Contributing Hoop to HDFS, replacement for HDFS proxy with read/write capabilities
Date Mon, 17 Oct 2011 19:29:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129108#comment-13129108 ]

Alejandro Abdelnur commented on HDFS-2178:
------------------------------------------

*On Sanjay's create and append:*

You are correct: an HDFS proxy deployment does not need to redirect (to a DN); the request
is handled by the proxy itself.

Still, for authentication purposes a probe should be done before attempting to upload data.
Because of this, the create & append requests are identical in the hdfs-proxy (Hoop) and
built-in (NN & DN HTTP serving) modes. In the hdfs-proxy case the probe is for authentication
only; in the built-in case it is for both authentication and potential redirection.

This means that we can have the exact same API for both hdfs-proxy and built-in modes.

Still, the use of 100-continue is an open issue; more on this at the end of this comment.
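
As an illustration, here is a minimal client-side sketch of the probe-then-upload flow
described above. The URL layout and the op=create / data=true parameters are illustrative
assumptions in the spirit of the attached API doc, not settled API:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TwoStepCreate {

  public static void create(String fileUrl, byte[] data) throws Exception {
    // Step 1: probe. Send the create request without a body. The built-in
    // (NN) server may answer with a 307 redirect to a DN; hdfs-proxy only
    // triggers authentication and serves the upload itself.
    HttpURLConnection probe =
        (HttpURLConnection) new URL(fileUrl + "?op=create").openConnection();
    probe.setRequestMethod("PUT");
    probe.setInstanceFollowRedirects(false); // inspect the redirect ourselves
    probe.connect();

    String uploadUrl = fileUrl + "?op=create&data=true";  // hdfs-proxy case
    if (probe.getResponseCode() == 307) {                 // built-in case
      uploadUrl = probe.getHeaderField("Location");
    }
    probe.disconnect();

    // Step 2: upload the data to whichever URL the probe resolved to.
    HttpURLConnection upload =
        (HttpURLConnection) new URL(uploadUrl).openConnection();
    upload.setRequestMethod("PUT");
    upload.setDoOutput(true);
    upload.setRequestProperty("Content-Type", "application/octet-stream");
    OutputStream out = upload.getOutputStream();
    out.write(data);
    out.close();
    if (upload.getResponseCode() != HttpURLConnection.HTTP_CREATED) {
      throw new RuntimeException("create failed: " + upload.getResponseCode());
    }
  }
}
{code}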

*On Sanjay's comment on 'some thoughts of webhdfs & hoop':*

 * Support for trusted proxies (doAs functionality) does make sense in the case of hdfs-proxy,
and it is already supported by Hoop; e.g. server-side apps that need/want HTTP access to HDFS
and act on behalf of other users, or somebody using the Java API to access HDFS via hdfs-proxy
inside a doAs block (see the sketch right after this list).

 * Support for delegation tokens to access hdfs-proxy does make sense, e.g. when using
distcp via hdfs-proxy; in this case, delegation tokens should work across clusters (this may
not be supported today, but IMO it should eventually work).

 * You mention code/param/return clean-up. What kind of clean-up are you referring to?
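
Below is a rough sketch of the doAs usage mentioned above: a server-side app authenticated
as a trusted service acts on behalf of user "joe". The hoop:// URI scheme, host/port and
user names are illustrative assumptions:

{code:java}
import java.net.URI;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {

  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // The login user is the trusted service; "joe" is the user being
    // impersonated. The hdfs-proxy server must be configured to allow
    // the service to impersonate "joe".
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        "joe", UserGroupInformation.getLoginUser());

    // Every FileSystem call inside the doAs block runs as "joe".
    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        FileSystem fs =
            FileSystem.get(URI.create("hoop://hoopserver:14000"), conf);
        System.out.println(fs.getFileStatus(new Path("/user/joe")));
        return null;
      }
    });
  }
}
{code}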

*On Sanjay's 'As we move forward':*

 * What subset of the webhdfs API makes sense for a proxy? IMO, they should be identical;
a user should not see a difference between accessing a built-in or an hdfs-proxy HTTP setup.

 * Regarding a 'pure proxy': this would be more like a reverse proxy, and then all URLs would
have to be relative or resolved with knowledge of the reverse proxy. IMO, an hdfs-proxy on
its own has its merits.

*Open issues:*

 1. *Use of 100-CONTINUE for create & append*: it seems not all client HTTP libraries
handle this (the JDK's HttpURLConnection, to start). In addition, the servlet API does not
provide support for it; some servlet containers handle it, but either in a non-standard way
(http://jira.codehaus.org/browse/JETTY-341) or in a way where it never reaches the servlet
(http://stackoverflow.com/questions/848378/sending-100-continue-using-java-servlet-api).
Because of this I'm inclined to use a handshake request as shown in the attached API doc
(a server-side sketch follows this list).

 2. *Are we OK with the attached API* (except for the discussion on #1)?

 3. *Codebase*: Hoop was using TestNG for test cases and non-Apache package names. I've been
working on refactoring it to use JUnit, to rename the packages, and to organize the code in
a way that fits the current source layout. In the meantime, for webhdfs (built-in HTTP) some
code from Hoop has been cloned, modified and integrated into HDFS. This code has changed
significantly, so integrating it back with Hoop will require some serious rewriting of Hoop.
Given the current timeframe we are shooting for 0.23, should we add Hoop as a separate module
to provide hdfs-proxy-like support and later see how to merge the code?
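
On open issue #1, here is a rough server-side sketch (Jersey/JAX-RS, which Hoop already uses)
of the handshake alternative to 100-CONTINUE: a create request arriving without the data
parameter is answered with a redirect back to the same URL plus data=true, so the client only
streams the file body once authentication has succeeded. The parameter names and the redirect
convention are assumptions in the spirit of the attached API doc:

{code:java}
import java.io.InputStream;
import java.net.URI;

import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/{path: .*}")
public class CreateResource {

  @PUT
  public Response create(@PathParam("path") String path,
                         @QueryParam("data") Boolean data,
                         InputStream body) throws Exception {
    if (data == null || !data) {
      // Handshake step: no payload expected yet. The authentication filter
      // has already run by the time we get here, so it is safe to tell the
      // client to resend the request with the actual file contents.
      URI redirect = new URI("/" + path + "?op=create&data=true");
      return Response.temporaryRedirect(redirect).build();
    }
    // Second step: the client resent the request with the file contents.
    // writeToHdfs(path, body);  // actual HDFS write elided in this sketch
    return Response.status(Response.Status.CREATED).build();
  }
}
{code}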

                
> Contributing Hoop to HDFS, replacement for HDFS proxy with read/write capabilities
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-2178
>                 URL: https://issues.apache.org/jira/browse/HDFS-2178
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>
>         Attachments: HDFSoverHTTP-API.html, HdfsHttpAPI.pdf
>
>
> We'd like to contribute Hoop to Hadoop HDFS as a replacement (an improvement) for HDFS
Proxy.
> Hoop provides access to all Hadoop Distributed File System (HDFS) operations (read and
write) over HTTP/S.
> The Hoop server component is a REST HTTP gateway to HDFS supporting all file system operations.
It can be accessed using standard HTTP tools (e.g. curl and wget), HTTP libraries from different
programming languages (e.g. Perl, JavaScript), as well as using the Hoop client. The Hoop server
component is a standard Java web application implemented using Jersey (JAX-RS).
> The Hoop client component is an implementation of the Hadoop FileSystem client, which allows
using the familiar Hadoop FileSystem API to access HDFS data through a Hoop server.
>   Repo: https://github.com/cloudera/hoop
>   Docs: http://cloudera.github.com/hoop
>   Blog: http://www.cloudera.com/blog/2011/07/hoop-hadoop-hdfs-over-http/
> Hoop is a Maven based project that depends on Hadoop HDFS and Alfredo (for Kerberos HTTP
SPNEGO authentication). 
> To make the integration easy, HDFS Mavenization (HDFS-2096) would have to be done first,
as well as the Alfredo contribution (HADOOP-7119).

