hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4575) An independent HTTPS proxy for HDFS
Date Mon, 03 Nov 2008 22:54:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644843#action_12644843 ]

Chris Douglas commented on HADOOP-4575:

On core changes:
* There is a race condition in UserGroupInformationManager::getUgiForUser; the critical section
protects the get and the put individually, but simultaneous requests for an absent ugi can
each create a duplicate entry, and whichever thread acquires the second lock last wins.
o.a.h.mapred.IndexCache uses a standard idiom that applies here. Alternatively, simply making
the method synchronized would perform adequately and avoid considerable complication,
particularly given the cleanup semantics for this cache.
* A cleanup thread is overkill. It should be sufficient to verify the timestamp of the item
out of the cache and discard it if it's too old (refreshing the ugi in the cache). If the
size of the cache is a concern, setting a threshold triggering cleanup on insertion is a sound
strategy. The ugi+timestamp is almost certainly less than 100 bytes of memory, and closer
to half that in most cases. If the limit is only checked on insertion and entries aren't pushed
out of the cache, there's no risk of the degenerate case where the cache is scanned on each
access. Further, only valid users are inserted, so the DoS threat is low.
* Since there are no references to UserGroupInformationManager in core, perhaps it can live
with the proxy in contrib (including the changes to hadoop-default.xml).
* I don't understand the changes to FileDataServlet. They introduce a race condition in the
jspHelper field initialization, which should be fixed. Is this working around a static
initialization order problem? If so, a proper pattern is preferred.
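One such pattern is the initialization-on-demand holder idiom. The sketch below is illustrative only, with JspHelper as a stand-in for the real class:

```java
// Hedged sketch of the initialization-on-demand holder idiom: a
// thread-safe lazy singleton with no locking and no null check.
// FileDataServletSketch and JspHelper are stand-in names.
class FileDataServletSketch {
    static class JspHelper {            // stand-in for the real helper
        final String id = "helper";
    }

    // The JVM guarantees a class's static initializer runs exactly once,
    // on first use, so Holder.INSTANCE is created lazily and safely.
    private static class Holder {
        static final JspHelper INSTANCE = new JspHelper();
    }

    static JspHelper getJspHelper() {
        return Holder.INSTANCE;
    }
}
```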
* UnixUserGroupInformation::getUgiForUser is a dangerous method:

  public static UnixUserGroupInformation getUgiForUser(String userName)
      throws IOException {
    String[] cmd = new String[] { "bash", "-c", "id -Gn " + userName };
    String[] groups = Shell.execCommand(cmd).split("\\s+");
    return new UnixUserGroupInformation(userName, groups);
  }
Though the proxy does a fair amount of sanity checking and input sanitizing, future callers
may not be so careful (e.g. userName = {{"foo ; rm -rf /"}}). Further, the login functionality
is more general and a public static method on UnixUGI will impede future design. As a separate
issue, replacing Shell::getGroupsCommand with "id -Gn" and the StringTokenizer with String::split
are sound improvements. For this issue, though I can see why this functionality is associated
with UnixUGI, looking up groups for arbitrary users belongs in the proxy code and is not required
in core. The method should also do some of its own sanity checks.
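As an illustration of both points (the method doing its own sanity checks, and keeping the user name out of a shell), the sketch below builds the command as separate argv elements; isValidUserName and groupsCommand are hypothetical names, not Hadoop's API:

```java
// Hedged sketch: validate the user name, then pass it as its own argv
// element instead of interpolating it into a "bash -c" string, so a
// value like "foo ; rm -rf /" is never parsed by a shell at all.
class SafeGroupLookup {
    // Reject anything that is not a plausible Unix user name.
    static boolean isValidUserName(String userName) {
        return userName != null && userName.matches("[A-Za-z_][A-Za-z0-9_-]*");
    }

    // Each argv element is passed to exec() verbatim; no shell is
    // involved, so no metacharacters are interpreted.
    static String[] groupsCommand(String userName) {
        if (!isValidUserName(userName)) {
            throw new IllegalArgumentException("invalid user name: " + userName);
        }
        return new String[] { "id", "-Gn", userName };
    }
}
```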
* HftpFileSystem.ran should be {{final}}

On contrib:
* Why does ProxyHttpServer not extend or own an HttpServer? There is considerable overlap
in their interfaces and implementations.
* I don't understand why the proxy would return a status of [402|http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.3]
when the client doesn't provide a cert. Is there a canonical response for this case?
* Given that authentication is bypassed for testing:
+    } else { // http request, set ugi for servlets, only for testing purposes
+      String ugi = rqst.getParameter("ugi");
+      rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi
+          .split(",")));
+    }
if a non-SSL listener is attached, this should at least log at WARN level on init.
* Calling System.exit from createHdfsProxy is unnecessarily forceful; returning null for success
and throwing on failure is clearer. Similarly, the switch stmt is less readable than humble
if stmts.
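A minimal sketch of that control flow, with illustrative class and method names rather than the patch's API:

```java
// Hedged sketch: plain if statements and an exception on bad arguments,
// instead of a switch plus System.exit. ProxyLauncher and parseConfArg
// are hypothetical names for illustration only.
class ProxyLauncher {
    static String usage() { return "usage: hdfsproxy [-conf <file>]"; }

    // Returns normally on success; throws on bad input so the caller
    // (or a test) decides what to do, rather than killing the JVM.
    static String parseConfArg(String[] args) {
        if (args.length == 0) {
            return null;                           // defaults are fine
        }
        if (args.length == 2 && "-conf".equals(args[0])) {
            return args[1];                        // explicit conf file
        }
        throw new IllegalArgumentException(usage());
    }
}
```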

> An independent HTTPS proxy for HDFS
> -----------------------------------
>                 Key: HADOOP-4575
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4575
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.20.0
>         Attachments: 4575-1-contrib.patch, 4575-1-core.patch, 4575-1.patch, lib.tgz
> Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e.,
> HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from
> the cluster and serving data to users who otherwise can't access the cluster directly. This
> proxy will authenticate its users via SSL certificates and implement the HSFTP interface
> for reading files.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
