hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2858) MRv2 WebApp Security
Date Tue, 11 Oct 2011 17:11:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125198#comment-13125198 ]

Robert Joseph Evans commented on MAPREDUCE-2858:

I am happy to help out however I can, so long as this design is what the community decides
it really wants.  I still have strong objections to using a proxy to a different web server.
I feel I have made my objections clear in previous comments and in comments on MAPREDUCE-2863,
and I will leave it at that.

I would, however, like some more detail about exactly how this proxy will behave, or perhaps
how it currently behaves, since Luke has indicated he has working patches that are making
their way through IBM legal.  We have a high-level concept, but the details are a bit
sketchy to me.

Is the proxy going to try to rewrite URLs so that they always pass through the proxy, or is
it simply going to rely on the application master to output only relative URLs?

How is the RM going to generate the AM URL from the URL that the AM returns?  i.e. What is
:am_uri in http://app-proxy1.cluster1.company.com:8181/yarn/:app_id/:am_uri?
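
To make the question concrete, here is a minimal sketch of one possible reading, assuming
(this is not confirmed anywhere in the thread) that :am_uri is simply the path and query
of the URL the AM reports, with the AM's own host and port dropped.  The class name, method
name, and the example AM host are hypothetical.

```java
import java.net.URI;

final class ProxyUrlSketch {
    /**
     * Builds proxyBase/appId/amPath[?amQuery].
     * Assumption: :am_uri is the raw path and query of the AM-reported URL.
     * URI.create throws IllegalArgumentException on a malformed AM URL.
     */
    static String toProxyUrl(String proxyBase, String appId, String amUrl) {
        URI am = URI.create(amUrl);
        String path = am.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // AM reported only host:port
        }
        StringBuilder sb = new StringBuilder(proxyBase);
        if (sb.charAt(sb.length() - 1) == '/') {
            sb.setLength(sb.length() - 1); // avoid a double slash after the base
        }
        sb.append('/').append(appId).append(path);
        if (am.getRawQuery() != null) {
            sb.append('?').append(am.getRawQuery());
        }
        return sb.toString();
    }
}
```

Under this reading the AM's host and port never appear in the URL the user sees, so the
proxy would need its own app_id-to-AM-address lookup.  Whether that is the intended design
is exactly what the question above asks.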

How is the proxy going to pass the user name to the Application Master?

Are there any plans for a VIP on the proxies for failover?  If the RM is inserting a proxy
based only on the config, what happens if that proxy goes down?  We probably want to use a
VIP in front of the proxies and have the App Master verify it using InetAddress.getAllByName.
If there is more than one proxy in the config, is the RM going to ping the proxies on an ongoing
basis to be able to return a URL that is valid?
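
The InetAddress.getAllByName check mentioned above could look roughly like the following
sketch.  It assumes the AM knows the proxy (or VIP) host name from configuration and can
obtain the caller's address as a string (for example from ServletRequest.getRemoteAddr());
the class and method names are hypothetical, and failing closed on a lookup error is a
choice made here, not something the thread specifies.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ProxyAddressCheck {
    /**
     * Returns true if remoteAddr is one of the addresses that proxyHost
     * resolves to. A name behind a VIP or round-robin DNS may resolve to
     * several addresses, which is why getAllByName is used.
     */
    static boolean isFromProxy(String proxyHost, String remoteAddr) {
        try {
            InetAddress remote = InetAddress.getByName(remoteAddr);
            for (InetAddress candidate : InetAddress.getAllByName(proxyHost)) {
                if (candidate.equals(remote)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false; // assumption: reject the request if resolution fails
        }
    }
}
```

Note that this only checks the source address of the connection; it says nothing about
which user the proxy is acting for, which is the separate question raised earlier.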

The whitelisting based on crypto signatures seems very confusing to me, possibly slow and
memory intensive, and very much not user friendly.

 * Is the proxy going to download the entire contents of a URL to compute the checksum
of the JavaScript inside it before passing it on to the user? A malicious app master could
crash the proxy by sending it huge amounts of data, unless we can spill to disk at some
point or set a maximum size limit on the amount of data that we cache.
 * Is all this processing just so that the proxy can pop up a warning message saying this
page looks a bit odd?  I thought the point of having a user-changeable API was so that the
user could modify it to fit their needs better.  Now, if they change it in any significant
way that involves JavaScript, the user will have to click through a warning message on every
page they view on this new application master, or the proxy is going to have to store a cookie
or something saying this user has accepted the risks for this page/this app master (I really
don't know how the proxy can definitively say what the user has opted into).
 * Are we also going to download the complete contents of all of the JS files that the HTML
points to?  We would have to if we really wanted the signature to be accurate, or else they
could hide something inside a JS file.
 * What about JSON/XML data or other static files?  Are we going to do anything with them?
 * How are we going to generate these signatures at compile time?  All of the HTML pages are
dynamically generated.  Are we going to run a unit-test-like script and generate the signatures?
What about other non-MRv2 projects that might not want to use Hamlet?  Are we going to
bring up a web server and scrape the pages?
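
As an illustration of the maximum-size-limit option raised in the first bullet, here is a
minimal sketch that computes a digest over a response stream while refusing to read past a
cap.  SHA-256 is an assumption (the thread only says "checksum" and "crypto signatures"),
and the class name, method name, and cap are hypothetical.  The sketch streams the data
through the digest rather than caching it, so it bounds work but does not by itself solve
the problem of holding the page until the check has finished.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

final class BoundedDigest {
    /**
     * Returns the lowercase hex SHA-256 of the stream, or Optional.empty()
     * if the stream holds more than maxBytes bytes. Reading stops as soon
     * as the cap is exceeded, so an oversized response is never fully read.
     */
    static Optional<String> sha256Hex(InputStream in, long maxBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                if (total > maxBytes) {
                    return Optional.empty(); // over the cap: give up early
                }
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return Optional.of(hex.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

A caller would treat an empty result as "too large to verify" and decide separately whether
to block the page or pass it through with a warning.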

> MRv2 WebApp Security
> --------------------
>                 Key: MAPREDUCE-2858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2858
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2, security
>    Affects Versions: 0.23.0
>            Reporter: Luke Lu
>            Assignee: Luke Lu
>            Priority: Blocker
>             Fix For: 0.23.0
> In MRv2, while the system servers (ResourceManager (RM), NodeManager (NM) and NameNode (NN)) run as "trusted"
> system users, the application masters (AM) run as users who submit the application. While this offers great flexibility
> to run multiple versions of mapreduce frameworks (including their UI) on the same Hadoop cluster, it has significant
> implications for the security of webapps (Please do not discuss company specific vulnerabilities
> Requirements:
> # Secure authentication for AM (for app/job level ACLs).
> # Webapp security should be optional via site configuration.
> # Support existing pluggable single sign on mechanisms.
> # Should not require per app/user configuration for deployment.
> # Should not require special site-wide DNS configuration for deployment.
> This is the top jira for webapp security. A design doc/notes of threat-modeling and
> countermeasures will be posted on the wiki.

