Date: Wed, 28 Jan 2015 05:26:34 +0000 (UTC)
From: "Christopher Tubbs (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled

[ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294730#comment-14294730 ]

Christopher Tubbs commented on ACCUMULO-3513:
---------------------------------------------

bq. I'm not sure how we can make any reliable security model if we operate under the assumption that YARN is insecure. We have to trust that the YARN task was correctly authenticated.

Right, we have to authenticate both YARN *and* the end user.
Even if YARN doesn't work this way, and it uses some delegation token instead of any identifying information about itself, Accumulo's implementation requires a Kerberos token at the transport layer. You can't just omit a Kerberos token and replace it with a delegation token in Accumulo's implementation (nor do I think it'd be a good idea to try, because I do think we need to authenticate the middle-man, in this case YARN).

bq. Again. We have to assume YARN is doing the right thing.

No, we absolutely do not have to make any such assumption. We can validate that by only whitelisting approved, trusted intermediaries. This is no different than X.509 extensions that designate permitted uses on certificates. The fact that a certificate was signed by the same CA does not automatically make it appropriate to use to sign executable code, or to encrypt email. The only thing is, Kerberos does not have any such mechanism built in, like X.509 certificate extensions, so a whitelist is the only option.

bq. The code running inside a YARN task is untrusted (unless you restrict job submission and vet the users externally – hit the users with a stick and tell them to behave). We should not be trusting this code to act as the user that it should.

That's just my point... you don't know what is going on inside the YARN system. For all you know, there is a job accessing the local disk or system memory, searching for other clients' credentials, and using them to connect to Accumulo. Just because YARN tries to connect using some client's credentials, it doesn't mean it's a valid use (granted, that takes effort). You've got to actually lock down your YARN instance and vet the infrastructure and the code it runs before you can be sure that the credentials a job in YARN uses to try to connect to Accumulo with are for a legitimate purpose.
But, once this is done, the precise degree of additional security offered by the delegation token (due to expirable attributes, for instance) is debatable... but I concede that it is at least marginally better than without, so we can move past that point if you like. If it has the ability to expire, I'm in favor.

bq. The shared secret is acting in place of the kerberos credentials because there is no credentials available for use. ...

I'm not so sure that's true. There are no credentials that represent the end user which are available to use, but the YARN process itself should have some Kerberos identity, shouldn't it? I've read that paper, and the quoted portion, but I had assumed (perhaps incorrectly) that the YARN process would use its own Kerberos credentials to set up the transport layer, over which it sends the delegation token for additional validation and authorization. I assumed the wording about it using a delegation token in place of a Kerberos token was just shorthand for something a bit more complicated. Otherwise, what network protocol is it using that supports both Kerberos and a delegation token? Even if HDFS/YARN is using some custom protocol which supports both (or two RPC endpoints), Accumulo's SASL implementation certainly is not... it needs *some* Kerberos credentials to set up the transport layer, before we can send any delegation token or whatever across.

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
>                 Key: ACCUMULO-3513
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to submit a job, then the notion of delegation tokens is used for further communication since the servers do not have access to the client's sensitive information. A centralized service manages creation of a delegation token, which is a record containing certain information (such as the submitting user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to manage delegation tokens for node managers to acquire and use to run jobs. Hadoop and HBase both contain code which implements this general idea, but we will need to apply it to Accumulo and verify that M/R jobs still work in a kerberized environment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
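
The two ideas discussed in this thread, a delegation token that records the submitting user and can expire, and a whitelist of trusted intermediaries that must authenticate at the transport layer before presenting such a token, can be illustrated with a minimal sketch. This is not Accumulo's, Hadoop's, or HBase's actual implementation. The class `TokenService`, the set `TRUSTED_INTERMEDIARIES`, the example principal names, and the HMAC-based signing scheme are all assumptions made for illustration, and the `transport_principal` argument stands in for an identity that a real server would obtain from a Kerberos-authenticated SASL transport.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical whitelist of Kerberos principals permitted to present
# delegation tokens on behalf of end users (names are made up).
TRUSTED_INTERMEDIARIES = {"yarn/nodemanager.example.com@EXAMPLE.COM"}


class TokenService:
    """Stand-in for the centralized service that creates delegation tokens."""

    def __init__(self, secret: bytes, lifetime_seconds: int = 3600):
        self._secret = secret
        self._lifetime = lifetime_seconds

    def _sign(self, payload: bytes) -> str:
        # HMAC over the serialized record using the service's secret key.
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()

    def issue(self, user: str, now: Optional[float] = None) -> dict:
        """Issue a token for a user who has already authenticated."""
        issued = time.time() if now is None else now
        record = {"user": user, "expires": issued + self._lifetime}
        payload = json.dumps(record, sort_keys=True).encode()
        return {"record": record, "signature": self._sign(payload)}

    def verify(self, transport_principal: str, token: dict,
               now: Optional[float] = None) -> str:
        """Return the end user's name if the token is acceptable."""
        current = time.time() if now is None else now
        # 1. Authenticate the middle-man: only whitelisted principals
        #    may act on behalf of an end user.
        if transport_principal not in TRUSTED_INTERMEDIARIES:
            raise PermissionError("intermediary is not whitelisted")
        # 2. Check that the token was issued by this service and not altered.
        payload = json.dumps(token["record"], sort_keys=True).encode()
        if not hmac.compare_digest(self._sign(payload), token["signature"]):
            raise PermissionError("token signature is invalid")
        # 3. Enforce the expiry attribute.
        if current >= token["record"]["expires"]:
            raise PermissionError("token has expired")
        return token["record"]["user"]
```

In this sketch a token issued with `service.issue("alice")` is accepted by `service.verify(...)` only when the presenting principal is in the whitelist, the signature matches, and the expiry time has not passed. Any other combination raises `PermissionError`, which reflects the position in the comment that both the intermediary and the end user need to be validated.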