Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B7210200B77 for ; Sat, 20 Aug 2016 00:33:37 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id B5825160AAC; Fri, 19 Aug 2016 22:33:37 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D35CF160AAB for ; Sat, 20 Aug 2016 00:33:36 +0200 (CEST) Received: (qmail 38614 invoked by uid 500); 19 Aug 2016 22:33:36 -0000 Mailing-List: contact dev-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@accumulo.apache.org Delivered-To: mailing list dev@accumulo.apache.org Received: (qmail 38603 invoked by uid 99); 19 Aug 2016 22:33:35 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 19 Aug 2016 22:33:35 +0000 Received: from mail-qk0-f179.google.com (mail-qk0-f179.google.com [209.85.220.179]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 99C441A0044 for ; Fri, 19 Aug 2016 22:33:35 +0000 (UTC) Received: by mail-qk0-f179.google.com with SMTP id l2so53431793qkf.3 for ; Fri, 19 Aug 2016 15:33:35 -0700 (PDT) X-Gm-Message-State: AEkoousl9WF0ZGmB32xfLBd0sG59jOVH6qherQSwwnDtesAz0KXJBSMhCfLk1gnft0ozilJVlYOsUGCAIRFg3Q== X-Received: by 10.55.72.2 with SMTP id v2mr10816679qka.68.1471646013475; Fri, 19 Aug 2016 15:33:33 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Christopher Date: Fri, 19 Aug 2016 22:33:22 +0000 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: BatchWriter Auth/Audit Question To: Accumulo Dev List Content-Type: multipart/alternative; boundary=001a114a86eede54be053a7449b8 archived-at: Fri, 19 Aug 2016 22:33:37 -0000 --001a114a86eede54be053a7449b8 Content-Type: text/plain; charset=UTF-8 It may be the case that removing authenticateUser call is fine from a correctness perspective, but it's relying on a very tenuous fact that that the UpdateSession is initialized with a null KeyExtent for the currentTablet. The authentication only happens in the applyUpdate method is called with a KeyExtent for a new table (different from the last update). That reliance on an initial null value, and serial updates to "current table" field, in order to detect the need to authenticate seems particularly sensitive to implementation changes on seemingly unrelated code, as well as future concurrency changes. And, that's in addition to the fact that an unauthenticated user is able to create numerous sessions. If we are willing to tolerate the risk of the bogus sessions issue (probably still needs to be debated), I'd like to see some more explicit changes to the way we verify that authentication is required, than the reliance on the null, etc., so we can have some future-proofing in here. For example, an explicit "sessionHasBeenAuthenticated" flag, rather than a passive detection of a non-null value having been changed, and maybe a comment to instruct developers to avoid concurrent execution of the code which does that check without additional verification of correctness. On Fri, Aug 19, 2016 at 6:10 PM Mike Drob wrote: > I got a bit confused here, so I hopped on IRC and tried to reason this out > with Christopher. > > The highlights are thus: > * The fail fast relies on canPerformSystemActions->authenticate->throw > Exception when it is not a valid user. > * If we want to switch the order attributes are checked in the > canAskAboutUser method we can probably do !(( selfusercheck && > authenticate) || canPerformSystemActions), but I don't expect huge > performance gains on this. > ** I just tried this without the new authenticate call and this gained me > somewhere around 3-5%. Probably not worth it since there are cases where > the authentication might be important. > * Removing the authenticateUser call is fine from a correctness perspective > because the user is still authenticated and checked via canWrite in either > update() or applyUpdates(). However, this might open us up to a malicious > user sending carefully crafted thrift RPCs to create a lot of bogus > sessions. > > I suspect that the performance gains in my case were based on skipping an > extra ZK round trip, but don't have any evidence for it. This was an > interesting diversion, but I don't think that there is much to follow up > on. > > Specifically, I haven't tried any of this with Kerberos, which I think will > be a much more relevant case. > > Mike > > On Fri, Aug 19, 2016 at 3:44 PM, Christopher wrote: > > > Sorry, I meant, that it wouldn't be executed in the write path if you > > switch the order. The two credentials should always be the same in that > > case. > > > > On Fri, Aug 19, 2016 at 4:37 PM Christopher wrote: > > > > > Correct. That code is not executed in the write path. It should only be > > > executed for the APIs where a user (typically an admin) is explicitly > > > checking another user's permissions/authorizations/etc. > > > > > > On Fri, Aug 19, 2016 at 3:59 PM Mike Drob wrote: > > > > > >> I tried out what Marc suggested, and after I removed the > > authenticateUser > > >> line from my tservers and my write throughput DOUBLED. This does not > > seem > > >> like a minor detail here. > > >> > > >> In the tserver logs, I still see a bunch of "2016-08-19 12:28:11,621 > > >> [Audit ] INFO : operation: permitted; user: mdrob; client: > > [host:port]; > > >> action: authenticate;" which I think corresponds to the authenticate > > >> checks > > >> from canWrite, so it's not like any user is allowed to do just write > > >> random > > >> data. > > >> A well-behaved client would fail while setting up the Connector > object. > > A > > >> malicious client that's trying to go directly via RPC would fail at > the > > >> canWrite check. > > >> > > >> I missed the part about canPerformSystemActions throwing an exception > > via > > >> authenticate when it fails, so we'd never get to the self-equality > > check. > > >> That's a good point, but I don't think that code is ever executed > during > > >> the write path. > > >> > > >> > > >> Mike > > >> > > >> > > >> > > >> On Fri, Aug 19, 2016 at 2:42 PM, Christopher > > wrote: > > >> > > >> > If you note the comment in the parent class, the implementation of > > >> > canAskAboutUser is relying on the authentication being done in the > > call > > >> to > > >> > canPerformSystemActions. If you reverse the order here, you lose > that > > >> > authentication check, and the action will be allowed simply if the > > user > > >> is > > >> > equal to itself. That's no good. We need an authentication check, > and > > we > > >> > can't rely on the client code to check it for us. > > >> > > > >> > We could reverse the order, but with an explicit authentication > check > > >> when > > >> > the user is asking about itself, but then there's likely going to > be a > > >> > duplicate authentication check if that fails, because > > >> > canPerformSystemActions will also need to do an authentication check > > >> > (because it can also be called directly). > > >> > > > >> > We can probably refactor this a bit, so we have an unauthenticated > > >> > "canPerformSystemActions" check which is shared by the authenticated > > >> > version and the reverse-ordered canAskAboutUser implementation, but > we > > >> need > > >> > to be careful about reversing the order here and forgetting to do an > > >> > authentication check, or unnecessarily doubling-up on the > > authentication > > >> > check (though, maybe the double-check isn't that bad in the less > > common > > >> > case... depends on impl performance). > > >> > > > >> > On Fri, Aug 19, 2016 at 2:33 PM Marc P. > > wrote: > > >> > > > >> > > Sorry....by out of band, It could be done once per writer or even > in > > >> the > > >> > > connector once...but I think we're in agreement Drob, Mike. > > >> > > > > >> > > On Fri, Aug 19, 2016 at 2:30 PM, Marc P. > > >> wrote: > > >> > > > > >> > > > Funny, exactly the same thing I mentioned to Josh last night. > Are > > >> you > > >> > > > watching us? > > >> > > > > > >> > > > In something I"m doing now I'm making the thrift calls and > > modeled > > >> it > > >> > > > after what the client code does; however, > > >> > > > once I removed the authenticateUser my throughput increased by > 25 > > >> per > > >> > > > cent. In my trace table it was by and large the greatest audited > > >> call. > > >> > > > > > >> > > > I think we could change how that works and potentially keep that > > >> out of > > >> > > > band. The fail fast is great if the user doesn't have > permissions > > to > > >> > that > > >> > > > table and may save the server some work...but as it is now, I'm > > >> always > > >> > > > authenticated and I'm causing more work for the server. Perhaps > we > > >> let > > >> > > the > > >> > > > natural exception that's thrown stop the client ( if it's being > a > > >> good > > >> > > > client ). I think the 25 per cent improvement was well worth the > > >> > > > alternative risk. > > >> > > > > > >> > > > On Fri, Aug 19, 2016 at 2:16 PM, Mike Drob > > >> wrote: > > >> > > > > > >> > > >> Devs, > > >> > > >> > > >> > > >> I was recently running a 1.7.2 cluster under a heavy write > > workload > > >> > and > > >> > > >> noticed a _lot_ of audit messages showing up in the logs. After > > >> some > > >> > > >> digging, I found that one of the causes is the following call > > >> chain: > > >> > > >> > > >> > > >> TabletServerBatchWriter::sendMutationsToTabletServer > > >> > > >> calls TabletServer.ThriftClientHandler::startUpdate > > >> > > >> calls SecurityOperation::authenticateUser <-- always return > true > > >> > > >> calls SecurityOperation::canAskAboutUser <-- always return true > > >> > > >> calls AuditedSecurityOperation::canPerformSystemActions <-- > > fails; > > >> > logs > > >> > > >> "action: performSystemAction" > > >> > > >> calls AuditedSecurityAction::authenticate <-- succeeds; logs > > >> "action: > > >> > > >> authenticate;" > > >> > > >> > > >> > > >> There are two separate but related problems here, I think, > that I > > >> want > > >> > > to > > >> > > >> confirm. > > >> > > >> > > >> > > >> When startUpdate calls security.authenticateUser(credentials, > > >> > > >> credentials); > > >> > > >> there is the comment that this is done to "// Make sure user is > > >> real" > > >> > > >> presumably to fail fast. This makes sense, but does it actually > > >> work? > > >> > In > > >> > > >> authenticateUser and canAskAboutUser we end up comparing the > two > > >> > > >> credentials to each other and this will always succeed because > > >> they're > > >> > > the > > >> > > >> same object. I expect the write fails later in the TabletServer > > >> code, > > >> > > >> possibly in the UpdateSession, but I haven't been able to > > untangle > > >> > this > > >> > > >> part yet. > > >> > > >> > > >> > > >> Second, I think it makes sense to switch the order of the OR in > > >> > > >> 'canAskAboutUser.' If this is in the write path, then it seems > > like > > >> > the > > >> > > >> most common case would be users asking about themselves, so we > > >> should > > >> > > >> check > > >> > > >> that before checking if a user is an admin. This also has the > > >> > advantage > > >> > > of > > >> > > >> removing two uninformative audit log lines. We won't have an > > audit > > >> > > message > > >> > > >> that somebody wrote data, but we don't have that currently > > anyway. > > >> > > >> > > >> > > >> I did some visual spot checking and this looks like an issue in > > >> master > > >> > > >> also, but I haven't run that code to find out. > > >> > > >> > > >> > > >> Thoughts? > > >> > > >> > > >> > > >> Mike > > >> > > >> > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > > --001a114a86eede54be053a7449b8--