hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gary Helmling (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2742) Provide strong authentication with a secure RPC engine
Date Fri, 18 Nov 2011 01:16:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152558#comment-13152558

Gary Helmling commented on HBASE-2742:

Thanks for the review!  Please add additional comments to review board.  In a patch this big,
it's much easier for me to keep track of comments and replies there.
Looking at AuthenticationTokenSecretManager, it becomes leader upon entering run().
Periodically it calls rollCurrentKey() which checks whether it is still leader. I think rollCurrentKey()
should return a boolean value indicating leadership status so that the while loop can re-obtain
leadership if leadership is lost.

Leader election for AuthenticationTokenSecretManager is equivalent to active master management.
 In run(), all instances block on zkLeader.waitToBecomeLeader().  Only the leader returns.
 The leader remains active until stopped or the zk session is lost (which should cause an
abort), so I don't see where we would need to recheck in this case?

In my opinion, failure to remove expired key or to add new key should be treated with more
addKeyToZK() and updateKeyInZK() throws RuntimeException, alleviating this concern a little.
But zkWatcher.removeKeyFromZK() swallows KeeperException, making things complicated. We may
need to call ZKUtil.deleteNode() directly.

Agree, removeKeyFromZK() should be aborting on KeeperException.  That was missing in other
places as well.  Cleaning up to call watch.abort() on KeeperException, as well as replacing
the RuntimeExceptions with watcher.abort().

I am very hesitant, though, to call ZKUtil.deleteNode() in the case of IOExceptions (which
only occur from the Writable serialization).  In the case of say an upgrade with changed serialization
format or a znode that somehow got corrupted data, this seems appropriate.  But I'm more worried
about what would occur in the case of a massive misconfiguration, where {{zookeeper.znode.tokenauth.parent}}
is pointing to the wrong path, and what the consequences might be from deleting every znode
it can't deserialize.  I'd rather abort the server in this case and let the end user deal
with it.

Some ZKWatcher methods shouldn't even have been throwing IOExceptions.  Those have been fixed.
 The remaining cases (nodeDataChanged(), refreshNodes(), addKeyToZK(), updatedKeyInZK()) have
been changed to abort on IOException since it would likely mean a programming error.

In src/main/resources/hbase-default.xml, I think we should use dot instead of dash for parameter
key names:
  hbase.auth.key.update-interval should be hbase.auth.key.update.interval
  hbase.auth.token.max-lifetime should be hbase.auth.token.max.lifetime

These keys were named to match the keys used for HDFS delegation tokens:

But, since we are consistent about not using '-' in HBase config I will replace.

In ZKLeaderManager.stepDownAsLeader():

ZKUtil.deleteNodeFailSilent(watcher, leaderZNode);

I think we should call ZKUtil.deleteNode() instead and handle KeeperException gracefully because
waitToBecomeLeader() would only delete zk node which has the other candidate's Id.

I don't really see what ZKUtil.deleteNode() would improve in this context?  If we instead
got the KeeperException.NoNodeException that it exposes, what would we want to do differently?
 Since the point of stepDownAsLeader() is to step down, my opinion would be to ignore the
NoNodeException if the znode has somehow been removed.  The logic in waitToBecomeLeader()
is completely separate and clones the logic in ActiveMasterManager.blockUntilBecomingActiveMaster().

In HBaseServer.closeCurrentConnection(), the following is added:

c = null;

Since c is the attachment to SelectionKey, should we call key.attach(null) ?

Good point.  The {{c = null}} assignment is useless.  Changing to {{key.attach(null)}}.

HBaseRPC.call() is marked deprecated. Probably we should mention SecureRpcEngine so that developers
know what to look for.

Added @depecated tag pointing to RpcEngine.call().
> Provide strong authentication with a secure RPC engine
> ------------------------------------------------------
>                 Key: HBASE-2742
>                 URL: https://issues.apache.org/jira/browse/HBASE-2742
>             Project: HBase
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Gary Helmling
>            Priority: Critical
>             Fix For: 0.92.0
> The HBase RPC code (org.apache.hadoop.hbase.ipc.*) was originally forked off of Hadoop
RPC classes, with some performance tweaks added.  Those optimizations have come at a cost
in keeping up with Hadoop RPC changes however, both bug fixes and improvements/new features.
> In particular, this impacts how we implement security features in HBase (see HBASE-1697
and HBASE-2016).  The secure Hadoop implementation (HADOOP-4487) relies heavily on RPC changes
to support client authentication via kerberos and securing and mutual authentication of client/server
connections via SASL.  Making use of the built-in Hadoop RPC classes will gain us these pieces
for free in a secure HBase.
> So, I'm proposing that we drop the HBase forked version of RPC and convert to direct
use of Hadoop RPC, while working to contribute important fixes back upstream to Hadoop core.
 Based on a review of the HBase RPC changes, the key divergences seem to be:
> HBaseClient:
>  - added use of TCP keepalive (HBASE-1754)
>  - made connection retries and sleep configurable (HBASE-1815)
>  - prevent NPE if socket == null due to creation failure (HBASE-2443)
> HBaseRPC:
>  - mapping of method names <-> codes (removed in HBASE-2219)
> HBaseServer:
>  - use of TCP keep alives (HBASE-1754)
>  - OOME in server does not trigger abort (HBASE-1198)
> HbaseObjectWritable:
>  - allows List<> serialization
>  - includes it's own class <-> code mapping (HBASE-328)
> Proposed process is:
> 1. open issues with patches on Hadoop core for important fixes/adjustments from HBase
RPC (HBASE-1198, HBASE-1815, HBASE-1754, HBASE-2443, plus a pluggable ObjectWritable implementation
in RPC.Invocation to allow use of HbaseObjectWritable).
> 2. ship a Hadoop version with RPC patches applied -- ideally we should avoid another
copy-n-paste code fork, subject to ability to isolate changes from impacting Hadoop internal
RPC wire formats
> 3. if all Hadoop core patches are applied we can drop back to a plain vanilla Hadoop
> I realize there are many different opinions on how to proceed with HBase RPC, so I'm
hoping this issue will kick off a discussion on what the best approach might be.  My own motivation
is maximizing re-use of the authentication and connection security work that's already gone
into Hadoop core.  I'll put together a set of patches around #1 and #2, but obviously we need
some consensus around this to move forward.  If I'm missing other differences between HBase
and Hadoop RPC, please list as well.  Discuss!

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message