hadoop-common-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16119) KMS on Hadoop RPC Engine
Date Wed, 06 Mar 2019 15:14:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785731#comment-16785731 ]

Wei-Chiu Chuang commented on HADOOP-16119:

Hi [~hexiaoqiao], I really appreciate your insights!

Regarding delegation tokens – delegation tokens are stored in ZooKeeper, and since HADOOP-14445
they are shared among KMS instances.

Key store consistency – I am not sure how others use KMS. But within CDH, we have a plugin
that directs the requests to a backend server "Cloudera KeyTrustee Server". Essentially, KMS
serves as a proxy for the backend. Therefore consistency is guaranteed.

Cloudera KeyTrustee Server is currently a proprietary component. But it sounds like Cloudera
will eventually become "100% open source", so that's an option for you. I think your proposal
makes sense. I am just not sure how much work it will require. At Cloudera there is a team
dedicated to Cloudera KeyTrustee Server development, so I imagine it's a non-trivial amount
of work.

IMHO, I am looking forward to a good persistent+consistent key store too, if we can come
up with a good design. In fact, I am concerned about CKTS performance under the said load.


[~anu] [~xyao] does the Sentry KMS provide a persistent+consistent key store by any chance?


Adding or removing a KMS instance requires a client-side change, that is correct. Currently that
requires a cluster-wide rolling restart. I imagine we could use NameNode's FsServerDefaults
to update that dynamically.
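
To make the client-side dependency concrete, here is a minimal sketch. It assumes the client is configured with a provider URI of the form kms://https@host1;host2:9600/kms; the host names and the helper kms_endpoints are made up for illustration and are not part of Hadoop. Every KMS instance appears in that one string, so changing the set of instances means changing what each client reads.

```python
def kms_endpoints(provider_uri):
    """Expand a multi-host KMS provider URI into one endpoint URL per host.

    Hypothetical helper for illustration only; not the Hadoop client's parser.
    """
    prefix = "kms://"
    if not provider_uri.startswith(prefix):
        raise ValueError("not a kms:// provider URI: " + provider_uri)
    rest = provider_uri[len(prefix):]
    # "https@host1;host2:9600/kms" -> transport scheme, then authority and path
    scheme, _, authority_and_path = rest.partition("@")
    authority, _, path = authority_and_path.partition("/")
    # All hosts share the single trailing port
    hosts, _, port = authority.rpartition(":")
    return [f"{scheme}://{host}:{port}/{path}" for host in hosts.split(";")]


if __name__ == "__main__":
    # Example hosts are assumptions, not real servers
    for url in kms_endpoints("kms://https@kms01.example.com;kms02.example.com:9600/kms"):
        print(url)
```

If the NameNode handed out this string instead, the parsing would stay the same and only the source of the string would change.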


I am not clear about the HA argument. In the current design, a KMS connection is not "sticky",
meaning that regardless of the KMS status, _each KMS request_ has an equal probability
of attempting to reach a dead KMS. Is that what you meant?
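
To illustrate what I mean by "sticky", here is a toy model. It is not the actual KMS client code, and the function and instance names are hypothetical. It assumes the client rotates its starting instance on every request and fails over to the next instance when one is dead, then compares that with a client that remembers the last instance that answered.

```python
def count_failed_attempts(providers, alive, num_requests, sticky):
    """Count connection attempts that land on a dead KMS instance.

    Toy model under stated assumptions; names are hypothetical.
    """
    failed = 0
    start = 0
    for _ in range(num_requests):
        idx = start
        # Try instances in order, starting at `start`, until a live one answers
        for _ in range(len(providers)):
            if alive[providers[idx % len(providers)]]:
                break
            failed += 1
            idx += 1
        else:
            raise RuntimeError("no live KMS instance")
        # Sticky: keep using the instance that just answered.
        # Non-sticky: rotate the starting point regardless of the outcome.
        start = idx if sticky else start + 1
    return failed


if __name__ == "__main__":
    providers = ["kms1", "kms2", "kms3"]
    alive = {"kms1": False, "kms2": True, "kms3": True}
    print("non-sticky:", count_failed_attempts(providers, alive, 9, sticky=False))
    print("sticky:    ", count_failed_attempts(providers, alive, 9, sticky=True))
```

In this model the non-sticky client keeps paying one failed attempt every time the rotation comes back to the dead instance, while the sticky client pays at most once.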

> KMS on Hadoop RPC Engine
> ------------------------
>                 Key: HADOOP-16119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16119
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: Design doc_ KMS v2.pdf
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately, for each NameNode namespace operation you also need
> to access KMS.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limits.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}
