Date: Mon, 14 Oct 2013 03:29:44 +0000 (UTC)
From: "Jerry Chen (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

    [ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793897#comment-13793897 ]

Jerry Chen commented on HIVE-5207:
----------------------------------

Hi Larry, thanks for pointing out the docs. Yes, we will add more javadocs and documentation as our next step.

{quote}1. TwoTieredKey - exactly the purpose, how it's used what the tiers are, etc{quote}
TwoTieredKey is used for the case where the table key is stored in the Hive metastore.
The table key is encrypted with the master key, which is provided externally. In this case, the user maintains and manages only the master key externally, rather than managing all the table keys externally. This is useful when no full-fledged key management system is available.

{quote}2. External KeyManagement integration - where and what is the expected contract for this integration{quote}
To integrate with an external key management system, we use the KeyProvider interface from HADOOP-9331. An implementation of the KeyProvider interface for a specific key management system can be configured as the KeyProvider used to retrieve keys.

{quote}3. A specific usecase description for exporting keys into an external keystore and who has the authority to initiate the export and where the password comes from{quote}
Exporting the internal keys is done from the Hive command line. Because the internal table keys are encrypted with the master key, the master key must be available in the environment, which is controlled by the user, when the export is performed. If the master key is not available, the encrypted table keys cannot be decrypted and therefore cannot be exported. The KeyProvider implementation that retrieves the master key can apply its own authentication and authorization to decide whether the current user has access to a specific key.

{quote}4. An explanation as to why we should ever store the key with the data which seems like a bad idea. I understand that it is encrypted with the master secret - which takes me to the next question. {quote}
Strictly speaking, the key is not stored with the data. The table key is stored in the Hive metastore. I see your point on this question. As mentioned, this is useful for use cases where no full-fledged, ready-to-use key management system is available. We provide several alternatives for managing keys.
When creating an encrypted table, the user can specify whether the key is managed externally or internally. For externally managed keys, only the key name (alias) is stored in the Hive metastore, and the key is retrieved through the KeyProvider set in the configuration.

{quote}5. Where is the master secret established and stored and how is it protected{quote}
Currently, we assume that the user manages the master key. For example, for simple use cases, the user can store the master key in a Java KeyStore that is protected by a password and kept in a folder that is readable only by specific users or groups. The user can also store the master key in another key management system, since the master key is retrieved through KeyProvider.

Really appreciate your time reviewing this.

Thanks

> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HIVE-5207.patch, HIVE-5207.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is common practice to store the data encrypted in the file system. Enabling Hive to store and query encrypted data is crucial for Hive data analysis in the enterprise.
>
> When creating a table, the user can specify whether the table is encrypted by setting a property in TBLPROPERTIES. Once an encrypted table is created, queries on it are transparent as long as the corresponding key management facilities are set up in the environment where the query runs. We can use the Hadoop crypto support provided by HADOOP-9331 for the underlying data encryption and decryption.
>
> As to key management, we would support several common key management use cases.
> First, the table key (data key) can be stored in the Hive metastore, associated with the table in its properties. The table key can be explicitly specified or auto-generated, and it will be encrypted with a master key. In some cases the data being processed is generated by other applications, so we need to support externally managed or imported table keys. Also, the data generated by Hive may be consumed by other applications in the system, so we need a tool or command for exporting the table key to a Java keystore for external use.
>
> To handle versions of Hadoop that do not have crypto support, we can avoid compilation problems by segregating crypto API usage into separate files (shims) that are included only if a flag is defined on the Ant command line (something like -Dcrypto=true).
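
As a rough illustration of the two-tier scheme discussed in the reply above (a table key encrypted under an externally held master key, so that only the encrypted form needs to be stored), here is a minimal Java sketch. It uses the JDK's standard AESWrap cipher. The class and method names are hypothetical and are not taken from the HIVE-5207 patch, and the patch may well wrap keys differently.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical illustration only; not the TwoTieredKey class from the patch. */
public class TwoTierKeySketch {

    /** Generates a random 128-bit AES key (used here for both tiers). */
    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Wraps (encrypts) the table key under the master key; the result is what would be persisted. */
    static byte[] wrap(SecretKey masterKey, SecretKey tableKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return cipher.wrap(tableKey);
    }

    /** Recovers the table key; throws if the supplied master key is not the one used to wrap. */
    static SecretKey unwrap(SecretKey masterKey, byte[] wrappedTableKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) cipher.unwrap(wrappedTableKey, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey masterKey = newAesKey();   // assumed to come from an external provider
        SecretKey tableKey = newAesKey();    // per-table data key
        byte[] stored = wrap(masterKey, tableKey);
        SecretKey recovered = unwrap(masterKey, stored);
        // Report whether the round trip returned the original table key.
        System.out.println(Arrays.equals(tableKey.getEncoded(), recovered.getEncoded()));
    }
}
```

The point of the sketch is the export scenario from answer 3: without the master key, `unwrap` fails, so the stored table key cannot be recovered or exported.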