Date: Mon, 17 Nov 2014 22:39:34 +0000 (UTC)
From: "Brock Noland (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive


    [ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215325#comment-14215325 ]

Brock Noland commented on HIVE-8065:
------------------------------------

FYI that I committed the initial work.

> Support HDFS encryption functionality on Hive
> ---------------------------------------------
>
>                 Key: HIVE-8065
>                 URL: https://issues.apache.org/jira/browse/HIVE-8065
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>
> The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used.
> HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. For security compliance reasons, HDFS does not allow files to be moved or renamed between encryption zones. Renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones.
> See HDFS-6134 for more details about HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the Hive query, which is later moved to the table directory location.
> If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, which will make Hive unusable.
> To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the renaming process will be successful.
> Also, for statements that move files between encryption zones (e.g. LOAD DATA), a copy may be executed instead of a rename. This will cause overhead when copying large data files, but it won't break the encryption on Hive.
> Another security issue to consider is join selects. If Hive joins different tables with different encryption key strengths, then the results of the select might break the security compliance of the tables.
> Let's say two tables with 128-bit and 256-bit encryption are joined; the temporary results might then be stored in the 128-bit encryption zone. This will temporarily conflict with the table encrypted with 256 bits.
> To fix this, Hive should be able to select the scratch directory that is more secured/encrypted in order to save the intermediate data temporarily with no compliance issues.
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory on each of the tables' locations.
> - The first SELECT will have its scratch directory in the table-aes128 directory.
> - The second SELECT will have its scratch directory in the table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
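
The scratch directory rule described in the issue (stage intermediate data under the most strongly encrypted table involved, and copy instead of rename when crossing zones) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed for HIVE-8065: `TableLocation`, `pick_scratch_parent`, `needs_copy`, the `/warehouse/...` paths, and the `/tmp/scratch` default are all hypothetical names, and key strength is modeled as a plain bit count.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TableLocation:
    """Hypothetical record: a table directory and the key size of its zone."""
    path: str
    key_bits: int  # 0 means the table is not inside an encryption zone


def pick_scratch_parent(tables: Sequence[TableLocation],
                        default: str = "/tmp/scratch") -> str:
    """Return the directory under which intermediate data would be staged.

    Assumption: the table with the largest key size is treated as the
    "most secured" location; ties go to the first such table.
    """
    strongest = max(tables, key=lambda t: t.key_bits, default=None)
    if strongest is None or strongest.key_bits == 0:
        # No encrypted table is involved, so the usual scratch dir is fine.
        return default
    return strongest.path


def needs_copy(src_zone: Optional[str], dst_zone: Optional[str]) -> bool:
    """Renames only work inside one zone; any zone change needs a copy.

    None stands for "not in an encryption zone".
    """
    return src_zone != dst_zone


if __name__ == "__main__":
    t128 = TableLocation("/warehouse/table-aes128", 128)
    t256 = TableLocation("/warehouse/table-aes256", 256)
    print(pick_scratch_parent([t128, t256]))
    print(needs_copy("zone-a", "zone-b"))
```

For the join example, this sketch would stage data under the table-aes256 location. For the multi-insert example, the function would be called once per target table, so each INSERT gets a scratch directory in its own table's zone.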