hive-issues mailing list archives

From "Thejas M Nair (JIRA)" <>
Subject [jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
Date Wed, 06 May 2015 18:01:01 GMT


Thejas M Nair commented on HIVE-8065:

Thinking more about the insert case above, the performance tradeoff is not necessary. The files
written to HDFS before the move can still be written to EZ1 without any reduction in data protection,
as they would contain data matching the final results. It is the intermediate data produced before
that step that can contain sensitive data (in MR mode).

In the case of "select * from tableEZ2 inner join tableEZ3", my understanding is that it uses
one of EZ2 or EZ3 for the scratch dir. This creates two issues:
 # Write permissions are now required to read from these tables.
 # Sensitive data from one zone will be stored in another.

> Support HDFS encryption functionality on Hive
> ---------------------------------------------
>                 Key: HIVE-8065
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>              Labels: Hive-Scrum
> The new encryption support on HDFS makes Hive incompatible and unusable when this feature
> is used.
> HDFS encryption is designed so that a user can configure different encryption zones
> (or directories) for multi-tenant environments. Each encryption zone has its own encryption
> key, such as AES-128 or AES-256. For security compliance, HDFS does not allow files to be
> moved/renamed between encryption zones. Renames are allowed only within the same encryption
> zone. Copying between encryption zones is allowed.
> See HDFS-6134 for more details about HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory
> is used for the output of intermediate data (between MR jobs) and for the final output of
> the Hive query, which is later moved to the table directory location.
> If Hive tables are in a different encryption zone than the scratch directory, then Hive
> won't be able to rename those files/directories, which makes Hive unusable.
> To handle this problem, we can change the scratch directory of the query/statement to
> be inside the same encryption zone as the table directory location. This way, the rename
> will succeed.
> Also, for statements that move files between encryption zones (e.g. LOAD DATA), a copy
> may be executed instead of a rename. This adds overhead when copying large data files,
> but it won't break the encryption on Hive.
> Another security consideration is joins in SELECT statements. If Hive joins tables
> with different encryption key strengths, then the results of the select might break
> the security compliance of the tables. Say two tables with 128-bit and 256-bit encryption
> are joined; the temporary results might then be stored in the 128-bit encryption zone. This
> would temporarily conflict with the protection of the table encrypted with 256 bits.
> To fix this, Hive should be able to select the scratch directory with the stronger encryption,
> so that intermediate data is stored temporarily with no compliance issues.
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE ==;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the table-aes256
> table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory in each of the tables' locations.
> - The first SELECT will have its scratch directory in the table-aes128 directory.
> - The second SELECT will have its scratch directory in the table-aes256 directory.
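
The selection rule in the quoted description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hive's implementation: `TableLocation`, `pick_scratch_location`, `final_move_operation`, and the `/warehouse/...` paths are hypothetical names, and "more secure" is reduced to "longer key", which is how the description's examples read.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TableLocation:
    """Hypothetical model of a table directory and its encryption zone."""
    path: str
    zone: Optional[str]  # encryption zone root, or None when unencrypted
    key_bits: int        # key length in bits, 0 when unencrypted


def pick_scratch_location(tables: Sequence[TableLocation]) -> TableLocation:
    """Pick the table whose zone has the longest key (first one wins ties)."""
    if not tables:
        raise ValueError("at least one table is required")
    return max(tables, key=lambda t: t.key_bits)


def final_move_operation(scratch: TableLocation, target: TableLocation) -> str:
    """Renames only work inside one zone; crossing zones requires a copy."""
    return "rename" if scratch.zone == target.zone else "copy"


# Assumed example layout: each encrypted table directory is its own zone.
plain = TableLocation("/warehouse/table_unencrypted", None, 0)
aes128 = TableLocation("/warehouse/table_aes128", "/warehouse/table_aes128", 128)
aes256 = TableLocation("/warehouse/table_aes256", "/warehouse/table_aes256", 256)

# Join of the 128-bit and 256-bit tables: stage in the 256-bit zone.
join_scratch = pick_scratch_location([aes128, aes256])

# INSERT OVERWRITE into the unencrypted table from an encrypted source:
# stage next to the encrypted source, then copy the result out of the zone.
insert_scratch = pick_scratch_location([aes128, plain])
insert_move = final_move_operation(insert_scratch, plain)

# Multi-insert from the unencrypted table: each branch stages in its target,
# so the final move stays a rename.
branch_move = final_move_operation(pick_scratch_location([plain, aes256]), aes256)
```

Under this rule, an insert from an encrypted table into an unencrypted one ends with a copy rather than a rename, which is the performance tradeoff the comment at the top argues can be avoided when the staged files already match the final results.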

This message was sent by Atlassian JIRA
