hive-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5207) Support data encryption for Hive tables
Date Tue, 08 Oct 2013 16:32:43 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789356#comment-13789356 ]

Owen O'Malley commented on HIVE-5207:
-------------------------------------

This patch won't compile, because Hive has to work when used with Hadoop 1.x. The shims are
used to support multiple versions of Hadoop (Hadoop 0.20, Hadoop 1.x, Hadoop 0.23, Hadoop
2.x) depending on what is installed on the host system.
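
To illustrate the shim idea, here is a minimal sketch of picking an implementation at
runtime from the installed Hadoop version string. The class name ShimSelector, the shim
class names, and the mapping are hypothetical and chosen only for illustration; Hive's
actual shim loader may be organized differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: choose a shim by Hadoop version line instead of
// compiling directly against one Hadoop release.
class ShimSelector {
    // Assumed mapping from version line to shim class name (illustrative names).
    private static final Map<String, String> SHIMS = new LinkedHashMap<>();
    static {
        SHIMS.put("0.20", "Hadoop20Shims");
        SHIMS.put("1", "Hadoop1Shims");
        SHIMS.put("0.23", "Hadoop23Shims");
        SHIMS.put("2", "Hadoop2Shims");
    }

    // Reduce a full version such as "1.2.1" or "0.20.2" to its version line
    // ("1" or "0.20") and look up the matching shim name.
    static String shimFor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed Hadoop version: " + hadoopVersion);
        }
        // 0.x releases are distinguished by their minor number; later ones by major only.
        String line = parts[0].equals("0") ? parts[0] + "." + parts[1] : parts[0];
        String shim = SHIMS.get(line);
        if (shim == null) {
            throw new IllegalArgumentException("Unsupported Hadoop version: " + hadoopVersion);
        }
        return shim;
    }

    public static void main(String[] args) {
        // In a real deployment the version would come from the installed Hadoop.
        System.out.println(shimFor("1.2.1"));
    }
}
```

Code that needs a version-specific API would then go through the selected shim, so the
main code base still compiles against every supported Hadoop version.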

Furthermore, this seems like the wrong direction. What is the advantage of this rather large
patch over using the cfs work? If the user defines a table in cfs all of the table's data
will be encrypted.

> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HIVE-5207.patch, HIVE-5207.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is common
> practice to store the data encrypted in the file system. Enabling Hive to store and
> query encrypted data is crucial for Hive data analysis in the enterprise.

>  
> When creating a table, the user can specify whether it is an encrypted table by
> setting a property in TBLPROPERTIES. Once an encrypted table is created, queries on it
> are transparent as long as the corresponding key management facilities are set up in the
> environment where the query runs. We can use the Hadoop crypto support provided by
> HADOOP-9331 for the underlying data encryption and decryption.
>  
> As for key management, we would support several common use cases. First, the table key
> (data key) can be stored in the Hive metastore, associated with the table in its
> properties. The table key can be explicitly specified or auto-generated, and will be
> encrypted with a master key. In some cases the data being processed is generated by other
> applications, so we need to support externally managed or imported table keys. Also, the
> data generated by Hive may be consumed by other applications in the system, so we need a
> tool or command for exporting the table key to a Java keystore for external use.
>  
> To handle versions of Hadoop that do not have crypto support, we can avoid compilation
> problems by segregating crypto API usage into separate files (shims) to be included only if
> a flag is defined on the Ant command line (something like -Dcrypto=true).
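
The key management scheme in the quoted description (a table key encrypted with a master
key, plus export of the table key to a Java keystore) can be sketched with the standard
Java crypto APIs. This is only an illustration under assumptions: the class and method
names are hypothetical, AES key wrapping and a JCEKS keystore are choices made here, and
none of it is taken from the HIVE-5207 patch or from HADOOP-9331.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical sketch of table-key handling; not the patch's implementation.
class TableKeySketch {
    // Auto-generate a 128-bit AES key (usable as a table key or a master key).
    static SecretKey newAesKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Encrypt (wrap) the table key with the master key, so that only the
    // wrapped bytes would need to be stored alongside the table metadata.
    static byte[] wrap(SecretKey tableKey, SecretKey masterKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return cipher.wrap(tableKey);
    }

    // Recover the table key from its wrapped form using the master key.
    static SecretKey unwrap(byte[] wrapped, SecretKey masterKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    // Export the table key into an in-memory JCEKS keystore for external use.
    static byte[] exportToKeystore(SecretKey key, String alias, char[] password)
            throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password);                    // start from an empty keystore
        ks.setKeyEntry(alias, key, password, null); // secret keys carry no cert chain
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, password);
        return out.toByteArray();
    }

    // Read the key back, as another application consuming the keystore might.
    static SecretKey importFromKeystore(byte[] bytes, String alias, char[] password)
            throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(new ByteArrayInputStream(bytes), password);
        return (SecretKey) ks.getKey(alias, password);
    }

    public static void main(String[] args) throws Exception {
        SecretKey master = newAesKey();
        SecretKey table = newAesKey();
        SecretKey recovered = unwrap(wrap(table, master), master);
        System.out.println(java.util.Arrays.equals(table.getEncoded(), recovered.getEncoded()));
    }
}
```

In this sketch the master key never leaves the process; how the master key itself is
provisioned and protected is outside what the description specifies.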



--
This message was sent by Atlassian JIRA
(v6.1#6144)
