hive-issues mailing list archives

From "Xinli Shang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21848) Table property name definition between ORC and Parquet encrytion
Date Fri, 07 Jun 2019 16:01:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinli Shang updated HIVE-21848:
-------------------------------
    Description: 
The goal of this Jira is to define a superset of unified table property names that can be
used for both Parquet and ORC column encryption. There is no code change needed for this Jira.

*Background:*

ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet. Table properties can
be used to configure the encryption, e.g. which columns are sensitive, which master key to use,
and which algorithm to apply. It is important that both Parquet and ORC use unified names.

According to the slides [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The Parquet community is
still discussing several ways to configure encryption; table properties are one of the options,
but there is no detailed design of the table property names yet.

So this is a good time for the two communities to discuss and agree on unified table property
names as a superset.

*Proposal:*

There are several encryption properties that need to be specified for a table. Here is the
list. It is the superset of what Parquet and ORC need; some items might not apply to both.
 # PII columns, including nested columns
 # Column key metadata, master key metadata
 # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR. ORC might support AES_CTR.
 # Footer encryption - Parquet allows the footer to be encrypted or plaintext
 # Footer key metadata

Here is the table properties proposal.
|*Table Property Name*|*Value*|*Notes*|
|encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
|encrypt_footer_plaintext|true, false|Parquet supports both plaintext and encrypted footers. By default, the footer is encrypted.|
|encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what key metadata is. The metadata should carry enough information for the KMS to find the corresponding key.|
|encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what key metadata is. The metadata should carry enough information for the KMS to find the corresponding key.|
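
To make the proposed names concrete, here is a minimal Python sketch of how a client might assemble these properties and read the encrypted columns back. Only the property names and allowed values come from the table above. The helper names (build_encryption_properties, encrypted_columns) and the sample key metadata bytes are hypothetical, and nothing here is an existing Hive, Parquet, or ORC API.

```python
import base64

# Property names and values follow the proposal table.
# Everything else in this sketch (helper names, sample bytes) is an assumption.
ALGORITHMS = {"aes_ctr", "aes_gcm"}
COLUMN_PREFIX = "encrypt_col_"


def _b64(raw):
    """Encode raw key metadata bytes as a base64 string."""
    return base64.b64encode(raw).decode("ascii")


def build_encryption_properties(algorithm, column_key_metadata,
                                footer_key_metadata=None,
                                footer_plaintext=False):
    """Return a dict of table properties using the proposed names."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    props = {
        "encrypt_algorithm": algorithm,
        # The proposal states the footer is encrypted by default.
        "encrypt_footer_plaintext": "true" if footer_plaintext else "false",
    }
    if footer_key_metadata is not None:
        props["encrypt_footer_key_metadata"] = _b64(footer_key_metadata)
    # One property per encrypted column; nested columns use dotted names.
    for column, metadata in column_key_metadata.items():
        props[COLUMN_PREFIX + column] = _b64(metadata)
    return props


def encrypted_columns(props):
    """Map each encrypted column name to its decoded key metadata bytes."""
    return {
        name[len(COLUMN_PREFIX):]: base64.b64decode(value)
        for name, value in props.items()
        if name.startswith(COLUMN_PREFIX)
    }


# The key metadata content is made up; per the proposal, the KMS defines it.
props = build_encryption_properties(
    "aes_gcm",
    {"address.zipcode": b"kms-key-id-1"},
    footer_key_metadata=b"kms-footer-key",
)
```

The resulting dict could then be written as table properties on the Hive table. Because the column name is embedded in the property name, a reader can recover the set of encrypted columns by scanning for the encrypt_col_ prefix, as encrypted_columns does.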

 



> Table property name definition between ORC and Parquet encrytion
> ----------------------------------------------------------------
>
>                 Key: HIVE-21848
>                 URL: https://issues.apache.org/jira/browse/HIVE-21848
>             Project: Hive
>          Issue Type: Task
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
