hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9578) Client side cell encryption
Date Wed, 18 Sep 2013 23:54:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771415#comment-13771415
] 

Andrew Purtell commented on HBASE-9578:
---------------------------------------

First, an HTable wrapper by definition must be explicitly used by an application. It is an
easy solution to implement but is not transparent to the end user. Before HBase 0.95/0.96,
with its new RPC codecs, a wrapper was the only implementation choice that avoided invasive
changes to the client library. After 0.95/0.96, RPC codecs offer an interesting option for
adding client side value encryption (and/or compression) in a more transparent way.
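
The wrapper approach can be sketched roughly as follows. This is only an illustration
and not the implementation offered in this issue: SimpleTable, InMemoryTable, and
EncryptingTable are hypothetical names standing in for the real HTable API, and AES-GCM
with a random per-value IV is an assumed cipher choice. Only values are encrypted; row
keys are passed through in plaintext.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical, minimal table interface for illustration; NOT the HBase HTable API.
interface SimpleTable {
    void put(String rowKey, byte[] value);
    byte[] get(String rowKey);
}

// Plays the role of the server: stores whatever bytes the client sends.
class InMemoryTable implements SimpleTable {
    private final Map<String, byte[]> rows = new TreeMap<>();
    @Override public void put(String rowKey, byte[] value) { rows.put(rowKey, value); }
    @Override public byte[] get(String rowKey) { return rows.get(rowKey); }
}

// Wrapper the application must use explicitly. Values are encrypted before
// they leave the client; row keys are passed through in plaintext.
class EncryptingTable implements SimpleTable {
    private static final int IV_BYTES = 12;   // assumed: 96-bit GCM IV
    private static final int TAG_BITS = 128;  // assumed: 128-bit GCM tag
    private final SimpleTable delegate;
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    EncryptingTable(SimpleTable delegate, SecretKey key) {
        this.delegate = delegate;
        this.key = key;
    }

    // Generates a fresh AES key that never leaves the client.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void put(String rowKey, byte[] value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(value);
            // Stored layout: IV followed by ciphertext and authentication tag.
            byte[] stored = Arrays.copyOf(iv, IV_BYTES + ciphertext.length);
            System.arraycopy(ciphertext, 0, stored, IV_BYTES, ciphertext.length);
            delegate.put(rowKey, stored);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public byte[] get(String rowKey) {
        byte[] stored = delegate.get(rowKey);
        if (stored == null) return null;
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

class EncryptingTableDemo {
    public static void main(String[] args) {
        InMemoryTable server = new InMemoryTable();
        SimpleTable table = new EncryptingTable(server, EncryptingTable.newKey());
        table.put("row1", "secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(table.get("row1"), StandardCharsets.UTF_8));
        System.out.println("server stores " + server.get("row1").length + " bytes");
    }
}
```

The application has to construct and call EncryptingTable itself, which is exactly the
lack of transparency described above.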

Second, HBase is completely agnostic about value data but not so about keys. Traditional encryption,
if applied to key data as well, would destroy data locality and scan semantics. There are some
order-preserving encryption schemes which would maintain a sort ordering but at the price of
increased exposure to successful cryptanalysis.

Next, and related to the trouble with keys and sorting: if all cryptographic transformations
are performed entirely on the client side, the server cannot recover the plaintext, so
several HBase API operations become impossible: append, increment, checkAndPut,
checkAndDelete, and any scan filter that wants to examine cell values. For analytical
workloads (short scans with highly selective filters, aggregating coprocessors) in particular,
this requires transferring much more data to the client for processing there than would otherwise
be needed.

We could consider sending private key material over in the RPC to work around this problem,
but it is risky to ship private key material over the network ever, never mind frequently.
So let's consider how much can practically be done on the server without sending over
user private key material.

A naive option would be to implement a fully homomorphic encryption scheme. In theory, any
operation on the server over encrypted data would be possible. Unfortunately fully homomorphic
encryption in practice imposes overheads on the order of 10^9. There are however some practical
but more limited schemes which may be useful.

At VLDB 2013, MIT CSAIL presented the paper "Processing Analytical Queries over Encrypted
Data", which describes a research prototype, based on Postgres, that splits operations over
encrypted data between client and server. The encryption schemes they employ are also
applicable to restoring the HBase API operations mentioned above. "Deterministic encryption" with AES
would make equality tests possible, restoring checkAndX operations, if we accept the leakage
resulting from duplicates. Deterministic transformations also restore Append. Order-preserving
encryption (OPE) can restore range scanning semantics, but with greater leakage leading to practical
partial plaintext recovery. Maybe for some that would be an acceptable tradeoff. More interesting,
Paillier homomorphic encryption supports addition, and therefore summation, and could restore
Increments and aggregating coprocessor functions like sum(). We might support some subset
of scanning with filters by rewriting the filters passed in for a Scan with encryption-aware
substitutions.
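
To make the Paillier point concrete, here is a minimal sketch of how a server could add
two encrypted counters without ever seeing the plaintext or the private key. It is
textbook Paillier with the common simplification g = n + 1, written only for
illustration: the class and method names are hypothetical, the 512-bit modulus used in
main is far too small for real use, and nothing here is taken from the paper's prototype
or from HBase.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Textbook Paillier with g = n + 1. Illustrative only, not production crypto.
final class PaillierSketch {
    private final SecureRandom random = new SecureRandom();
    private final BigInteger n;         // public modulus n = p * q
    private final BigInteger nSquared;  // public; ciphertexts live modulo n^2
    private final BigInteger lambda;    // private: (p - 1)(q - 1)
    private final BigInteger mu;        // private: lambda^-1 mod n

    PaillierSketch(int modulusBits) {
        BigInteger p;
        BigInteger q;
        do {
            p = BigInteger.probablePrime(modulusBits / 2, random);
            q = BigInteger.probablePrime(modulusBits / 2, random);
        } while (p.equals(q));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        lambda = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        mu = lambda.modInverse(n);
    }

    BigInteger publicModulusSquared() {
        return nSquared;
    }

    // Client side: c = (1 + m*n) * r^n mod n^2, with random r coprime to n.
    BigInteger encrypt(BigInteger m) {
        if (m.signum() < 0 || m.compareTo(n) >= 0) {
            throw new IllegalArgumentException("plaintext must be in [0, n)");
        }
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), random);
        } while (r.signum() == 0 || r.compareTo(n) >= 0
                || !r.gcd(n).equals(BigInteger.ONE));
        // (n + 1)^m mod n^2 equals 1 + m*n mod n^2.
        BigInteger gToM = BigInteger.ONE.add(m.multiply(n)).mod(nSquared);
        return gToM.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Client side: m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n.
    BigInteger decrypt(BigInteger c) {
        BigInteger u = c.modPow(lambda, nSquared);
        BigInteger l = u.subtract(BigInteger.ONE).divide(n);
        return l.multiply(mu).mod(n);
    }

    // Server side: needs only public material. Multiplying ciphertexts
    // adds the underlying plaintexts modulo n.
    static BigInteger addCiphertexts(BigInteger c1, BigInteger c2, BigInteger nSquared) {
        return c1.multiply(c2).mod(nSquared);
    }

    public static void main(String[] args) {
        PaillierSketch client = new PaillierSketch(512); // demo size only
        BigInteger counter = client.encrypt(BigInteger.valueOf(40));
        BigInteger delta = client.encrypt(BigInteger.valueOf(2));
        // The "server" combines the two ciphertexts without decrypting them.
        BigInteger updated = addCiphertexts(counter, delta, client.publicModulusSquared());
        System.out.println(client.decrypt(updated));
    }
}
```

An Increment would then amount to the client sending an encrypted delta and the server
multiplying it into the stored ciphertext; a sum() aggregate would multiply all
ciphertexts in range and return one value for the client to decrypt.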

Of course, there is the problem of encrypting the data at the client with the correct scheme
for the wanted semantics. The full design burden could be pushed to the user. Better, the
mentioned paper describes a table designer executed at data import time to choose the optimal
physical layout for the desired schema. Something like that could be developed for HBase as
well, leveraging the typed data library.
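
As a very rough sketch of what "choosing the correct scheme for the wanted semantics"
could look like, the following maps the operations a column must support on the server
to the schemes discussed above. All names are hypothetical, and this is not the paper's
designer, which optimizes the physical layout. It also assumes, for illustration, that a
column needing several operations would store one ciphertext per scheme.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: picks encryption schemes from the server-side
// operations a column is expected to support.
final class SchemeChooser {
    enum Operation { EQUALITY, RANGE_SCAN, SUM }
    enum Scheme { RANDOMIZED, DETERMINISTIC, ORDER_PRESERVING, PAILLIER }

    static Set<Scheme> schemesFor(Set<Operation> needed) {
        EnumSet<Scheme> schemes = EnumSet.noneOf(Scheme.class);
        for (Operation op : needed) {
            switch (op) {
                case EQUALITY:   // checkAndPut, checkAndDelete
                    schemes.add(Scheme.DETERMINISTIC);
                    break;
                case RANGE_SCAN: // ordered comparisons
                    schemes.add(Scheme.ORDER_PRESERVING);
                    break;
                case SUM:        // Increment, sum() aggregates
                    schemes.add(Scheme.PAILLIER);
                    break;
            }
        }
        // No server-side operation needed: use the least leaky option.
        if (schemes.isEmpty()) {
            schemes.add(Scheme.RANDOMIZED);
        }
        return schemes;
    }

    public static void main(String[] args) {
        System.out.println(schemesFor(EnumSet.of(Operation.EQUALITY, Operation.SUM)));
        System.out.println(schemesFor(EnumSet.noneOf(Operation.class)));
    }
}
```

Defaulting to randomized encryption when no server-side operation is requested keeps
leakage to what the user explicitly opted into.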
                
> Client side cell encryption
> ---------------------------
>
>                 Key: HBASE-9578
>                 URL: https://issues.apache.org/jira/browse/HBASE-9578
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> HBASE-7544 will protect key and value data on the server from accidental leakage by way
> of improperly disposed disks, improper direct filesystem access, or incorrect HDFS permissions.
> There are also use cases where sensitive data stored in a table or column family by a given
> user or application should be protected from all others, and the combination of transparent
> server-side storage encryption and transport security (SASL auth-conf) is still not sufficient.
> These instances call for a client side per-cell encryption feature, given the following additional
> observations:
> - The scope of transmission, distribution, and storage of private key material should
>   be as limited as possible. The server is a centralized target (even in the case of an HBase
>   cluster) where the scope of damage from a compromise is magnified if user key material also
>   resides there or can be intercepted after compromise. Where keys are stored in hardware devices,
>   e.g. smartcards, getting the keys to the server may not be possible anyway.
> - A client system is far more likely than a contended shared server resource to have
>   necessary available CPU cycles for per-operation cryptographic overheads.
> For some cases we might not care so much about the second item, but the first is very
> important.
> I have an implementation of per cell client side encryption as an encrypting HTable wrapper
> which I could contribute if there is interest.
> This JIRA is also about brainstorming how to do better than that.

