kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: Data encryption in Kudu
Date Tue, 25 Apr 2017 17:38:46 GMT
Hi Franco,

I think you are right that a client-based approach wouldn't work, because
we wouldn't want to encrypt at the level of individual cell values.  That
would get in the way of encoding, compression, predicate evaluation, etc.
As you note, adding encryption at the block layer is probably the way to
go.  Key management is definitely the tricky issue.  We do have one
advantage over HDFS - because Kudu does logical replication, the encryption
key can be scoped to a particular tablet server or tablet replica; it
wouldn't need to be shared among all replicas.  I haven't done enough
research to know if this makes it fundamentally easier to do key
management.  I would assume at a minimum we would want to integrate with
key providers such as an HSM.  It would be good to have a thorough review of
existing solutions in the space, such as TDE
<https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the Hadoop
KMS.  Is this something you are interested in working on?
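
To make the key-scoping idea concrete, here is a minimal sketch, not a proposed design. It assumes each tablet replica generates its own data key and persists only a wrapped copy of it, with the wrapping done by an external key provider. ToyKeyProvider, ReplicaStore and keystream_xor are hypothetical names invented for this illustration, and the SHA-256 keystream is an insecure stand-in for a real cipher such as AES.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; stand-in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyProvider:
    """Hypothetical stand-in for an external key provider (HSM, KMS)."""

    def __init__(self) -> None:
        self._master_key = os.urandom(32)  # never leaves the provider

    def wrap(self, data_key: bytes) -> bytes:
        nonce = os.urandom(16)
        return nonce + keystream_xor(self._master_key, nonce, data_key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return keystream_xor(self._master_key, wrapped[:16], wrapped[16:])


class ReplicaStore:
    """Hypothetical tablet replica that owns its own data key."""

    def __init__(self, provider: ToyKeyProvider) -> None:
        self._provider = provider
        # Only the wrapped form of the data key is kept alongside the data.
        self.wrapped_key = provider.wrap(os.urandom(32))
        self.blocks = []

    def write_block(self, plaintext: bytes) -> None:
        data_key = self._provider.unwrap(self.wrapped_key)
        nonce = os.urandom(16)
        self.blocks.append(nonce + keystream_xor(data_key, nonce, plaintext))

    def read_block(self, index: int) -> bytes:
        data_key = self._provider.unwrap(self.wrapped_key)
        blob = self.blocks[index]
        return keystream_xor(data_key, blob[:16], blob[16:])


provider = ToyKeyProvider()
replica_a = ReplicaStore(provider)
replica_b = ReplicaStore(provider)
# Logical replication: each replica receives the row, not encrypted bytes,
# so each one can encrypt under a key that no other replica ever sees.
for replica in (replica_a, replica_b):
    replica.write_block(b"row-1")
```

In this sketch the replicas share nothing except access to the key provider, which is the property the paragraph above contrasts with HDFS.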

- Dan

On Tue, Apr 25, 2017 at 8:30 AM, David Alves <davidralves@gmail.com> wrote:

> Hi Franco
>   Dan, Alexey, Todd are our security experts.
>   Folks, thoughts on this?
> Best
> David
> On Mon, Apr 24, 2017 at 7:08 PM, <fventuri@comcast.net> wrote:
>> Over the weekend I started looking at what it would take to add data
>> encryption to Kudu (besides using filesystem encryption via dm-crypt or
>> something like that).
>> Here are a few notes - please feel free to comment on them and add
>> suggestions:
>> - reading through this mailing list, it looks like this feature has been
>> asked for a couple of times in the last year, but from what I can tell, no
>> one is currently working on it.
>> - a client-based approach to encryption like the one used by HDFS
>> wouldn't work (at least out of the box) because for instance encrypting the
>> primary key at the client would prevent being able to have range filters
>> for scans; it might work for the columns that are not part of the primary
>> key
>> - there's already code in Kudu for several compression codecs (LZ4, gzip,
>> etc); I thought it would be possible to add similar code for encryption
>> codecs (to be applied after the compression, of course)
>> - the WAL files and delta files should be encrypted in the same way
>> - not sure what would be the best way to manage the key - I see that in
>> HDFS they use a double key mechanism, where the encryption key for the data
>> file is itself encrypted with the allowed user key and this whole process
>> is managed by an external Key Management Service
>> Thanks in advance for your ideas and suggestions,
>> Franco
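
The point about applying encryption after compression can be illustrated with a small sketch. It is only an illustration under stated assumptions: zlib stands in for Kudu's codecs, the SHA-256 keystream is an insecure placeholder for a real cipher, and all function names are hypothetical. It shows that a block compressed first stays small, while a block encrypted first no longer compresses.

```python
import hashlib
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; stand-in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def compress_then_encrypt(key: bytes, block: bytes) -> bytes:
    """Order suggested in the message above: compress first, then encrypt."""
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, zlib.compress(block))


def decrypt_then_decompress(key: bytes, blob: bytes) -> bytes:
    """Inverse of compress_then_encrypt."""
    return zlib.decompress(keystream_xor(key, blob[:16], blob[16:]))


def encrypt_then_compress(key: bytes, block: bytes) -> bytes:
    """Reverse order, for comparison: ciphertext looks random to the compressor."""
    nonce = os.urandom(16)
    return zlib.compress(nonce + keystream_xor(key, nonce, block))


key = os.urandom(32)
block = b"kudu" * 4096  # highly repetitive 16 KiB block

good = compress_then_encrypt(key, block)
bad = encrypt_then_compress(key, block)
print(len(block), len(good), len(bad))
```

The same reasoning applies to encoding and predicate evaluation: any step that relies on structure in the data has to run before the data is encrypted.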
