kudu-user mailing list archives

From fvent...@comcast.net
Subject Re: Data encryption in Kudu
Date Thu, 27 Apr 2017 01:48:07 GMT
David, Dan, Todd, 
thanks for your prompt replies. 

At this stage I am just exploring what it would take to implement some sort of data encryption
in Kudu. 

After reading your comments here are some further thoughts: 

- according to the first sentence in this paragraph in the Kudu docs ( https://kudu.apache.org/docs/schema_design.html#compression ):

Kudu allows per-column compression using the LZ4 , Snappy , or zlib compression codecs. 

it should be possible to perform per-column encryption by adding 'encryption codecs' right
after the compression codecs. I browsed through the code quickly and I think this is done when
reading/writing a 'cfile' (please correct me if I am wrong). If this is correct, this change
could be 'minimally invasive' (at least for the 'cfile' part) and would not require a major
overhaul of the Kudu architecture. 

- as for the key management aspect, I am not a security expert at all, so I am not sure what
would be the best approach here - my thought here is that in most places Kudu is deployed
together with HDFS, so it would be 'desirable' if the key management were consistent between
the two services; on the other hand, I also realize that the basic premises are fundamentally
different: HDFS encrypts everything at the client level and therefore the HDFS engine itself
is almost completely unaware that the data it stores is actually encrypted (except for a special
hidden file attribute, if I understand correctly), while in Kudu the storage engine must have
both the 'public' key (when encrypting) and the 'private' key (when decrypting) otherwise
it can't take advantage of knowing the 'structure' of the data (for instance the Bloom filters
probably wouldn't work with the key being encrypted). This means for instance that an attacker
who is able to gain access to the Kudu tablet servers would probably be able to decrypt the
data. Also one way to achieve something similar to what HDFS does (i.e. client-based encryption
and data encrypted in-flight) could be perhaps using a one-time client certificate generated
by the KMS server, but this would also require changes to the client code. 
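
To illustrate the first point above (an 'encryption codec' applied right after the compression
codec), here is a minimal Python sketch. It is not Kudu code: write_block, read_block and
keystream_xor are hypothetical names, and the cipher is a toy SHA-256 keystream standing in
for a real one such as AES-CTR. It only shows why the order matters: ciphertext looks random,
so compressing after encrypting gains nothing.

```python
import hashlib
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key||nonce||offset) blocks.

    Applying it twice with the same key and nonce returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def write_block(key: bytes, raw: bytes) -> bytes:
    """Hypothetical write path: compress first, then encrypt; nonce is prepended."""
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, zlib.compress(raw))


def read_block(key: bytes, stored: bytes) -> bytes:
    """Hypothetical read path: decrypt first, then decompress."""
    nonce, body = stored[:16], stored[16:]
    return zlib.decompress(keystream_xor(key, nonce, body))


key = os.urandom(32)
raw = b"kudu-column-value," * 1000  # repetitive data, as column blocks often are

good = write_block(key, raw)                                   # compress, then encrypt
bad = zlib.compress(keystream_xor(key, os.urandom(16), raw))   # encrypt, then compress

print(len(raw), len(good), len(bad))
```

In this sketch the compress-then-encrypt block is far smaller than the input, while the
encrypt-then-compress output stays roughly the size of the input.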


----- Original Message -----

From: "Todd Lipcon" <todd@cloudera.com> 
To: user@kudu.apache.org 
Sent: Tuesday, April 25, 2017 3:49:50 PM 
Subject: Re: Data encryption in Kudu 

Agreed with what Dan said. 

I think there are a number of interesting design alternatives to be considered, so before
coding it would be great to work through a design document to explore the alternatives. For
example, we could try to apply encryption at the 'fs/' layer, which would cover all non-WAL
data, but then we would lose the ability to specify encryption on a per-column basis. There
are other requirements that need to be ironed out about whether we'd need to support separate
encryption keys per column/table/server/etc, whether metadata also needs to be encrypted,


On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert < danburkert@apache.org > wrote: 

Hi Franco, 

I think you are right that a client-based approach wouldn't work, because we wouldn't want
to encrypt at the level of individual cell values. That would get in the way of encoding,
compression, predicate evaluation, etc. As you note, adding encryption at the block layer
is probably the way to go. Key management is definitely the tricky issue. We do have one advantage
over HDFS - because Kudu does logical replication, the encryption key can be scoped to a particular
tablet server or tablet replica, it wouldn't need to be shared among all replicas. I haven't
done enough research to know if this makes it fundamentally easier to do key management. I
would assume at a minimum we would want to integrate with key providers such as an HSM. It would
be good to have a thorough review of existing solutions in the space, such as TDE and the
Hadoop KMS. Is this something you are interested in working on? 

- Dan 

On Tue, Apr 25, 2017 at 8:30 AM, David Alves < davidralves@gmail.com > wrote: 


Hi Franco 

Dan, Alexey, Todd are our security experts. 
Folks, thoughts on this? 


On Mon, Apr 24, 2017 at 7:08 PM, < fventuri@comcast.net > wrote: 


Over the weekend I started looking at what it would take to add data encryption to Kudu (besides
using filesystem encryption via dm-crypt or something like that). 

Here are a few notes - please feel free to comment on them and add suggestions: 

- reading through this mailing list, it looks like this feature was asked for a couple of
times last year, but from what I can tell, no one is currently working on it. 
- a client-based approach to encryption like the one used by HDFS wouldn't work (at least
out of the box) because for instance encrypting the primary key at the client would prevent
being able to have range filters for scans; it might work for the columns that are not part
of the primary key 
- there's already code in Kudu for several compression codecs (LZ4, gzip, etc); I thought
it would be possible to add similar code for encryption codecs (to be applied after the compression,
of course) 
- the WAL log files and delta files should be similarly encrypted too 
- not sure what would be the best way to manage the key - I see that in HDFS they use a double
key mechanism, where the encryption key for the data file is itself encrypted with the allowed
user key and this whole process is managed by an external Key Management Service 
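
The double key mechanism in the last point can be sketched roughly as follows. This is only
an illustration of the general idea (often called envelope encryption), not the HDFS or Hadoop
KMS API: ToyKms, encrypt_file and decrypt_file are hypothetical names, and toy_cipher is an
insecure stand-in for a real cipher.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream (NOT secure); the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKms:
    """Hypothetical stand-in for an external key management service."""

    def __init__(self) -> None:
        self._master_key = os.urandom(32)  # assumed never to leave the service

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return a fresh data key plus the same key wrapped with the master key."""
        data_key = os.urandom(32)
        nonce = os.urandom(16)
        wrapped = nonce + toy_cipher(self._master_key, nonce, data_key)
        return data_key, wrapped

    def unwrap(self, wrapped: bytes) -> bytes:
        """Recover the data key from its wrapped form."""
        nonce, body = wrapped[:16], wrapped[16:]
        return toy_cipher(self._master_key, nonce, body)


def encrypt_file(kms: ToyKms, plaintext: bytes) -> dict:
    """Encrypt with a per-file data key; only the wrapped key is stored."""
    data_key, wrapped = kms.generate_data_key()
    nonce = os.urandom(16)
    return {
        "wrapped_key": wrapped,
        "nonce": nonce,
        "ciphertext": toy_cipher(data_key, nonce, plaintext),
    }


def decrypt_file(kms: ToyKms, stored: dict) -> bytes:
    """Ask the KMS to unwrap the data key, then decrypt the file contents."""
    data_key = kms.unwrap(stored["wrapped_key"])
    return toy_cipher(data_key, stored["nonce"], stored["ciphertext"])


kms = ToyKms()
stored = encrypt_file(kms, b"row data for one file")
print(decrypt_file(kms, stored))
```

The point of the design is that the stored file carries only the wrapped data key, so reading
it requires a call to the key service, which can apply its own access checks.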

Thanks in advance for your ideas and suggestions, 



Todd Lipcon 
Software Engineer, Cloudera 
