From: Dan Burkert
Date: Tue, 25 Apr 2017 10:38:46 -0700
Subject: Re: Data encryption in Kudu
To: user@kudu.apache.org
Reply-To: user@kudu.apache.org
Mailing-List: contact user-help@kudu.apache.org; run by ezmlm
References: <793726984.13692432.1493084898985.JavaMail.zimbra@comcast.net> <1668036986.13703141.1493086135916.JavaMail.zimbra@comcast.net>

Hi Franco,

I think you are right that a client-based approach wouldn't work,
because we wouldn't want to encrypt at the level of individual cell
values. That would get in the way of encoding, compression, predicate
evaluation, etc. As you note, adding encryption at the block layer is
probably the way to go.

Key management is definitely the tricky issue. We do have one advantage
over HDFS - because Kudu does logical replication, the encryption key
can be scoped to a particular tablet server or tablet replica; it
wouldn't need to be shared among all replicas. I haven't done enough
research to know if this makes it fundamentally easier to do key
management. I would assume at a minimum we would want to integrate with
key providers such as an HSM. It would be good to have a thorough review
of existing solutions in the space, such as TDE and the Hadoop KMS.

Is this something you are interested in working on?

- Dan

On Tue, Apr 25, 2017 at 8:30 AM, David Alves <davidralves@gmail.com> wrote:

> Hi Franco
>
>   Dan, Alexey, Todd are our security experts.
>   Folks, thoughts on this?
>
> Best
> David
>
> On Mon, Apr 24, 2017 at 7:08 PM, <fventuri@comcast.net> wrote:
>
>> Over the weekend I started looking at what it would take to add data
>> encryption to Kudu (besides using filesystem encryption via dm-crypt
>> or something like that).
>>
>> Here are a few notes - please feel free to comment on them and add
>> suggestions:
>>
>> - reading through this mailing list, it looks like this feature was
>>   requested a couple of times last year, but from what I can tell,
>>   no one is currently working on it.
>> - a client-based approach to encryption like the one used by HDFS
>>   wouldn't work (at least out of the box) because, for instance,
>>   encrypting the primary key at the client would prevent range
>>   filters on scans; it might work for the columns that are not part
>>   of the primary key
>> - there's already code in Kudu for several compression codecs (LZ4,
>>   gzip, etc); I thought it would be possible to add similar code for
>>   encryption codecs (to be applied after the compression, of course)
>> - the WAL log files and delta files should be encrypted in the same
>>   way
>> - not sure what would be the best way to manage the key - I see that
>>   HDFS uses a double key mechanism, where the encryption key for the
>>   data file is itself encrypted with the allowed user's key, and the
>>   whole process is managed by an external Key Management Service
>>
>> Thanks in advance for your ideas and suggestions,
>> Franco
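
The block-layer idea discussed in this thread (compress first, then
encrypt, with the data key itself encrypted by a key scoped to a single
tablet server) can be sketched roughly as below. This is only an
illustration under stated assumptions, not Kudu code: `keystream_xor`,
`write_block`, `read_block` and `server_key` are hypothetical names, and
the SHA-256 keystream is a toy stand-in for a real cipher such as
AES-GCM so that the sketch runs on the Python standard library alone.

```python
import hashlib
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    NOT secure for real use (no authentication). It stands in for a real
    cipher only so the sketch has no third-party dependencies.
    Applying it twice with the same key and nonce returns the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def write_block(plain: bytes, server_key: bytes) -> dict:
    """Compress, then encrypt with a fresh data key wrapped by server_key."""
    data_key = os.urandom(32)    # per-block data key
    data_nonce = os.urandom(16)
    wrap_nonce = os.urandom(16)
    return {
        "data_nonce": data_nonce,
        "wrap_nonce": wrap_nonce,
        # "Double key" step: only the wrapped data key is stored.
        "wrapped_key": keystream_xor(server_key, wrap_nonce, data_key),
        "payload": keystream_xor(data_key, data_nonce, zlib.compress(plain)),
    }


def read_block(block: dict, server_key: bytes) -> bytes:
    """Unwrap the data key, decrypt the payload, then decompress."""
    data_key = keystream_xor(server_key, block["wrap_nonce"], block["wrapped_key"])
    compressed = keystream_xor(data_key, block["data_nonce"], block["payload"])
    return zlib.decompress(compressed)


server_key = os.urandom(32)  # hypothetical key scoped to one tablet server
rows = b"user_id=42,status=active;" * 400
block = write_block(rows, server_key)
assert read_block(block, server_key) == rows

# Order matters: ciphertext looks random, so compressing after
# encrypting gains nothing.
encrypted_first = zlib.compress(keystream_xor(os.urandom(32), os.urandom(16), rows))
print(len(rows), len(block["payload"]), len(encrypted_first))
```

In a real design the server key would presumably come from an external
key provider (a KMS or HSM) rather than `os.urandom`, and because each
replica can hold its own key, nothing in this sketch needs to be shared
between replicas.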