From: fventuri@comcast.net
To: user@kudu.apache.org
Date: Tue, 25 Apr 2017 02:08:55 +0000 (UTC)
Subject: Data encryption in Kudu

Over the weekend I started looking at what it would take to add data encryption to Kudu (beyond filesystem-level encryption via dm-crypt or something similar).

Here are a few notes; please feel free to comment on them and add suggestions:

- reading through this mailing list, it looks like this feature was asked for a couple of times last year but, from what I can tell, no one is currently working on it
- a client-based approach to encryption like the one used by HDFS wouldn't work (at least out of the box) because, for instance, encrypting the primary key at the client would make range filters on scans impossible; it might work for the columns that are not part of the primary key
- there's already code in Kudu for several compression codecs (LZ4, gzip, etc.); I thought it would be possible to add similar code for encryption codecs (to be applied after the compression, of course)
- the WAL files and delta files should be encrypted in the same way
- I'm not sure what the best way to manage the key would be; I see that HDFS uses a double-key mechanism, where the encryption key for the data file is itself encrypted with the allowed user key, and the whole process is managed by an external Key Management Service

Thanks in advance for your ideas and suggestions,
Franco
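
To illustrate the note about applying encryption after compression, here is a minimal Python sketch. It is not Kudu code: `toy_encrypt` is a hypothetical stand-in built from SHA-256 purely for illustration (not secure, not any real library's API), and the sample data is made up. It only shows that ciphertext looks random, so compressing it afterwards gains nothing.

```python
import hashlib
import os
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream.

    Hypothetical stand-in for a real cipher such as AES-CTR.
    NOT secure; illustration only. Applying it twice decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


key = os.urandom(32)
# Made-up, highly repetitive data, loosely mimicking a low-cardinality column.
plain = b"status=OK;region=us-west;" * 400

# Order proposed in the notes: compress first, then encrypt.
compress_then_encrypt = toy_encrypt(key, zlib.compress(plain))
# Reverse order: the compressor only sees random-looking bytes.
encrypt_then_compress = zlib.compress(toy_encrypt(key, plain))

# The second size is far smaller than the third.
print(len(plain), len(compress_then_encrypt), len(encrypt_then_compress))

# Reading reverses the write path: decrypt first, then decompress.
assert zlib.decompress(toy_encrypt(key, compress_then_encrypt)) == plain
```

The same ordering would presumably apply to any block-level encryption codec: decrypt on read before handing the block to the decompressor.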