accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 17:11:03 GMT
+1 I think this is the right step. My hunch, given the common data
access patterns that we have in Accumulo (versus HBase), is that
per-colfam encryption isn't quite as common a design pattern as it is
for HBase (please tell me I'm wrong if anyone disagrees -- this is
mostly a gut reaction). I think our users would likely benefit more from
a per-namespace/table encryption control like you suggest.

Implementing RFile encryption at the HDFS level (e.g. tying a specific
zone/key to a table) is probably straightforward. Changing the
TServer's WAL use would likely be trickier to get right (a tserver would
have multiple WALs, one for each unique zone/key among the Tablets it
happens to host). Maybe worrying about that is getting ahead of things --
just thought about it and figured I'd mention it :)
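
To make the WAL concern a bit more concrete, here is a rough sketch (plain
Python, not Accumulo code) of how the number of WALs per tserver would
follow from the set of keys behind the tablets it hosts. The table names,
key names, and the wals_needed helper are all made up for illustration;
nothing here reflects an existing Accumulo or HDFS API.

```python
from collections import defaultdict

# Hypothetical mapping of table name -> encryption zone key name.
# Illustrative only; this is not real Accumulo configuration.
TABLE_KEYS = {
    "users": "key-a",
    "events": "key-b",
    "audit": "key-a",
}


def wals_needed(hosted_tablets, table_keys):
    """Group the tablets a tserver hosts by the key of their table.

    hosted_tablets: iterable of (table, tablet_id) pairs.
    Returns a dict mapping key name -> list of tablets that would
    have to share one WAL written under that key.
    """
    by_key = defaultdict(list)
    for table, tablet_id in hosted_tablets:
        by_key[table_keys[table]].append((table, tablet_id))
    return dict(by_key)


hosted = [("users", 1), ("events", 7), ("audit", 3), ("users", 2)]
groups = wals_needed(hosted, TABLE_KEYS)
# One WAL per distinct key among the hosted tablets.
print(len(groups))
```

In this toy model, the WAL count tracks the number of distinct keys rather
than the number of tables or tablets, so tables that share a key could
share a WAL.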

William Slacum wrote:
> Yup, #2. I also don't know if it's worth the effort for that specific
> feature. It might be easier to add something like per-namespace and/or
> per-table encryption, then define common access patterns for applications
> that want to use multiple keys for encryption.
>
>
>
> On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
>
>> Bill,
>>
>> Do you envision one of the following as the driver behind finer-grained
>> encryption?:
>>
>> 1. We would only encrypt certain columns in order to get better
>> performance;
>>
>> 2. We would use different keys on different columns in order to revoke
>> access to a column via the key store;
>>
>> 3. We would only give a tablet server access to a subset of columns at any
>> given time in order to protect something, and figure out what to do for
>> compactions, etc.;
>>
>> 4. Something entirely different...
>>
>> Seems like thing #2 might have merit, but I'm not sure it's worth the
>> effort.
>>
>> Adam
>> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
>>
>>> @Adam, column family level encryption can be useful for multi-tenant
>>> environments, and I think it maps pretty well to the document
>>> partitioning/sharding/wikisearch style tables. Things are trickier in
>>> Accumulo than in HBase since there isn't a 1:1 mapping between column
>>> families and files. The built in RFile encryption scheme seems better
>>> suited to this.
>>>
>>> @Christopher & Keith, it's something we can evaluate. Is there a good
>>> test harness for just writing an RFile, opening a reader to it, and just
>>> poking around? I was looking at the constructors and they didn't seem
>>> straightforward enough for me to comprehend them within a few seconds.
>>>
>>>
>>>
>>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
>>>>
>>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
>>>>>> Is "the code being 'at rest'" you making a funny about active development?
>>>>>> Making sure I haven't lost my ability to get jokes :)
>>>>>>
>>>>>> I see two reasons why the code would be inactive: the feature is good
>>>>>> enough as is or it's not interesting enough to attract attention.
>>>>>> Considering it's not public API, there are no discussions to bring into
>>>>>> the public API, and there's no effort to document how to use it, my
>>>>>> intuition tells me that there isn't enough interest in it from a project
>>>>>> perspective.
>>>>>>
>>>>>> From a user perspective, I've been getting asked about it when I work
>>>>>> with Accumulo users. My recommendation, exclusively, is to use HDFS
>>>>>> encryption because I can go to Hadoop's website and find documentation
>>>>>> on it. When I go to find documentation on Accumulo's offerings, any
>>>>>> usability information comes from vendor SlideShares. Most mentions of
>>>>>> the feature on official Apache Accumulo channels echo Christopher's
>>>>>> sentiments on the feature being experimental and not being officially
>>>>>> recommended for use.
>>>>>>
>>>>>> I wouldn't want to rip out the feature first and then figure things out
>>>>>> later. Sean already alluded to it, but a roadmap should contain
>>>>>> something (tool or documentation) to help users migrate if we go down
>>>>>> that route.
>>>>>> What I'm trying to figure out is, when the question of "How do I do
>>>>>> encryption at rest in Accumulo?" comes up, what is our community's
>>>>>> answer?
>>>>>> If we went down the route of using HDFS encryption zones, can we offer
>>>>>> the same features? At the very least, we'd be offering the same
>>>>>> database-level
>>>>> Where does the decryption happen with DFS, is it in the DFS client? If
>>>>> so, using HDFS level encryption seems to offer the same
>>>>> functionality???
>>>>> Has anyone written a tool that takes an
>>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
>>>>> Accumulo-unencrypted-HDFS-encrypted-RFile?  Wondering if there are any
>>>>> unexpected gotchas w/ this.
>>>>>
>>>>>
>>>> I was discussing my questions w/ Christopher today and he mentioned an
>>>> experiment that I thought was interesting.   What is the random seek
>>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
>>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
>>>>
>>>>
>>>>>
>>>>>
>>>>>> encryption scheme. I don't know the details of "more advanced key
>>>>>> stores", but it seems like we could potentially take any custom
>>>>>> implementation and map it to a KeyProvider [1]. I could also envision
>>>>>> table level encryption being implementable via zones, but probably not
>>>>>> down to the column family level.
>>>>>>
>>>>>> [1] https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
>>>>>>
>>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
>>>>>>> Responses inline.
>>>>>>>
>>>>>>> Adam
>>>>>>>
>>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
>>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does
>>>>>>>> is provide partial encryption-at-rest protection (unless you're
>>>>>>>> running without walogs, and have good integration with some external
>>>>>>>> secure key management faculty, and then it's probably fine).
>>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery
>>>>>>> file. That is a project we should take on, but it does not imply that
>>>>>>> the existing features are not valuable. With HDFS encryption options
>>>>>>> this would now be a much easier project to take on. Also, the users I
>>>>>>> know that use encryption at rest do so with a more secure key store
>>>>>>> than the default.
>>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R don't
>>>>>>>> necessarily realize its current shortcomings, or its lack of upstream
>>>>>>>> maintenance support (which it has not been receiving). It may be the
>>>>>>>> case that these users have support from an intermediary, and do
>>>>>>>> understand the shortcomings... I don't know, but it's a concern.
>>>>>>> Anybody that creates a secure system has to analyze the security of
>>>>>>> the system as a whole. Accumulo's encryption at rest is one part of
>>>>>>> the solution. Taking away the tool without providing an alternative
>>>>>>> does nothing to improve the security of systems built on Accumulo.
>>>>>>>
>>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
>>>>>>>> incomplete one, which hasn't really been touched in two years, and
>>>>>>>> has been explicitly excluded by the community for being public API
>>>>>>>> because of its incompleteness. Age doesn't determine public API
>>>>>>>> status. The community does.
>>>>>>>
>>>>>>> People are using it, so we have to consider the implications of
>>>>>>> whatever changes we make and weigh against the benefits. I believe the
>>>>>>> last bug fix was done this year, so I would argue it is being
>>>>>>> maintained. Changes to our encryption at rest implementation will have
>>>>>>> consequences for those users. There had better be a clear benefit if
>>>>>>> we break their systems.
>>>>>>>
>>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By
>>>>>>>> whom? Is it published?
>>>>>>> Yes, there have been several talks at meetups and conferences that
>>>>>>> discuss the security and performance of the current solution.
>>>>>>>
>>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
>>>>>>>>> There's another way to look at the state of Accumulo's encryption
>>>>>>>>> at rest:
>>>>>>>>> 1. Encryption at rest works great for what it does, and the code
>>>>>>>>> being "at rest" isn't necessarily a problem
>>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
>>>>>>>>> effectively in operations
>>>>>>>>> 3. Encryption at rest has been a supported configuration option for
>>>>>>>>> over two years with established plugin interfaces, and therefore it
>>>>>>>>> should be considered part of the public API
>>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed
>>>>>>>>> for performance or security
>>>>>>>>>
>>>>>>>>> The given option #2 would at least require an analysis of
>>>>>>>>> alternatives, and we would have to decide what to do about backwards
>>>>>>>>> compatibility for users using custom key stores and encryption
>>>>>>>>> strategies that may or may not be supported by upstream
>>>>>>>>> alternatives.
>>>>>>>>>
>>>>>>>>> As far as option #1 goes, I can get behind encouraging people to
>>>>>>>>> take up projects to improve Accumulo's encryption. I think we're
>>>>>>>>> already going down this path, but without having identified
>>>>>>>>> resources to do the improvements.
>>>>>>>>> Any volunteers?
>>>>>>>>>
>>>>>>>>> Adam
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
>>>>>>>>>> So I've been looking into options for providing encryption at
>>>>>>>>>> rest, and it seems like what Accumulo has is abandonware from a
>>>>>>>>>> project perspective. There is no official documentation on how to
>>>>>>>>>> perform encryption at rest, and the best information from its
>>>>>>>>>> status comes from year (or greater) old ticket comments about how
>>>>>>>>>> the feature is still experimental. Recently there was a talk that
>>>>>>>>>> described using HDFS encryption zones as an alternative.
>>>>>>>>>> From my perspective, this is what I see as the current situation:
>>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
>>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
>>>>>>>>>> marketed capabilities
>>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
>>>>>>>>>> comments or presentations
>>>>>>>>>> 4- A viable alternative exists that appears to have feature parity
>>>>>>>>>> in HDFS encryption
>>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend
>>>>>>>>>> beyond what HDFS provides
>>>>>>>>>>
>>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
>>>>>>>>>> Personally, I see two options:
>>>>>>>>>>
>>>>>>>>>> 1- Start going down a path to bring the feature into the forefront
>>>>>>>>>> and start providing feature parity with HBase
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>
>>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
>>>>>>>>>> offerings
>>>>>>>>>> Any input is welcomed & appreciated!
>>>>>>>>>>
>>>>>
>
