accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 17:17:00 GMT
My main concern with using HDFS encryption vs. the built-in Accumulo
implementation is the possible impact on seek performance. If we encrypt our
indexed blocks independently (as we do now), I suspect our seeks would
perform better than they would if we relied on HDFS encryption, whose
encrypted blocks may not fall on our index boundaries. If the difference is
small, HDFS encryption might still be worth it for convenience and simpler
maintenance, but I suspect the difference will be fairly substantial.
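To make "encrypt our indexed blocks independently" concrete, here is a minimal Java sketch, assuming AES-GCM with a fresh 96-bit IV per block. The class and method names are hypothetical and this is not Accumulo's RFile code; the point is only that when every block carries its own IV, a seek that the index resolves to block i has to decrypt block i and nothing else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch of per-block encryption; not Accumulo's actual implementation. */
public class BlockEncryptionSketch {
    private static final int IV_LEN = 12;     // 96-bit GCM nonce, stored in front of each block
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts one block with its own random IV; output layout is IV || ciphertext+tag. */
    static byte[] encryptBlock(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
    }

    /** Decrypts a single stored block using only the bytes of that block. */
    static byte[] decryptBlock(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
        return c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
    }

    /** A "seek": the (hypothetical) index yields a block number; only that block is decrypted. */
    static byte[] seek(SecretKey key, List<byte[]> storedBlocks, int blockIndex)
            throws GeneralSecurityException {
        return decryptBlock(key, storedBlocks.get(blockIndex));
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        List<byte[]> stored = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            stored.add(encryptBlock(key, ("block-" + i).getBytes(StandardCharsets.UTF_8)));
        }
        // Decrypts only the third block; prints "block-2".
        System.out.println(new String(seek(key, stored, 2), StandardCharsets.UTF_8));
    }
}
```

The cost of this independence is per-block overhead (here 12 bytes of IV plus a 16-byte tag), which is the trade-off being weighed against a stream-level scheme whose encryption boundaries are unrelated to the file's index.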

On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:

> +1 I think this is the right step. My hunch is that, given the data
> access patterns common in Accumulo (as opposed to HBase), per-colfam
> encryption isn't quite as common a design pattern as it is for HBase
> (please tell me I'm wrong if anyone disagrees -- this is mostly a gut
> reaction). I think our users would likely benefit more from a
> per-namespace/table encryption control like you suggest.
>
> Implementing RFile encryption at the HDFS level (e.g. tying a specific
> zone/key to a table) is probably straightforward. Changing the TServer's
> WAL use would likely be trickier to get right (a tserver would need
> multiple WALs, one for each unique zone/key among the tablets it happens
> to host). Maybe worrying about that is getting ahead of things -- just
> thought about it and figured I'd mention it :)
>
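Josh's WAL point can be made concrete with a small Java sketch, assuming (hypothetically) that each table is pinned to exactly one HDFS encryption zone/key. The `Tablet` record, the `tableToZone` map and the zone paths are illustrative inventions, not Accumulo types or configuration. Under that assumption, the set of WALs a tserver needs is simply the set of distinct zones across the tablets it hosts.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: one WAL per distinct encryption zone among hosted tablets. */
public class WalPerZoneSketch {
    /** A hosted tablet, identified here only by table name and end row (illustrative). */
    record Tablet(String table, String endRow) {}

    /** Returns the distinct zones, i.e. the WALs this tserver would need to keep open. */
    static Set<String> walsNeeded(List<Tablet> hosted, Map<String, String> tableToZone) {
        Set<String> zones = new TreeSet<>();
        for (Tablet t : hosted) {
            String zone = tableToZone.get(t.table());
            if (zone == null) {
                throw new IllegalStateException("no zone configured for table " + t.table());
            }
            zones.add(zone);
        }
        return zones;
    }

    public static void main(String[] args) {
        Map<String, String> tableToZone = Map.of(
            "docs", "/accumulo/zones/tenantA",
            "index", "/accumulo/zones/tenantA",
            "audit", "/accumulo/zones/tenantB");
        List<Tablet> hosted = List.of(
            new Tablet("docs", "m"), new Tablet("docs", "z"),
            new Tablet("index", "m"), new Tablet("audit", "z"));
        // Four tablets from three tables, spread over two zones, so two WALs.
        // Prints "[/accumulo/zones/tenantA, /accumulo/zones/tenantB]".
        System.out.println(walsNeeded(hosted, tableToZone));
    }
}
```

The WAL count therefore grows with the number of distinct zones a tserver happens to host, not with the number of tablets, and it changes whenever tablets migrate between servers.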
> William Slacum wrote:
> > Yup, #2. I also don't know if it's worth the effort for that specific
> > feature. It might be easier to add something like per-namespace and/or
> > per-table encryption, then define common access patterns for applications
> > that want to use multiple keys for encryption.
> >
> >
> >
> > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> >
> >> Bill,
> >>
> >> Do you envision one of the following as the driver behind finer-grained
> >> encryption?:
> >>
> >> 1. We would only encrypt certain columns in order to get better
> >> performance;
> >>
> >> 2. We would use different keys on different columns in order to revoke
> >> access to a column via the key store;
> >>
> >> 3. We would only give a tablet server access to a subset of columns at
> >> any given time in order to protect something, and figure out what to do
> >> for compactions, etc.;
> >>
> >> 4. Something entirely different...
> >>
> >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> >> effort.
> >>
> >> Adam
> >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> >>
> >>> @Adam, column family level encryption can be useful for multi-tenant
> >>> environments, and I think it maps pretty well to the document
> >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> >>> families and files. The built-in RFile encryption scheme seems better
> >>> suited to this.
> >>>
> >>> @Christopher & Keith, it's something we can evaluate. Is there a good
> >>> test harness for just writing an RFile, opening a reader to it, and
> >>> poking around? I was looking at the constructors and they didn't seem
> >>> straightforward enough for me to comprehend within a few seconds.
> >>>
> >>>
> >>>
> >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> >>>
> >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> >>>>
> >>>>>
> >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> >>>>>> Is "the code being 'at rest'" you making a funny about active
> >>>>>> development? Making sure I haven't lost my ability to get jokes :)
> >>>>>>
> >>>>>> I see two reasons why the code would be inactive: the feature is good
> >>>>>> enough as is, or it's not interesting enough to attract attention.
> >>>>>> Considering it's not public API, there are no discussions about
> >>>>>> bringing it into the public API, and there's no effort to document how
> >>>>>> to use it, my intuition tells me that there isn't enough interest in it
> >>>>>> from a project perspective.
> >>>>>>
> >>>>>> From a user perspective, I get asked about it when I work with
> >>>>>> Accumulo users. My recommendation, exclusively, is to use HDFS
> >>>>>> encryption, because I can go to Hadoop's website and find
> >>>>>> documentation on it. When I go looking for documentation on Accumulo's
> >>>>>> offering, any usability information comes from vendor SlideShares.
> >>>>>> Most mentions of the feature on official Apache Accumulo channels echo
> >>>>>> Christopher's sentiments about the feature being experimental and not
> >>>>>> officially recommended for use.
> >>>>>>
> >>>>>> I wouldn't want to rip out the feature first and then figure things
> >>>>>> out later. Sean already alluded to it, but a roadmap should contain
> >>>>>> something (a tool or documentation) to help users migrate if we go
> >>>>>> down that route. What I'm trying to figure out is: when the question
> >>>>>> "How do I do encryption at rest in Accumulo?" comes up, what is our
> >>>>>> community's answer? If we went down the route of using HDFS encryption
> >>>>>> zones, could we offer the same features? At the very least, we'd be
> >>>>>> offering the same database-level
> >>>>> Where does the decryption happen with DFS? Is it in the DFS client?
> >>>>> If so, using HDFS-level encryption seems to offer the same
> >>>>> functionality???
> >>>>> Has anyone written a tool that takes an
> >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
> >>>>> unexpected gotchas w/ this.
> >>>>>
> >>>> I was discussing my questions w/ Christopher today and he mentioned an
> >>>> experiment that I thought was interesting. What is the random seek
> >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> >>>>
> >>>>
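Keith's proposed experiment could be driven by a harness like the following Java sketch. The `SeekTarget` interface and `RandomSeekBench` class are hypothetical: a real comparison would wrap an RFile reader opened once with Accumulo's encryption and once from inside an HDFS encryption zone, and would more likely seek to random keys than to raw byte positions. The sketch only shows the measurement shape: pre-generate seek positions from a fixed seed (so both configurations see the same sequence for files of equal length), then time the seeks alone.

```java
import java.util.Random;

/** Hypothetical random-seek timing harness; not an Accumulo or Hadoop API. */
public class RandomSeekBench {
    /** Something that can seek to a byte position and read a little data there. */
    interface SeekTarget {
        /** Seeks to position, reads, and returns a value derived from what was read. */
        int seekAndRead(long position);

        /** Total length in bytes; positions are drawn from [0, length). */
        long length();
    }

    /** Written once per run so the JIT cannot treat the reads as dead code. */
    static volatile long blackhole;

    /** Mean nanoseconds per seek over `seeks` random positions generated from `seed`. */
    static double meanSeekNanos(SeekTarget target, int seeks, long seed) {
        if (seeks <= 0 || target.length() <= 0) {
            throw new IllegalArgumentException("need seeks > 0 and a non-empty target");
        }
        // Generate positions up front so RNG cost is excluded from the timed loop.
        Random rnd = new Random(seed);
        long[] positions = new long[seeks];
        for (int i = 0; i < seeks; i++) {
            positions[i] = Math.floorMod(rnd.nextLong(), target.length());
        }
        long sink = 0;
        long start = System.nanoTime();
        for (long pos : positions) {
            sink += target.seekAndRead(pos);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / seeks;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        SeekTarget inMemory = new SeekTarget() {
            @Override public int seekAndRead(long position) { return data[(int) position]; }
            @Override public long length() { return data.length; }
        };
        meanSeekNanos(inMemory, 10_000, 1L);  // warm-up pass, result discarded
        System.out.printf("mean ns/seek: %.1f%n", meanSeekNanos(inMemory, 100_000, 42L));
    }
}
```

A single warm-up pass is a crude stand-in for what a dedicated framework such as JMH does properly; for real HDFS runs, OS page cache and DataNode locality would also need to be controlled between the two configurations.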
> >>>>>
> >>>>>
> >>>>>> encryption scheme. I don't know the details of "more advanced key
> >>>>>> stores", but it seems like we could potentially take any custom
> >>>>>> implementation and map it to a KeyProvider [1]. I could also envision
> >>>>>> table-level encryption being implementable via zones, but probably not
> >>>>>> down to the column family level.
> >>>>>>
> >>>>>> [1] https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> >>>>>>
> >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> >>>>>>> Responses inline.
> >>>>>>>
> >>>>>>> Adam
> >>>>>>>
> >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it
> >>>>>>>> does is provide partial encryption-at-rest protection (unless you're
> >>>>>>>> running without walogs and have good integration with some external
> >>>>>>>> secure key management facility, in which case it's probably fine).
> >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery
> >>>>>>> file. That is a project we should take on, but it does not imply that
> >>>>>>> the existing features are not valuable. With HDFS encryption options,
> >>>>>>> this would now be a much easier project to take on. Also, the users I
> >>>>>>> know that use encryption at rest do so with a more secure key store
> >>>>>>> than the default.
> >>>>>>>> 2. I'm concerned that those using Accumulo's E-A-R don't necessarily
> >>>>>>>> realize its current shortcomings, or its lack of upstream maintenance
> >>>>>>>> support (which it has not been receiving). It may be the case that
> >>>>>>>> these users have support from an intermediary and do understand the
> >>>>>>>> shortcomings... I don't know, but it's a concern.
> >>>>>>> Anybody that creates a secure system has to analyze the security of
> >>>>>>> the system as a whole. Accumulo's encryption at rest is one part of
> >>>>>>> the solution. Taking away the tool without providing an alternative
> >>>>>>> does nothing to improve the security of systems built on Accumulo.
> >>>>>>>
> >>>>>>>> 3. Correction: it has been an explicitly experimental feature, and
> >>>>>>>> an incomplete one, which hasn't really been touched in two years and
> >>>>>>>> has been explicitly excluded by the community from the public API
> >>>>>>>> because of its incompleteness. Age doesn't determine public API
> >>>>>>>> status. The community does.
> >>>>>>>
> >>>>>>> People are using it, so we have to consider the implications of
> >>>>>>> whatever changes we make and weigh them against the benefits. I
> >>>>>>> believe the last bug fix was done this year, so I would argue it is
> >>>>>>> being maintained. Changes to our encryption at rest implementation
> >>>>>>> will have consequences for those users. There had better be a clear
> >>>>>>> benefit if we break their systems.
> >>>>>>>
> >>>>>>>> 4. Has Accumulo's implementation been evaluated for security and
> >>>>>>>> performance? By whom? Is it published?
> >>>>>>> Yes, there have been several talks at meetups and conferences that
> >>>>>>> discuss the security and performance of the current solution.
> >>>>>>>
> >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> >>>>>>>>> There's another way to look at the state of Accumulo's encryption
> >>>>>>>>> at rest:
> >>>>>>>>> 1. Encryption at rest works great for what it does, and the code
> >>>>>>>>> being "at rest" isn't necessarily a problem
> >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> >>>>>>>>> effectively in operations
> >>>>>>>>> 3. Encryption at rest has been a supported configuration option for
> >>>>>>>>> over two years with established plugin interfaces, and therefore it
> >>>>>>>>> should be considered part of the public API
> >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed
> >>>>>>>>> for performance or security
> >>>>>>>>>
> >>>>>>>>> The given option #2 would at least require an analysis of
> >>>>>>>>> alternatives, and we would have to decide what to do about backwards
> >>>>>>>>> compatibility for users with custom key stores and encryption
> >>>>>>>>> strategies that may or may not be supported by upstream
> >>>>>>>>> alternatives.
> >>>>>>>>>
> >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to
> >>>>>>>>> take up projects to improve Accumulo's encryption. I think we're
> >>>>>>>>> already going down this path, but without having identified
> >>>>>>>>> resources to do the improvements. Any volunteers?
> >>>>>>>>>
> >>>>>>>>> Adam
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> >>>>>>>>>> So I've been looking into options for providing encryption at
> >>>>>>>>>> rest, and it seems like what Accumulo has is abandonware from a
> >>>>>>>>>> project perspective. There is no official documentation on how to
> >>>>>>>>>> perform encryption at rest, and the best information on its status
> >>>>>>>>>> comes from ticket comments a year old (or older) saying that the
> >>>>>>>>>> feature is still experimental. Recently there was a talk that
> >>>>>>>>>> described using HDFS encryption zones as an alternative.
> >>>>>>>>>> From my perspective, this is the current situation:
> >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
> >>>>>>>>>> marketed capabilities
> >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
> >>>>>>>>>> comments or presentations
> >>>>>>>>>> 4- A viable alternative that appears to have feature parity exists
> >>>>>>>>>> in HDFS encryption
> >>>>>>>>>> 5- HBase has finer-grained encryption capabilities that extend
> >>>>>>>>>> beyond what HDFS provides
> >>>>>>>>>>
> >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> >>>>>>>>>> Personally, I see two options:
> >>>>>>>>>>
> >>>>>>>>>> 1- Start going down a path to bring the feature to the forefront
> >>>>>>>>>> and start providing feature parity with HBase
> >>>>>>>>>>
> >>>>>>>>>> or
> >>>>>>>>>>
> >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> >>>>>>>>>> offerings
> >>>>>>>>>>
> >>>>>>>>>> Any input is welcomed & appreciated!
> >>>>>>>>>>
> >>>>>
> >
>
