accumulo-dev mailing list archives

From Mike Drob <md...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 17:22:19 GMT
Can we file some JIRAs to build out a suite to test this and run the
necessary tests?

On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubbsii@apache.org> wrote:

> My main concern using HDFS encryption vs. built-in Accumulo implementation
> is possibly performance with respect to seeks. If we encrypt our indexed
> blocks independently (as we do now), I suspect our seeks would be more
> performant than relying on HDFS encryption, whose encrypted blocks may not
> fall on our index boundaries. If this is a small difference, it might still
> be worth it for convenience and simpler maintenance, but I suspect the
> difference will be somewhat substantial.
>
> On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
>
> > +1 I think this is the right step. My hunch is that, with the common
> > data access patterns that we have in Accumulo (over HBase),
> > per-colfam encryption isn't quite as common a design pattern as it is
> > for HBase (please tell me I'm wrong if anyone disagrees -- this is
> > mostly a gut reaction). I think our users would likely benefit more from
> > a per-namespace/table encryption control like you suggest.
> >
> > Implementing RFile encryption at HDFS level (e.g. tie a specific
> > zone/key for a table) is probably straightforward. Changing the
> > TServer's WAL use would likely be trickier to get right (a tserver would
> > have multiple WALs, one for each unique zone/key from the Tablets it happens
> > to host). Maybe worrying about that is getting ahead of things -- just
> > thought about it and figured I'd mention it :)
> >
> > William Slacum wrote:
> > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > feature. It might be easier to add something like per-namespace and/or
> > > per-table encryption, then define common access patterns for applications
> > > that want to use multiple keys for encryption.
> > >
> > >
> > >
> > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs<afuchs@apache.org>  wrote:
> > >
> > >> Bill,
> > >>
> > >> Do you envision one of the following as the driver behind finer-grained
> > >> encryption?:
> > >>
> > >> 1. We would only encrypt certain columns in order to get better
> > >> performance;
> > >>
> > >> 2. We would use different keys on different columns in order to revoke
> > >> access to a column via the key store;
> > >>
> > >> 3. We would only give a tablet server access to a subset of columns at any
> > >> given time in order to protect something, and figure out what to do for
> > >> compactions, etc.;
> > >>
> > >> 4. Something entirely different...
> > >>
> > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > >> effort.
> > >>
> > >> Adam
> > >> On Nov 4, 2015 7:38 PM, "William Slacum"<wslacum@gmail.com>  wrote:
> > >>
> > >>> @Adam, column family level encryption can be useful for multi-tenant
> > >>> environments, and I think it maps pretty well to the document
> > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > >>> families and files. The built in RFile encryption scheme seems better
> > >>> suited to this.
> > >>>
> > >>> @Christopher & Keith, it's something we can evaluate. Is there a good test
> > >>> harness for just writing an RFile, opening a reader to it, and just poking
> > >>> around? I was looking at the constructors and they didn't seem
> > >>> straightforward enough for me to comprehend them within a few seconds.
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > >>>
> > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > >>>>
> > >>>>>
> > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > >>>>>> Is "the code being 'at rest'" you making a funny about active development?
> > >>>>>> Making sure I haven't lost my ability to get jokes :)
> > >>>>>>
> > >>>>>> I see two reasons why the code would be inactive: the feature is good
> > >>>>>> enough as is or it's not interesting enough to attract attention.
> > >>>>>> Considering it's not public API, there are no discussions to bring it into
> > >>>>>> the public API, and there's no effort to document how to use it, my intuition
> > >>>>>> tells me that there isn't enough interest in it from a project
> > >>>>>> perspective.
> > >>>>>>
> > >>>>>> From a user perspective, I've been getting asked about it when I work with
> > >>>>>> Accumulo users. My recommendation, exclusively, is to use HDFS encryption
> > >>>>>> because I can go to Hadoop's website and find documentation on it. When I
> > >>>>>> go to find documentation on Accumulo's offerings, any usability
> > >>>>>> information comes from vendor SlideShares. Most mentions of the feature on
> > >>>>>> official Apache Accumulo channels echo Christopher's sentiments on the
> > >>>>>> feature being experimental and not being officially recommended for use.
> > >>>>>>
> > >>>>>> I wouldn't want to rip out the feature first and then figure things out
> > >>>>>> later. Sean already alluded to it, but a roadmap should contain something
> > >>>>>> (tool or documentation) to help users migrate if we go down that route.
> > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's answer?
> > >>>>>> If we went down the route of using HDFS encryption zones, can we offer the
> > >>>>>> same features? At the very least, we'd be offering the same database-level
> > >>>>> Where does the decryption happen with DFS, is it in the DFS client? If
> > >>>>> so, using HDFS level encryption seems to offer the same functionality???
> > >>>>> Has anyone written a tool that takes an
> > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile?  Wondering if there are any
> > >>>>> unexpected gotchas w/ this.
> > >>>>>
> > >>>> I was discussing my questions w/ Christopher today and he mentioned an
> > >>>> experiment that I thought was interesting. What is the random seek
> > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > >>>>
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>>> encryption scheme. I don't know the details of "more advanced key stores",
> > >>>>>> but it seems like we could potentially take any custom implementation and
> > >>>>>> map it to a KeyProvider [1]. I could also envision table level encryption
> > >>>>>> being implementable via zones, but probably not down to the column family
> > >>>>>> level.
> > >>>>>>
> > >>>>>> [1] https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > >>>>>>
> > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > >>>>>>> Responses inline.
> > >>>>>>>
> > >>>>>>> Adam
> > >>>>>>>
> > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does is
> > >>>>>>>> provide partial encryption-at-rest protection (unless you're running
> > >>>>>>>> without walogs, and have good integration with some external secure key
> > >>>>>>>> management facility, and then it's probably fine).
> > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery file.
> > >>>>>>> That is a project we should take on, but it does not imply that the
> > >>>>>>> existing features are not valuable. With HDFS encryption options this would
> > >>>>>>> now be a much easier project to take on. Also, the users I know that use
> > >>>>>>> encryption at rest do so with a more secure key store than the default.
> > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't necessarily
> > >>>>>>>> realize its current shortcomings, or its lack of upstream maintenance
> > >>>>>>>> support (which it has not been receiving). It may be the case that these
> > >>>>>>>> users have support from an intermediary, and do understand the
> > >>>>>>>> shortcomings... I don't know, but it's a concern.
> > >>>>>>> Anybody that creates a secure system has to analyze the security of the
> > >>>>>>> system as a whole. Accumulo's encryption at rest is one part of the
> > >>>>>>> solution. Taking away the tool without providing an alternative does
> > >>>>>>> nothing to improve the security of systems built on Accumulo.
> > >>>>>>>
> > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> > >>>>>>>> incomplete one, which hasn't really been touched in two years, and has been
> > >>>>>>>> explicitly excluded by the community from being public API because of its
> > >>>>>>>> incompleteness. Age doesn't determine public API status. The community does.
> > >>>>>>>
> > >>>>>>> People are using it, so we have to consider the implications of whatever
> > >>>>>>> changes we make and weigh them against the benefits. I believe the last bug
> > >>>>>>> fix was done this year, so I would argue it is being maintained. Changes to
> > >>>>>>> our encryption at rest implementation will have consequences for those users.
> > >>>>>>> There had better be a clear benefit if we break their systems.
> > >>>>>>>
> > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By whom? Is
> > >>>>>>>> it published?
> > >>>>>>> Yes, there have been several talks at meetups and conferences that discuss
> > >>>>>>> the security and performance of the current solution.
> > >>>>>>>
> > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at rest:
> > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being "at
> > >>>>>>>>> rest" isn't necessarily a problem
> > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > >>>>>>>>> effectively in operations
> > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for over
> > >>>>>>>>> two years with established plugin interfaces, and therefore it should be
> > >>>>>>>>> considered part of the public API
> > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> > >>>>>>>>> performance or security
> > >>>>>>>>>
> > >>>>>>>>> The given option #2 would at least require an analysis of alternatives, and
> > >>>>>>>>> we would have to decide what to do about backwards compatibility for users
> > >>>>>>>>> using custom key stores and encryption strategies that may or may not be
> > >>>>>>>>> supported by upstream alternatives.
> > >>>>>>>>>
> > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take up
> > >>>>>>>>> projects to improve Accumulo's encryption. I think we're already going down
> > >>>>>>>>> this path, but without having identified resources to do the improvements.
> > >>>>>>>>> Any volunteers?
> > >>>>>>>>>
> > >>>>>>>>> Adam
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > >>>>>>>>>> So I've been looking into options for providing encryption at rest, and it
> > >>>>>>>>>> seems like what Accumulo has is abandonware from a project perspective.
> > >>>>>>>>>> There is no official documentation on how to perform encryption at rest,
> > >>>>>>>>>> and the best information on its status comes from year (or greater) old
> > >>>>>>>>>> ticket comments about how the feature is still experimental. Recently there
> > >>>>>>>>>> was a talk that described using HDFS encryption zones as an alternative.
> > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or marketed
> > >>>>>>>>>> capabilities
> > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira comments
> > >>>>>>>>>> or presentations
> > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in HDFS
> > >>>>>>>>>> encryption
> > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond what
> > >>>>>>>>>> HDFS provides
> > >>>>>>>>>>
> > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > >>>>>>>>>> Personally, I see two options:
> > >>>>>>>>>>
> > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and
> > >>>>>>>>>> start providing feature parity with HBase
> > >>>>>>>>>>
> > >>>>>>>>>> or
> > >>>>>>>>>>
> > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption offerings
> > >>>>>>>>>> Any input is welcomed & appreciated!
> > >>>>>>>>>>
> > >>>>>
> > >
> >
>
