accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 18:02:19 GMT
JIRAs are fine, but I thought this thread was mostly addressing the fact
that there doesn't seem to be a sustained interest in actually working on
any of the JIRAs addressing that area of code. Am I wrong? Is there
willingness from anybody to expend effort on this code? Even if not, we can
still make JIRAs, but they'll probably just be ignored. So, the question
for me is: which JIRAs should we make? Are we going to pursue phasing out
the code, or pursue improving it? Those would be very different JIRA texts.

On Thu, Nov 5, 2015 at 12:22 PM Mike Drob <mdrob@apache.org> wrote:

> Can we file some JIRAs to build out a suite to test this and run the
> necessary tests?
>
> On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubbsii@apache.org> wrote:
>
> > My main concern using HDFS encryption vs. built-in Accumulo
> > implementation is possibly performance with respect to seeks. If we
> > encrypt our indexed blocks independently (as we do now), I suspect our
> > seeks would be more performant than relying on HDFS encryption, whose
> > encrypted blocks may not fall on our index boundaries. If this is a small
> > difference, it might still be worth it for convenience and simpler
> > maintenance, but I suspect the difference will be somewhat substantial.
> >
> > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
> >
> > > +1 I think this is the right step. My hunch is that, given some of the
> > > common data access patterns that we have in Accumulo (over HBase),
> > > per-colfam encryption isn't quite as common a design pattern as it is
> > > for HBase (please tell me I'm wrong if anyone disagrees -- this is
> > > mostly a gut reaction). I think our users would likely benefit more from
> > > a per-namespace/table encryption control like you suggest.
> > >
> > > Implementing RFile encryption at HDFS level (e.g. tie a specific
> > > zone/key for a table) is probably straightforward. Changing the
> > > TServer's WAL use would likely be trickier to get right (a tserver would
> > > have multiple WALs, one for each unique zone/key from the Tablets it
> > > happens to host). Maybe worrying about that is getting ahead of things --
> > > just thought about it and figured I'd mention it :)
> > >
> > > William Slacum wrote:
> > > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > > feature. It might be easier to add something like per-namespace and/or
> > > > per-table encryption, then define common access patterns for
> > > > applications that want to use multiple keys for encryption.
> > > >
> > > >
> > > >
> > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > > >
> > > >> Bill,
> > > >>
> > > >> Do you envision one of the following as the driver behind
> > > >> finer-grained encryption?:
> > > >>
> > > >> 1. We would only encrypt certain columns in order to get better
> > > >> performance;
> > > >>
> > > >> 2. We would use different keys on different columns in order to revoke
> > > >> access to a column via the key store;
> > > >>
> > > >> 3. We would only give a tablet server access to a subset of columns at
> > > >> any given time in order to protect something, and figure out what to do
> > > >> for compactions, etc.;
> > > >>
> > > >> 4. Something entirely different...
> > > >>
> > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > > >> effort.
> > > >>
> > > >> Adam
> > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> > > >>
> > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > >>> environments, and I think it maps pretty well to the document
> > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > >>> families and files. The built in RFile encryption scheme seems better
> > > >>> suited to this.
> > > >>>
> > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good
> > > >>> test harness for just writing an RFile, opening a reader to it, and
> > > >>> just poking around? I was looking at the constructors and they didn't
> > > >>> seem straightforward enough for me to comprehend them within a few
> > > >>> seconds.
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > > >>>
> > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > > >>>>
> > > >>>>>
> > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > > >>>>>> Is "the code being 'at rest'" you making a funny about active
> > > >>>>>> development? Making sure I haven't lost my ability to get jokes :)
> > > >>>>>>
> > > >>>>>> I see two reasons why the code would be inactive: the feature is
> > > >>>>>> good enough as is or it's not interesting enough to attract
> > > >>>>>> attention. Considering it's not public API, there are no
> > > >>>>>> discussions to bring it into the public API, and there's no effort
> > > >>>>>> to document how to use it, my intuition tells me that there isn't
> > > >>>>>> enough interest in it from a project perspective.
> > > >>>>>>
> > > >>>>>> From a user perspective, I've been getting asked about it when I
> > > >>>>>> work with Accumulo users. My recommendation, exclusively, is to use
> > > >>>>>> HDFS encryption because I can go to Hadoop's website and find
> > > >>>>>> documentation on it. When I go to find documentation on Accumulo's
> > > >>>>>> offerings, any usability information comes from vendor SlideShares.
> > > >>>>>> Most mentions of the feature on official Apache Accumulo channels
> > > >>>>>> echo Christopher's sentiments on the feature being experimental and
> > > >>>>>> not being officially recommended for use.
> > > >>>>>>
> > > >>>>>> I wouldn't want to rip out the feature first and then figure things
> > > >>>>>> out later. Sean already alluded to it, but a roadmap should contain
> > > >>>>>> something (tool or documentation) to help users migrate if we go
> > > >>>>>> down that route.
> > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's
> > > >>>>>> answer?
> > > >>>>>> If we went down the route of using HDFS encryption zones, can we
> > > >>>>>> offer the same features? At the very least, we'd be offering the
> > > >>>>>> same database-level
> > > >>>>> Where does the decryption happen with DFS, is it in the DFS client?
> > > >>>>> If so, using HDFS level encryption seems to offer the same
> > > >>>>> functionality???
> > > >>>>> Has anyone written a tool that takes an
> > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are
> > > >>>>> any unexpected gotchas w/ this.
> > > >>>>>
> > > >>>> I was discussing my questions w/ Christopher today and he mentioned
> > > >>>> an experiment that I thought was interesting. What is the random
> > > >>>> seek performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > >>>>
> > > >>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> encryption scheme. I don't know the details of "more advanced key
> > > >>>>>> stores", but it seems like we could potentially take any custom
> > > >>>>>> implementation and map it to a KeyProvider [1]. I could also
> > > >>>>>> envision table level encryption being implementable via zones, but
> > > >>>>>> probably not down to the column family level.
> > > >>>>>>
> > > >>>>>> [1]
> > > >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > >>>>>>
> > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > > >>>>>>> Responses inline.
> > > >>>>>>>
> > > >>>>>>> Adam
> > > >>>>>>>
> > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it
> > > >>>>>>>> does is provide partial encryption-at-rest protection (unless
> > > >>>>>>>> you're running without walogs, and have good integration with
> > > >>>>>>>> some external secure key management facility, and then it's
> > > >>>>>>>> probably fine).
> > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL
> > > >>>>>>> recovery file. That is a project we should take on, but it does
> > > >>>>>>> not imply that the existing features are not valuable. With HDFS
> > > >>>>>>> encryption options this would now be a much easier project to take
> > > >>>>>>> on. Also, the users I know that use encryption at rest do so with
> > > >>>>>>> a more secure key store than the default.
> > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't
> > > >>>>>>>> necessarily realize its current shortcomings, or its lack of
> > > >>>>>>>> upstream maintenance support (which it has not been receiving).
> > > >>>>>>>> It may be the case that these users have support from an
> > > >>>>>>>> intermediary, and do understand the shortcomings... I don't know,
> > > >>>>>>>> but it's a concern.
> > > >>>>>>> Anybody that creates a secure system has to analyze the security of
> > > >>>>>>> the system as a whole. Accumulo's encryption at rest is one part of
> > > >>>>>>> the solution. Taking away the tool without providing an alternative
> > > >>>>>>> does nothing to improve the security of systems built on Accumulo.
> > > >>>>>>>
> > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and
> > > >>>>>>>> an incomplete one, which hasn't really been touched in two years,
> > > >>>>>>>> and has been explicitly excluded by the community from being
> > > >>>>>>>> public API because of its incompleteness. Age doesn't determine
> > > >>>>>>>> public API status. The community does.
> > > >>>>>>>
> > > >>>>>>> People are using it, so we have to consider the implications of
> > > >>>>>>> whatever changes we make and weigh them against the benefits. I
> > > >>>>>>> believe the last bug fix was done this year, so I would argue it is
> > > >>>>>>> being maintained. Changes to our encryption at rest implementation
> > > >>>>>>> will have consequences for those users. There had better be a clear
> > > >>>>>>> benefit if we break their systems.
> > > >>>>>>>
> > > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By
> > > >>>>>>>> whom? Is it published?
> > > >>>>>>> Yes, there have been several talks at meetups and conferences that
> > > >>>>>>> discuss the security and performance of the current solution.
> > > >>>>>>>
> > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption
> > > >>>>>>>>> at rest:
> > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code
> > > >>>>>>>>> being "at rest" isn't necessarily a problem
> > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > >>>>>>>>> effectively in operations
> > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option
> > > >>>>>>>>> for over two years with established plugin interfaces, and
> > > >>>>>>>>> therefore it should be considered part of the public API
> > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed
> > > >>>>>>>>> for performance or security
> > > >>>>>>>>>
> > > >>>>>>>>> The given option #2 would at least require an analysis of
> > > >>>>>>>>> alternatives, and we would have to decide what to do about
> > > >>>>>>>>> backwards compatibility for users using custom key stores and
> > > >>>>>>>>> encryption strategies that may or may not be supported by upstream
> > > >>>>>>>>> alternatives.
> > > >>>>>>>>>
> > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to
> > > >>>>>>>>> take up projects to improve Accumulo's encryption. I think we're
> > > >>>>>>>>> already going down this path, but without having identified
> > > >>>>>>>>> resources to do the improvements.
> > > >>>>>>>>> Any volunteers?
> > > >>>>>>>>>
> > > >>>>>>>>> Adam
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > > >>>>>>>>>> So I've been looking into options for providing encryption at
> > > >>>>>>>>>> rest, and it seems like what Accumulo has is abandonware from a
> > > >>>>>>>>>> project perspective. There is no official documentation on how to
> > > >>>>>>>>>> perform encryption at rest, and the best information on its
> > > >>>>>>>>>> status comes from year (or greater) old ticket comments about how
> > > >>>>>>>>>> the feature is still experimental. Recently there was a talk that
> > > >>>>>>>>>> described using HDFS encryption zones as an alternative.
> > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
> > > >>>>>>>>>> marketed capabilities
> > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
> > > >>>>>>>>>> comments or presentations
> > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature
> > > >>>>>>>>>> parity in HDFS encryption
> > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend
> > > >>>>>>>>>> beyond what HDFS provides
> > > >>>>>>>>>>
> > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > > >>>>>>>>>> Personally, I see two options:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1- Start going down a path to bring the feature into the
> > > >>>>>>>>>> forefront and start providing feature parity with HBase
> > > >>>>>>>>>>
> > > >>>>>>>>>> or
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> > > >>>>>>>>>> offerings
> > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > >>>>>>>>>>
> > > >>>>>
> > > >
> > >
> >
>
