accumulo-dev mailing list archives

From Mike Drob <md...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 18:11:35 GMT
I think you have misidentified the two camps. There is a camp that believes
we should phase out the code in favour of the HDFS encryption, and a camp
that believes the code is sufficiently mature. I don't think there is a
group that is interested in improving the state of things.

On Thu, Nov 5, 2015 at 12:02 PM, Christopher <ctubbsii@apache.org> wrote:

> JIRAs are fine, but I thought this thread was mostly addressing the fact
> that there doesn't seem to be a sustained interest in actually working on
> any of the JIRAs addressing that area of code. Am I wrong? Is there
> willingness from anybody to expend effort on this code? Even if not, we can
> still make JIRAs, but they'll probably just be ignored. So, the question
> for me is: which JIRAs should we make? Are we going to pursue phasing out
> the code, or pursue improving it? Those would call for very different JIRA text.
>
> On Thu, Nov 5, 2015 at 12:22 PM Mike Drob <mdrob@apache.org> wrote:
>
> > Can we file some JIRAs to build out a suite to test this and run the
> > necessary tests?
> >
> > On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubbsii@apache.org> wrote:
> >
> > > My main concern using HDFS encryption vs. built-in Accumulo
> > > implementation is possibly performance with respect to seeks. If we
> > > encrypt our indexed blocks independently (as we do now), I suspect our
> > > seeks would be more performant than relying on HDFS encryption, whose
> > > encrypted blocks may not fall on our index boundaries. If this is a small
> > > difference, it might still be worth it for convenience and simpler
> > > maintenance, but I suspect the difference will be somewhat substantial.
> > >
> > > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
> > >
> > > > +1 I think this is the right step. My hunch is that, with the common
> > > > data access patterns that we have in Accumulo (compared to HBase),
> > > > per-colfam encryption isn't quite as common a design pattern as it is
> > > > for HBase (please tell me I'm wrong if anyone disagrees -- this is
> > > > mostly a gut reaction). I think our users would likely benefit more
> > > > from a per-namespace/table encryption control like you suggest.
> > > >
> > > > Implementing RFile encryption at HDFS level (e.g. tie a specific
> > > > zone/key to a table) is probably straightforward. Changing the
> > > > TServer's WAL use would likely be trickier to get right (a tserver
> > > > would have multiple WALs, one for each unique zone/key from the Tablets
> > > > it happens to host). Maybe worrying about that is getting ahead of
> > > > things -- just thought about it and figured I'd mention it :)
> > > >
> > > > William Slacum wrote:
> > > > > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > > > > feature. It might be easier to add something like per-namespace and/or
> > > > > > per-table encryption, then define common access patterns for
> > > > > > applications that want to use multiple keys for encryption.
> > > > >
> > > > >
> > > > >
> > > > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > >
> > > > >> Bill,
> > > > >>
> > > > > >> Do you envision one of the following as the driver behind
> > > > > >> finer-grained encryption?:
> > > > > >>
> > > > > >> 1. We would only encrypt certain columns in order to get better
> > > > > >> performance;
> > > > > >>
> > > > > >> 2. We would use different keys on different columns in order to
> > > > > >> revoke access to a column via the key store;
> > > > > >>
> > > > > >> 3. We would only give a tablet server access to a subset of columns
> > > > > >> at any given time in order to protect something, and figure out what
> > > > > >> to do for compactions, etc.;
> > > > > >>
> > > > > >> 4. Something entirely different...
> > > > > >>
> > > > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > > > > >> effort.
> > > > >>
> > > > >> Adam
> > > > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> > > > >>
> > > > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > > > >>> environments, and I think it maps pretty well to the document
> > > > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > > > >>> families and files. The built in RFile encryption scheme seems better
> > > > > >>> suited to this.
> > > > > >>>
> > > > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good
> > > > > >>> test harness for just writing an RFile, opening a reader to it, and
> > > > > >>> just poking around? I was looking at the constructors and they didn't
> > > > > >>> seem straightforward enough for me to comprehend them within a few
> > > > > >>> seconds.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > >>>
> > > > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > >>>>
> > > > >>>>>
> > > > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > >>>>>> Is "the code being 'at rest'" you making a funny about active
> > > > > >>>>>> development? Making sure I haven't lost my ability to get jokes :)
> > > > >>>>>>
> > > > > >>>>>> I see two reasons why the code would be inactive: the feature is
> > > > > >>>>>> good enough as is or it's not interesting enough to attract
> > > > > >>>>>> attention. Considering it's not public API, there are no discussions
> > > > > >>>>>> to bring it into the public API, and there's no effort to document
> > > > > >>>>>> how to use it, my intuition tells me that there isn't enough
> > > > > >>>>>> interest in it from a project perspective.
> > > > >>>>>>
> > > > >>>>>>  From a user perspective, I've been getting asked
about it
> when
> > I
> > > > >> work
> > > > >>>> with
> > > > >>>>>> Accumulo users. My recommendation, exclusively,
is to use HDFS
> > > > >>>> encryption
> > > > >>>>>> because I can go to Hadoop's website and find
documentation on
> > it.
> > > > >>> When
> > > > >>>> I
> > > > >>>>>> go to find documentation on Accumulo's offerings,
any
> usability
> > > > >>>>>> information
> > > > >>>>>> comes from vendor SlideShares. Most mentions
of the feature on
> > > > >>> official
> > > > >>>>>> Apache Accumulo channels echo Christopher's sentiments
on the
> > > > >> feature
> > > > >>>>>> being
> > > > >>>>>> experimental and not being officially recommended
for use.
> > > > >>>>>>
> > > > > >>>>>> I wouldn't want to rip out the feature first and then figure things
> > > > > >>>>>> out later. Sean already alluded to it, but a roadmap should contain
> > > > > >>>>>> something (tool or documentation) to help users migrate if we go
> > > > > >>>>>> down that route.
> > > > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's
> > > > > >>>>>> answer? If we went down the route of using HDFS encryption zones,
> > > > > >>>>>> can we offer the same features? At the very least, we'd be offering
> > > > > >>>>>> the same database-level
> > > > > >>>>> Where does the decryption happen with DFS, is it in the DFS client?
> > > > > >>>>> If so, using HDFS level encryption seems to offer the same
> > > > > >>>>> functionality???
> > > > > >>>>> Has anyone written a tool that takes an
> > > > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are
> > > > > >>>>> any unexpected gotchas w/ this.
> > > > >>>>>
> > > > > >>>> I was discussing my questions w/ Christopher today and he mentioned
> > > > > >>>> an experiment that I thought was interesting. What is the random seek
> > > > > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > > >>>>
> > > > >>>>
> > > > >>>>>
> > > > >>>>>
> > > > > >>>>>> encryption scheme. I don't know the details of "more advanced key
> > > > > >>>>>> stores", but it seems like we could potentially take any custom
> > > > > >>>>>> implementation and map it to a KeyProvider [1]. I could also
> > > > > >>>>>> envision table level encryption being implementable via zones, but
> > > > > >>>>>> probably not down to the column family level.
> > > > >>>>>>
> > > > >>>>>> [1]
> > > > >>>>>>
> > > > >>>>>>
> > > > >>
> > > >
> > >
> >
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > > >>>>>>
> > > > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > >>>>>>> Responses inline.
> > > > >>>>>>>
> > > > >>>>>>> Adam
> > > > >>>>>>>
> > > > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > > > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it
> > > > > >>>>>>>> does is provide partial encryption-at-rest protection (unless
> > > > > >>>>>>>> you're running without walogs, and have good integration with
> > > > > >>>>>>>> some external secure key management facility, and then it's
> > > > > >>>>>>>> probably fine).
> > > > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL
> > > > > >>>>>>> recovery file. That is a project we should take on, but it does
> > > > > >>>>>>> not imply that the existing features are not valuable. With HDFS
> > > > > >>>>>>> encryption options this would now be a much easier project to take
> > > > > >>>>>>> on. Also, the users I know that use encryption at rest do so with
> > > > > >>>>>>> a more secure key store than the default.
> > > > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't
> > > > > >>>>>>>> necessarily realize its current shortcomings, or its lack of
> > > > > >>>>>>>> upstream maintenance support (which it has not been receiving).
> > > > > >>>>>>>> It may be the case that these users have support from an
> > > > > >>>>>>>> intermediary, and do understand the shortcomings... I don't know,
> > > > > >>>>>>>> but it's a concern.
> > > > > >>>>>>> Anybody that creates a secure system has to analyze the security
> > > > > >>>>>>> of the system as a whole. Accumulo's encryption at rest is one
> > > > > >>>>>>> part of the solution. Taking away the tool without providing an
> > > > > >>>>>>> alternative does nothing to improve the security of systems built
> > > > > >>>>>>> on Accumulo.
> > > > >>>>>>>
> > > > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and
> > > > > >>>>>>>> an incomplete one, which hasn't really been touched in two years,
> > > > > >>>>>>>> and has been explicitly excluded by the community from being
> > > > > >>>>>>>> public API because of its incompleteness. Age doesn't determine
> > > > > >>>>>>>> public API status. The community does.
> > > > >>>>>>>
> > > > > >>>>>>> People are using it, so we have to consider the implications of
> > > > > >>>>>>> whatever changes we make and weigh them against the benefits. I
> > > > > >>>>>>> believe the last bug fix was done this year, so I would argue it
> > > > > >>>>>>> is being maintained. Changes to our encryption at rest
> > > > > >>>>>>> implementation will have consequences for those users. There had
> > > > > >>>>>>> better be a clear benefit if we break their systems.
> > > > >>>>>>>
> > > > > >>>>>>>> 4. Has Accumulo's implementation been evaluated for security and
> > > > > >>>>>>>> performance? By whom? Is it published?
> > > > > >>>>>>> Yes, there have been several talks at meetups and conferences that
> > > > > >>>>>>> discuss the security and performance of the current solution.
> > > > >>>>>>>
> > > > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > > > > >>>>>>>>> There's another way to look at the state of Accumulo's
> > > > > >>>>>>>>> encryption at rest:
> > > > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code
> > > > > >>>>>>>>> being "at rest" isn't necessarily a problem
> > > > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > > > >>>>>>>>> effectively in operations
> > > > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option
> > > > > >>>>>>>>> for over two years with established plugin interfaces, and
> > > > > >>>>>>>>> therefore it should be considered part of the public API
> > > > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been
> > > > > >>>>>>>>> analyzed for performance or security
> > > > >>>>>>>>>
> > > > > >>>>>>>>> The given option #2 would at least require an analysis of
> > > > > >>>>>>>>> alternatives, and we would have to decide what to do about
> > > > > >>>>>>>>> backwards compatibility for users using custom key stores and
> > > > > >>>>>>>>> encryption strategies that may or may not be supported by
> > > > > >>>>>>>>> upstream alternatives.
> > > > >>>>>>>>>
> > > > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to
> > > > > >>>>>>>>> take up projects to improve Accumulo's encryption. I think we're
> > > > > >>>>>>>>> already going down this path, but without having identified
> > > > > >>>>>>>>> resources to do the improvements. Any volunteers?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Adam
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > >>>>>>>>>> So I've been looking into options for providing encryption at
> > > > > >>>>>>>>>> rest, and it seems like what Accumulo has is abandonware from a
> > > > > >>>>>>>>>> project perspective. There is no official documentation on how
> > > > > >>>>>>>>>> to perform encryption at rest, and the best information on its
> > > > > >>>>>>>>>> status comes from year-old (or older) ticket comments about how
> > > > > >>>>>>>>>> the feature is still experimental. Recently there was a talk
> > > > > >>>>>>>>>> that described using HDFS encryption zones as an alternative.
> > > > >>>>>>>>>>  From my perspective, this is
what I see as the current
> > > > >>>> situation:
> > > > >>>>>>>>>> 1- Encryption at rest in Accumulo
isn't actively being
> > > > >> worked
> > > > >>> on
> > > > >>>>>>>>>> 2- Encryption at rest in Accumulo
isn't part of the public
> > > > >> API
> > > > >>>> or
> > > > >>>>>>>>> marketed
> > > > >>>>>>>>>> capabilities
> > > > >>>>>>>>>> 3- Documentation for what does
exist is scattered
> throughout
> > > > >>>> Jira
> > > > >>>>>>>>> comments
> > > > >>>>>>>>>> or presentations
> > > > >>>>>>>>>> 4- A viable alternative exists
that appears to have
> feature
> > > > >>>>>> parity in
> > > > >>>>>>>>> HDFS
> > > > >>>>>>>>>> encryption
> > > > >>>>>>>>>> 5- HBase has finer grained encryption
capabilities that
> > > > >> extend
> > > > >>>>>> beyond
> > > > >>>>>>>>> what
> > > > >>>>>>>>>> HDFS provides
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Moving forward, what's the consensus for supporting this
> > > > > >>>>>>>>>> feature? Personally, I see two options:
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>> 1- Start going down a path to bring the feature into the
> > > > > >>>>>>>>>> forefront and start providing feature parity with HBase
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> or
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> > > > > >>>>>>>>>> offerings
> > > > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > > >>>>>>>>>>
> > > > >>>>>
> > > > >
> > > >
> > >
> >
>
