accumulo-dev mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 18:27:07 GMT
Camps two and three are the same camp, really. If we can identify a clear
roadmap (eventually via the right set of tickets), then it comes down to
whether people have energy and inclination to do the work. I don't think
the roadmap ends here.

Adam

On Thu, Nov 5, 2015 at 1:18 PM, Christopher <ctubbsii@apache.org> wrote:

> Perhaps. I had interpreted some of Adam's comments ("The only thing that
> doesn't get encrypted is a temporary WAL recovery file. That is a project
> we should take on..."), as favoring improvements to the current state of
> things. As that has also been the focus of previous conversations about the
> state of Accumulo's encryption-at-rest, I assumed that third camp also
> existed. Perhaps I was wrong.
>
> On Thu, Nov 5, 2015 at 1:11 PM Mike Drob <mdrob@apache.org> wrote:
>
> > I think you have misidentified the two camps. There is a camp that believes
> > we should phase out the code in favour of the HDFS encryption, and a camp
> > that believes the code is sufficiently mature. I don't think there is a
> > group that is interested in improving the state of things.
> >
> > On Thu, Nov 5, 2015 at 12:02 PM, Christopher <ctubbsii@apache.org> wrote:
> >
> > > JIRAs are fine, but I thought this thread was mostly addressing the fact
> > > that there doesn't seem to be a sustained interest in actually working on
> > > any of the JIRAs addressing that area of code. Am I wrong? Is there
> > > willingness from anybody to expend effort on this code? Even if not, we can
> > > still make JIRAs, but they'll probably just be ignored. So, the question
> > > for me is: which JIRAs should we make? Are we going to pursue phasing out
> > > the code, or pursue improving it? Those call for very different JIRA text.
> > >
> > > On Thu, Nov 5, 2015 at 12:22 PM Mike Drob <mdrob@apache.org> wrote:
> > >
> > > > Can we file some JIRAs to build out a suite to test this and run the
> > > > necessary tests?
> > > >
> > > > > On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubbsii@apache.org> wrote:
> > > > >
> > > > > My main concern with using HDFS encryption vs. the built-in Accumulo
> > > > > implementation is performance with respect to seeks. If we encrypt our
> > > > > indexed blocks independently (as we do now), I suspect our seeks would be
> > > > > more performant than relying on HDFS encryption, whose encrypted blocks
> > > > > may not fall on our index boundaries. If this is a small difference, it
> > > > > might still be worth it for convenience and simpler maintenance, but I
> > > > > suspect the difference will be somewhat substantial.
> > > > >
> > > > > > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
> > > > > >
> > > > > > > +1 I think this is the right step. My hunch is that, with some of the
> > > > > > > common data access patterns that we have in Accumulo (over HBase),
> > > > > > > per-colfam encryption isn't quite as common a design pattern as it is
> > > > > > > for HBase (please tell me I'm wrong if anyone disagrees -- this is
> > > > > > > mostly a gut reaction). I think our users would likely benefit more from
> > > > > > > a per-namespace/table encryption control like you suggest.
> > > > > > >
> > > > > > > Implementing RFile encryption at the HDFS level (e.g. tie a specific
> > > > > > > zone/key to a table) is probably straightforward. Changing the
> > > > > > > TServer's WAL use would likely be trickier to get right (a tserver would
> > > > > > > have multiple WALs, one for each unique zone/key from the Tablets it
> > > > > > > happens to host). Maybe worrying about that is getting ahead of things --
> > > > > > > just thought about it and figured I'd mention it :)
> > > > > >
> > > > > > William Slacum wrote:
> > > > > > > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > > > > > > feature. It might be easier to add something like per-namespace and/or
> > > > > > > > per-table encryption, then define common access patterns for applications
> > > > > > > > that want to use multiple keys for encryption.
> > > > > > > >
> > > > > > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > > > >
> > > > > > > >> Bill,
> > > > > > > >>
> > > > > > > >> Do you envision one of the following as the driver behind finer-grained
> > > > > > > >> encryption?:
> > > > > > > >>
> > > > > > > >> 1. We would only encrypt certain columns in order to get better
> > > > > > > >> performance;
> > > > > > > >>
> > > > > > > >> 2. We would use different keys on different columns in order to revoke
> > > > > > > >> access to a column via the key store;
> > > > > > > >>
> > > > > > > >> 3. We would only give a tablet server access to a subset of columns at any
> > > > > > > >> given time in order to protect something, and figure out what to do for
> > > > > > > >> compactions, etc.;
> > > > > > > >>
> > > > > > > >> 4. Something entirely different...
> > > > > > > >>
> > > > > > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > > > > > > >> effort.
> > > > > > > >>
> > > > > > > >> Adam
> > > > > > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> > > > > > >>
> > > > > > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > > > > > >>> environments, and I think it maps pretty well to the document
> > > > > > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > > > > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > > > > > >>> families and files. The built-in RFile encryption scheme seems better
> > > > > > > >>> suited to this.
> > > > > > > >>>
> > > > > > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good test
> > > > > > > >>> harness for just writing an RFile, opening a reader to it, and just poking
> > > > > > > >>> around? I was looking at the constructors and they didn't seem
> > > > > > > >>> straightforward enough for me to comprehend them within a few seconds.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > > > > >>>
> > > > > > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > > > >>>>>> Is "the code being 'at rest'" you making a funny about active development?
> > > > > > > >>>>>> Making sure I haven't lost my ability to get jokes :)
> > > > > > > >>>>>>
> > > > > > > >>>>>> I see two reasons why the code would be inactive: the feature is good
> > > > > > > >>>>>> enough as is or it's not interesting enough to attract attention.
> > > > > > > >>>>>> Considering it's not public API, there are no discussions to bring it into
> > > > > > > >>>>>> the public API, and there's no effort to document how to use it, my
> > > > > > > >>>>>> intuition tells me that there isn't enough interest in it from a project
> > > > > > > >>>>>> perspective.
> > > > > > >>>>>>
> > > > > > >>>>>>  From a user perspective, I've been
getting asked about it
> > > when
> > > > I
> > > > > > >> work
> > > > > > >>>> with
> > > > > > >>>>>> Accumulo users. My recommendation,
exclusively, is to use
> > HDFS
> > > > > > >>>> encryption
> > > > > > >>>>>> because I can go to Hadoop's website
and find
> documentation
> > on
> > > > it.
> > > > > > >>> When
> > > > > > >>>> I
> > > > > > >>>>>> go to find documentation on Accumulo's
offerings, any
> > > usability
> > > > > > >>>>>> information
> > > > > > >>>>>> comes from vendor SlideShares. Most
mentions of the
> feature
> > on
> > > > > > >>> official
> > > > > > >>>>>> Apache Accumulo channels echo Christopher's
sentiments on
> > the
> > > > > > >> feature
> > > > > > >>>>>> being
> > > > > > >>>>>> experimental and not being officially
recommended for use.
> > > > > > >>>>>>
> > > > > > > >>>>>> I wouldn't want to rip out the feature first and then figure things out
> > > > > > > >>>>>> later. Sean already alluded to it, but a roadmap should contain something
> > > > > > > >>>>>> (tool or documentation) to help users migrate if we go down that route.
> > > > > > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > > > > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's answer?
> > > > > > > >>>>>> If we went down the route of using HDFS encryption zones, can we offer the
> > > > > > > >>>>>> same features? At the very least, we'd be offering the same database-level
> > > > > > > >>>>> Where does the decryption happen with DFS, is it in the DFS client? If
> > > > > > > >>>>> so, using HDFS level encryption seems to offer the same functionality???
> > > > > > > >>>>> Has anyone written a tool that takes an
> > > > > > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > > > > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
> > > > > > > >>>>> unexpected gotchas w/ this.
> > > > > > > >>>>>
> > > > > > >>>>>
> > > > > > > >>>> I was discussing my questions w/ Christopher today and he mentioned an
> > > > > > > >>>> experiment that I thought was interesting. What is the random seek
> > > > > > > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > > > > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > > >>>>>> encryption scheme. I don't know the details of "more advanced key stores",
> > > > > > > >>>>>> but it seems like we could potentially take any custom implementation and
> > > > > > > >>>>>> map it to a KeyProvider [1]. I could also envision table level encryption
> > > > > > > >>>>>> being implementable via zones, but probably not down to the column family
> > > > > > > >>>>>> level.
> > > > > > > >>>>>>
> > > > > > > >>>>>> [1]
> > > > > > > >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > > > > >>>>>>
> > > > > > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > > > > >>>>>>> Responses inline.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Adam
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > > > > > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does is
> > > > > > > >>>>>>>> provide partial encryption-at-rest protection (unless you're running
> > > > > > > >>>>>>>> without walogs, and have good integration with some external secure key
> > > > > > > >>>>>>>> management facility, and then it's probably fine).
> > > > > > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery
> > > > > > > >>>>>>> file. That is a project we should take on, but it does not imply that the
> > > > > > > >>>>>>> existing features are not valuable. With HDFS encryption options this
> > > > > > > >>>>>>> would now be a much easier project to take on. Also, the users I know that
> > > > > > > >>>>>>> use encryption at rest do so with a more secure key store than the
> > > > > > > >>>>>>> default.
> > > > > > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't necessarily
> > > > > > > >>>>>>>> realize its current shortcomings, or its lack of upstream maintenance
> > > > > > > >>>>>>>> support (which it has not been receiving). It may be the case that these
> > > > > > > >>>>>>>> users have support from an intermediary, and do understand the
> > > > > > > >>>>>>>> shortcomings... I don't know, but it's a concern.
> > > > > > > >>>>>>> Anybody that creates a secure system has to analyze the security of the
> > > > > > > >>>>>>> system as a whole. Accumulo's encryption at rest is one part of the
> > > > > > > >>>>>>> solution. Taking away the tool without providing an alternative does
> > > > > > > >>>>>>> nothing to improve the security of systems built on Accumulo.
> > > > > > >>>>>>>
> > > > > > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> > > > > > > >>>>>>>> incomplete one, which hasn't really been touched in two years, and has
> > > > > > > >>>>>>>> been explicitly excluded by the community from being public API because of
> > > > > > > >>>>>>>> its incompleteness. Age doesn't determine public API status. The community
> > > > > > > >>>>>>>> does.
> > > > > > >>>>>>>
> > > > > > > >>>>>>> People are using it, so we have to consider the implications of whatever
> > > > > > > >>>>>>> changes we make and weigh them against the benefits. I believe the last
> > > > > > > >>>>>>> bug fix was done this year, so I would argue it is being maintained.
> > > > > > > >>>>>>> Changes to our encryption at rest implementation will have consequences
> > > > > > > >>>>>>> for those users. There had better be a clear benefit if we break their
> > > > > > > >>>>>>> systems.
> > > > > > >>>>>>>
> > > > > > > >>>>>>>> 4. Has Accumulo's implementation been evaluated for security and
> > > > > > > >>>>>>>> performance? By whom? Is it published?
> > > > > > > >>>>>>> Yes, there have been several talks at meetups and conferences that
> > > > > > > >>>>>>> discuss the security and performance of the current solution.
> > > > > > >>>>>>>
> > > > > > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > > > > > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at
> > > > > > > >>>>>>>>> rest:
> > > > > > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being
> > > > > > > >>>>>>>>> "at rest" isn't necessarily a problem
> > > > > > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > > > > > >>>>>>>>> effectively in operations
> > > > > > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for over
> > > > > > > >>>>>>>>> two years with established plugin interfaces, and therefore it should be
> > > > > > > >>>>>>>>> considered part of the public API
> > > > > > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> > > > > > > >>>>>>>>> performance or security
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> The given option #2 would at least require an analysis of alternatives,
> > > > > > > >>>>>>>>> and we would have to decide what to do about backwards compatibility for
> > > > > > > >>>>>>>>> users using custom key stores and encryption strategies that may or may
> > > > > > > >>>>>>>>> not be supported by upstream alternatives.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take up
> > > > > > > >>>>>>>>> projects to improve Accumulo's encryption. I think we're already going
> > > > > > > >>>>>>>>> down this path, but without having identified resources to do the
> > > > > > > >>>>>>>>> improvements. Any volunteers?
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Adam
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > > > >>>>>>>>>> So I've been looking into options for providing encryption at rest, and
> > > > > > > >>>>>>>>>> it seems like what Accumulo has is abandonware from a project
> > > > > > > >>>>>>>>>> perspective. There is no official documentation on how to perform
> > > > > > > >>>>>>>>>> encryption at rest, and the best information on its status comes from
> > > > > > > >>>>>>>>>> ticket comments a year (or more) old about how the feature is still
> > > > > > > >>>>>>>>>> experimental. Recently there was a talk that described using HDFS
> > > > > > > >>>>>>>>>> encryption zones as an alternative.
> > > > > > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > > > > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > > > > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
> > > > > > > >>>>>>>>>> marketed capabilities
> > > > > > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
> > > > > > > >>>>>>>>>> comments or presentations
> > > > > > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in
> > > > > > > >>>>>>>>>> HDFS encryption
> > > > > > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond
> > > > > > > >>>>>>>>>> what HDFS provides
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > > > > > > >>>>>>>>>> Personally, I see two options:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and
> > > > > > > >>>>>>>>>> start providing feature parity with HBase
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> or
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> > > > > > > >>>>>>>>>> offerings
> > > > > > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > > > > >>>>>>>>>>
> > > > > > >>>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
