accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 18:18:00 GMT
Perhaps. I had interpreted some of Adam's comments ("The only thing that
doesn't get encrypted is a temporary WAL recovery file. That is a project
we should take on..."), as favoring improvements to the current state of
things. As that has also been the focus of previous conversations about the
state of Accumulo's encryption-at-rest, I assumed that third camp also
existed. Perhaps I was wrong.

On Thu, Nov 5, 2015 at 1:11 PM Mike Drob <mdrob@apache.org> wrote:

> I think you have misidentified the two camps. There is a camp that believes
> we should phase out the code in favour of the HDFS encryption, and a camp
> that believes the code is sufficiently mature. I don't think there is a
> group that is interested in improving the state of things.
>
> On Thu, Nov 5, 2015 at 12:02 PM, Christopher <ctubbsii@apache.org> wrote:
>
> > JIRAs are fine, but I thought this thread was mostly addressing the fact
> > that there doesn't seem to be a sustained interest in actually working on
> > any of the JIRAs addressing that area of code. Am I wrong? Is there
> > willingness from anybody to expend effort on this code? Even if not, we can
> > still make JIRAs, but they'll probably just be ignored. So, the question
> > for me is: which JIRAs should we make? Are we going to pursue phasing out
> > the code, or pursue improving it? Those would be very different JIRA texts.
> >
> > On Thu, Nov 5, 2015 at 12:22 PM Mike Drob <mdrob@apache.org> wrote:
> >
> > > Can we file some JIRAs to build out a suite to test this and run the
> > > necessary tests?
> > >
> > > On Thu, Nov 5, 2015 at 11:17 AM, Christopher <ctubbsii@apache.org> wrote:
> > >
> > > > My main concern using HDFS encryption vs. built-in Accumulo implementation
> > > > is possibly performance with respect to seeks. If we encrypt our indexed
> > > > blocks independently (as we do now), I suspect our seeks would be more
> > > > performant than relying on HDFS encryption, whose encrypted blocks may not
> > > > fall on our index boundaries. If this is a small difference, it might still
> > > > be worth it for convenience and simpler maintenance, but I suspect the
> > > > difference will be somewhat substantial.
> > > >
> > > > On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
> > > > >
> > > > > +1 I think this is the right step. My hunch is that, given the common
> > > > > data access patterns that we have in Accumulo (over HBase), per-colfam
> > > > > encryption isn't quite as common a design pattern as it is for HBase
> > > > > (please tell me I'm wrong if anyone disagrees -- this is mostly a gut
> > > > > reaction). I think our users would likely benefit more from a
> > > > > per-namespace/table encryption control like you suggest.
> > > > >
> > > > > Implementing RFile encryption at HDFS level (e.g. tie a specific
> > > > > zone/key for a table) is probably straightforward. Changing the
> > > > > TServer's WAL use would likely be trickier to get right (a tserver would
> > > > > have multiple WALs, one for each unique zone/key from the Tablets it
> > > > > happens to host). Maybe worrying about that is getting ahead of things --
> > > > > just thought about it and figured I'd mention it :)
> > > > >
> > > > > William Slacum wrote:
> > > > > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > > > > feature. It might be easier to add something like per-namespace and/or
> > > > > > per-table encryption, then define common access patterns for applications
> > > > > > that want to use multiple keys for encryption.
> > > > > >
> > > > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > > >
> > > > > >> Bill,
> > > > > >>
> > > > > >> Do you envision one of the following as the driver behind
> > > > finer-grained
> > > > > >> encryption?:
> > > > > >>
> > > > > >> 1. We would only encrypt certain columns in order to get better
> > > > > >> performance;
> > > > > >>
> > > > > >> 2. We would use different keys on different columns in order to revoke
> > > > > >> access to a column via the key store;
> > > > > >>
> > > > > >> 3. We would only give a tablet server access to a subset of columns at any
> > > > > >> given time in order to protect something, and figure out what to do for
> > > > > >> compactions, etc.;
> > > > > >>
> > > > > >> 4. Something entirely different...
> > > > > >>
> > > > > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > > > > >> effort.
> > > > > >>
> > > > > >> Adam
> > > > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> > > > > >>
> > > > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > > > >>> environments, and I think it maps pretty well to the document
> > > > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > > > >>> families and files. The built in RFile encryption scheme seems better
> > > > > >>> suited to this.
> > > > > >>>
> > > > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good test
> > > > > >>> harness for just writing an RFile, opening a reader to it, and just poking
> > > > > >>> around? I was looking at the constructors and they didn't seem
> > > > > >>> straightforward enough for me to comprehend them within a few seconds.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > > >>>
> > > > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > >>>>>> Is "the code being 'at rest'" you making a funny about active development?
> > > > > >>>>>> Making sure I haven't lost my ability to get jokes :)
> > > > > >>>>>>
> > > > > >>>>>> I see two reasons why the code would be inactive: the feature is good
> > > > > >>>>>> enough as is or it's not interesting enough to attract attention.
> > > > > >>>>>> Considering it's not public API, there are no discussions to bring it into
> > > > > >>>>>> the public API, and there's no effort to document how to use it, my
> > > > > >>>>>> intuition tells me that there isn't enough interest in it from a project
> > > > > >>>>>> perspective.
> > > > > >>>>>>
> > > > > >>>>>> From a user perspective, I've been getting asked about it when I work with
> > > > > >>>>>> Accumulo users. My recommendation, exclusively, is to use HDFS encryption
> > > > > >>>>>> because I can go to Hadoop's website and find documentation on it. When I
> > > > > >>>>>> go to find documentation on Accumulo's offerings, any usability
> > > > > >>>>>> information comes from vendor SlideShares. Most mentions of the feature on
> > > > > >>>>>> official Apache Accumulo channels echo Christopher's sentiments on the
> > > > > >>>>>> feature being experimental and not being officially recommended for use.
> > > > > >>>>>>
> > > > > >>>>>> I wouldn't want to rip out the feature first and then figure things out
> > > > > >>>>>> later. Sean already alluded to it, but a roadmap should contain something
> > > > > >>>>>> (tool or documentation) to help users migrate if we go down that route.
> > > > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's answer?
> > > > > >>>>>> If we went down the route of using HDFS encryption zones, can we offer the
> > > > > >>>>>> same features? At the very least, we'd be offering the same database-level
> > > > > >>>> database-level
> > > > > >>>>> Where does the decryption happen with DFS, is it in the DFS client? If
> > > > > >>>>> so, using HDFS level encryption seems to offer the same functionality???
> > > > > >>>>> Has anyone written a tool that takes an
> > > > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are any
> > > > > >>>>> unexpected gotchas w/ this.
> > > > > >>>>>
> > > > > >>>> I was discussing my questions w/ Christopher today and he mentioned an
> > > > > >>>> experiment that I thought was interesting. What is the random seek
> > > > > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>> encryption scheme. I don't know the details of "more advanced key stores",
> > > > > >>>>>> but it seems like we could potentially take any custom implementation and
> > > > > >>>>>> map it to a KeyProvider [1]. I could also envision table level encryption
> > > > > >>>>>> being implementable via zones, but probably not down to the column family
> > > > > >>>>>> level.
> > > > > >>>>>>
> > > > > >>>>>> [1]
> > > > > >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > > > >>>>>>
> > > > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > > > > >>>>>>> Responses inline.
> > > > > >>>>>>>
> > > > > >>>>>>> Adam
> > > > > >>>>>>>
> > > > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > > > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it does is
> > > > > >>>>>>>> provide partial encryption-at-rest protection (unless you're running
> > > > > >>>>>>>> without walogs, and have good integration with some external secure key
> > > > > >>>>>>>> management facility, and then it's probably fine).
> > > > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery file.
> > > > > >>>>>>> That is a project we should take on, but it does not imply that the
> > > > > >>>>>>> existing features are not valuable. With HDFS encryption options this would
> > > > > >>>>>>> now be a much easier project to take on. Also, the users I know that use
> > > > > >>>>>>> encryption at rest do so with a more secure key store than the default.
> > > > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't necessarily
> > > > > >>>>>>>> realize its current shortcomings, or its lack of upstream maintenance
> > > > > >>>>>>>> support (which it has not been receiving). It may be the case that these
> > > > > >>>>>>>> users have support from an intermediary, and do understand the
> > > > > >>>>>>>> shortcomings... I don't know, but it's a concern.
> > > > > >>>>>>> Anybody that creates a secure system has to analyze the security of the
> > > > > >>>>>>> system as a whole. Accumulo's encryption at rest is one part of the
> > > > > >>>>>>> solution. Taking away the tool without providing an alternative does
> > > > > >>>>>>> nothing to improve the security of systems built on Accumulo.
> > > > > >>>>>>>
> > > > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> > > > > >>>>>>>> incomplete one, which hasn't really been touched in two years, and has
> > > > > >>>>>>>> been explicitly excluded by the community from being public API because of
> > > > > >>>>>>>> its incompleteness. Age doesn't determine public API status. The community
> > > > > >>>>>>>> does.
> > > > > >>>>>>>
> > > > > >>>>>>> People are using it, so we have to consider the implications of whatever
> > > > > >>>>>>> changes we make and weigh them against the benefits. I believe the last bug
> > > > > >>>>>>> fix was done this year, so I would argue it is being maintained. Changes to
> > > > > >>>>>>> our encryption at rest implementation will have consequences for those
> > > > > >>>>>>> users. There had better be a clear benefit if we break their systems.
> > > > > >>>>>>>
> > > > > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By whom? Is
> > > > > >>>>>>>> it published?
> > > > > >>>>>>> Yes, there have been several talks at meetups and conferences that discuss
> > > > > >>>>>>> the security and performance of the current solution.
> > > > > >>>>>>>
> > > > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > > > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at rest:
> > > > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code being "at
> > > > > >>>>>>>>> rest" isn't necessarily a problem
> > > > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > > > >>>>>>>>> effectively in operations
> > > > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for over
> > > > > >>>>>>>>> two years with established plugin interfaces, and therefore it should be
> > > > > >>>>>>>>> considered part of the public API
> > > > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> > > > > >>>>>>>>> performance or security
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> The given option #2 would at least require an analysis of alternatives, and
> > > > > >>>>>>>>> we would have to decide what to do about backwards compatibility for users
> > > > > >>>>>>>>> using custom key stores and encryption strategies that may or may not be
> > > > > >>>>>>>>> supported by upstream alternatives.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take up
> > > > > >>>>>>>>> projects to improve Accumulo's encryption. I think we're already going down
> > > > > >>>>>>>>> this path, but without having identified resources to do the improvements.
> > > > > >>>>>>>>> Any volunteers?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Adam
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > > > > >>>>>>>>>> So I've been looking into options for providing encryption at rest, and
> > > > > >>>>>>>>>> it seems like what Accumulo has is abandonware from a project perspective.
> > > > > >>>>>>>>>> There is no official documentation on how to perform encryption at rest,
> > > > > >>>>>>>>>> and the best information on its status comes from year (or greater) old
> > > > > >>>>>>>>>> ticket comments about how the feature is still experimental. Recently
> > > > > >>>>>>>>>> there was a talk that described using HDFS encryption zones as an
> > > > > >>>>>>>>>> alternative.
> > > > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or marketed
> > > > > >>>>>>>>>> capabilities
> > > > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira comments
> > > > > >>>>>>>>>> or presentations
> > > > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in HDFS
> > > > > >>>>>>>>>> encryption
> > > > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond what
> > > > > >>>>>>>>>> HDFS provides
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > > > > >>>>>>>>>> Personally, I see two options:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront and
> > > > > >>>>>>>>>> start providing feature parity with HBase
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> or
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption offerings
> > > > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > > > >>>>>>>>>>
> > > > > >>>>>
> > > > > >
> > > > >
> > > >
> > >
> >
>
