accumulo-dev mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: [DISCUSS] What to do about encryption at rest?
Date Thu, 05 Nov 2015 18:21:49 GMT
On Thu, Nov 5, 2015 at 12:17 PM, Christopher <ctubbsii@apache.org> wrote:

> My main concern using HDFS encryption vs. built-in Accumulo implementation
> is possibly performance with respect to seeks. If we encrypt our indexed
> blocks independently (as we do now), I suspect our seeks would be more
> performant than relying on HDFS encryption, whose encrypted blocks may not
> fall on our index boundaries. If this is a small difference, it might still
> be worth it for convenience and simpler maintenance, but I suspect the
> difference will be somewhat substantial.
>

Very good point, Chris. This is especially important if we allow users to
pick their own encryption algorithms. As I understand it, cipher block
chaining (CBC) is important to keep most crypto algorithms secure, and it
has a big effect on where you need to start decrypting. There are ways of
doing CBC that let you seek pretty close to any point in a file and decrypt
from there, and there are other ways that require you to start from the
beginning. The current RFile implementation ensures that you can start
decrypting at the beginning of an RFile block, which matches where we start
decompressing and where we currently seek in HDFS. The performance
difference is likely to be much more pronounced for certain crypto settings.
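
A minimal sketch of the "seek close to any point and decrypt from there" case, assuming AES in CBC mode through the standard javax.crypto API. In CBC, decrypting cipher block i needs only the key and cipher block i-1 (or the IV for block 0), so a reader can start at any 16-byte boundary. The class and method names are hypothetical, and this is not how RFile or HDFS actually implement encryption.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; not Accumulo or HDFS code.
final class CbcSeekSketch {
    static final int BLOCK = 16; // AES block size in bytes

    /** Encrypts plaintext (length must be a multiple of 16) with AES/CBC/NoPadding. */
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plaintext, 0, plaintext.length);
    }

    /**
     * Decrypts from cipher block blockIndex to the end. The only earlier data
     * read is the single cipher block immediately before the seek point, which
     * plays the role of the IV for the first block we decrypt.
     */
    static byte[] decryptFrom(byte[] key, byte[] iv, byte[] ciphertext, int blockIndex) {
        int offset = blockIndex * BLOCK;
        byte[] chainIv = blockIndex == 0
                ? iv
                : Arrays.copyOfRange(ciphertext, offset - BLOCK, offset);
        return run(Cipher.DECRYPT_MODE, key, chainIv, ciphertext, offset, ciphertext.length - offset);
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(in, off, len);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling decryptFrom(key, iv, ciphertext, 5) returns the plaintext from byte 80 onward without decrypting the first five blocks. A mode or framing whose state has to be replayed from the start of the stream would not allow this, which is the case where where-you-seek and where-you-can-start-decrypting stop lining up.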

Does anybody have a good diagram showing the architecture of HDFS
encryption?



>
> On Thu, Nov 5, 2015 at 12:11 PM Josh Elser <josh.elser@gmail.com> wrote:
>
> > +1 I think this is the right step. My hunch, given the common
> > data access patterns that we have in Accumulo (over HBase), is that
> > per-colfam encryption isn't quite as common a design pattern as it is
> > for HBase (please tell me I'm wrong if anyone disagrees -- this is
> > mostly a gut reaction). I think our users would likely benefit more from
> > a per-namespace/table encryption control like you suggest.
> >
> > Implementing RFile encryption at HDFS level (e.g. tie a specific
> > zone/key for a table) is probably straightforward. Changing the
> > TServer's WAL use would likely be trickier to get right (a tserver would
> > have multiple WALs, one for each unique zone/key from the Tablets it happens
> > to host). Maybe worrying about that is getting ahead of things -- just
> > thought about it and figured I'd mention it :)
> >
> > William Slacum wrote:
> > > Yup, #2. I also don't know if it's worth the effort for that specific
> > > feature. It might be easier to add something like per-namespace and/or
> > > > per-table encryption, then define common access patterns for applications
> > > > that want to use multiple keys for encryption.
> > >
> > >
> > >
> > > > On Wed, Nov 4, 2015 at 8:10 PM, Adam Fuchs <afuchs@apache.org> wrote:
> > >
> > >> Bill,
> > >>
> > > >> Do you envision one of the following as the driver behind finer-grained
> > > >> encryption?:
> > >>
> > >> 1. We would only encrypt certain columns in order to get better
> > >> performance;
> > >>
> > >> 2. We would use different keys on different columns in order to revoke
> > >> access to a column via the key store;
> > >>
> > > >> 3. We would only give a tablet server access to a subset of columns at any
> > > >> given time in order to protect something, and figure out what to do for
> > > >> compactions, etc.;
> > >>
> > >> 4. Something entirely different...
> > >>
> > >> Seems like thing #2 might have merit, but I'm not sure it's worth the
> > >> effort.
> > >>
> > >> Adam
> > > >> On Nov 4, 2015 7:38 PM, "William Slacum" <wslacum@gmail.com> wrote:
> > >>
> > > >>> @Adam, column family level encryption can be useful for multi-tenant
> > > >>> environments, and I think it maps pretty well to the document
> > > >>> partitioning/sharding/wikisearch style tables. Things are trickier in
> > > >>> Accumulo than in HBase since there isn't a 1:1 mapping between column
> > > >>> families and files. The built in RFile encryption scheme seems better
> > > >>> suited to this.
> > > >>>
> > > >>> @Christopher & Keith, it's something we can evaluate. Is there a good
> > > >>> test harness for just writing an RFile, opening a reader to it, and just
> > > >>> poking around? I was looking at the constructors and they didn't seem
> > > >>> straightforward enough for me to comprehend them within a few seconds.
> > >>>
> > >>>
> > >>>
> > > >>> On Tue, Nov 3, 2015 at 9:56 PM, Keith Turner <keith@deenlo.com> wrote:
> > > >>>
> > > >>>> On Mon, Nov 2, 2015 at 1:37 PM, Keith Turner <keith@deenlo.com> wrote:
> > > >>>>
> > > >>>>>
> > > >>>>> On Mon, Nov 2, 2015 at 12:27 PM, William Slacum <wslacum@gmail.com> wrote:
> > > >>>>>> Is "the code being 'at rest'" you making a funny about active
> > > >>>>>> development? Making sure I haven't lost my ability to get jokes :)
> > > >>>>>>
> > > >>>>>> I see two reasons why the code would be inactive: the feature is good
> > > >>>>>> enough as is or it's not interesting enough to attract attention.
> > > >>>>>> Considering it's not public API, there are no discussions to bring it
> > > >>>>>> into the public API, and there's no effort to document how to use it,
> > > >>>>>> my intuition tells me that there isn't enough interest in it from a
> > > >>>>>> project perspective.
> > > >>>>>>
> > > >>>>>> From a user perspective, I've been getting asked about it when I work
> > > >>>>>> with Accumulo users. My recommendation, exclusively, is to use HDFS
> > > >>>>>> encryption because I can go to Hadoop's website and find documentation
> > > >>>>>> on it. When I go to find documentation on Accumulo's offerings, any
> > > >>>>>> usability information comes from vendor SlideShares. Most mentions of
> > > >>>>>> the feature on official Apache Accumulo channels echo Christopher's
> > > >>>>>> sentiments on the feature being experimental and not being officially
> > > >>>>>> recommended for use.
> > > >>>>>>
> > > >>>>>> I wouldn't want to rip out the feature first and then figure things out
> > > >>>>>> later. Sean already alluded to it, but a roadmap should contain
> > > >>>>>> something (tool or documentation) to help users migrate if we go down
> > > >>>>>> that route.
> > > >>>>>>
> > > >>>>>> What I'm trying to figure out is, when the question of "How do I do
> > > >>>>>> encryption at rest in Accumulo?" comes up, what is our community's
> > > >>>>>> answer?
> > > >>>>>>
> > > >>>>>> If we went down the route of using HDFS encryption zones, can we offer
> > > >>>>>> the same features? At the very least, we'd be offering the same
> > > >>>>>> database-level
> > > >>>>> Where does the decryption happen with DFS, is it in the DFS client?
> > > >>>>> If so, using HDFS level encryption seems to offer the same
> > > >>>>> functionality???
> > > >>>>>
> > > >>>>> Has anyone written a tool that takes an
> > > >>>>> Accumulo-encrypted-HDFS-unencrypted-RFile and rewrites it as an
> > > >>>>> Accumulo-unencrypted-HDFS-encrypted-RFile? Wondering if there are
> > > >>>>> any unexpected gotchas w/ this.
> > > >>>>>
> > > >>>> I was discussing my questions w/ Christopher today and he mentioned an
> > > >>>> experiment that I thought was interesting. What is the random seek
> > > >>>> performance of Accumulo-encrypted-HDFS-unencrypted-RFile vs
> > > >>>> Accumulo-unencrypted-HDFS-encrypted-RFile?
> > >>>>
> > >>>>
> > >>>>>
> > >>>>>
> > > >>>>>> encryption scheme. I don't know the details of "more advanced key
> > > >>>>>> stores", but it seems like we could potentially take any custom
> > > >>>>>> implementation and map it to a KeyProvider [1]. I could also envision
> > > >>>>>> table level encryption being implementable via zones, but probably not
> > > >>>>>> down to the column family level.
> > > >>>>>>
> > > >>>>>> [1]
> > > >>>>>> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/crypto/key/KeyProvider.html
> > > >>>>>>
> > > >>>>>> On Sun, Nov 1, 2015 at 10:19 AM, Adam Fuchs <afuchs@apache.org> wrote:
> > >>>>>>> Responses inline.
> > >>>>>>>
> > >>>>>>> Adam
> > >>>>>>>
> > > >>>>>>> On Nov 1, 2015 9:58 AM, "Christopher" <ctubbsii@apache.org> wrote:
> > > >>>>>>>> 1. I'm not sure I'd call an incomplete solution 'great'. What it
> > > >>>>>>>> does is provide partial encryption-at-rest protection (unless you're
> > > >>>>>>>> running without walogs, and have good integration with some external
> > > >>>>>>>> secure key management facility, and then it's probably fine).
> > > >>>>>>> The only thing that doesn't get encrypted is a temporary WAL recovery
> > > >>>>>>> file. That is a project we should take on, but it does not imply that
> > > >>>>>>> the existing features are not valuable. With HDFS encryption options
> > > >>>>>>> this would now be a much easier project to take on. Also, the users I
> > > >>>>>>> know that use encryption at rest do so with a more secure key store
> > > >>>>>>> than the default.
> > > >>>>>>>> 2. I'm concerned that anybody using Accumulo's E-A-R doesn't
> > > >>>>>>>> necessarily realize its current shortcomings, or its lack of upstream
> > > >>>>>>>> maintenance support (which it has not been receiving). It may be the
> > > >>>>>>>> case that these users have support from an intermediary, and do
> > > >>>>>>>> understand the shortcomings... I don't know, but it's a concern.
> > > >>>>>>> Anybody that creates a secure system has to analyze the security of
> > > >>>>>>> the system as a whole. Accumulo's encryption at rest is one part of
> > > >>>>>>> the solution. Taking away the tool without providing an alternative
> > > >>>>>>> does nothing to improve the security of systems built on Accumulo.
> > >>>>>>>
> > > >>>>>>>> 3. Correction: it has been an explicitly experimental feature and an
> > > >>>>>>>> incomplete one, which hasn't really been touched in two years, and
> > > >>>>>>>> has been explicitly excluded by the community from being public API
> > > >>>>>>>> because of its incompleteness. Age doesn't determine public API
> > > >>>>>>>> status. The community does.
> > > >>>>>>>
> > > >>>>>>> People are using it, so we have to consider the implications of
> > > >>>>>>> whatever changes we make and weigh them against the benefits. I
> > > >>>>>>> believe the last bug fix was done this year, so I would argue it is
> > > >>>>>>> being maintained. Changes to our encryption at rest implementation
> > > >>>>>>> will have consequences for those users. There had better be a clear
> > > >>>>>>> benefit if we break their systems.
> > >>>>>>>
> > > >>>>>>>> 4. Has Accumulo's been evaluated for security and performance? By
> > > >>>>>>>> whom? Is it published?
> > > >>>>>>> Yes, there have been several talks at meetups and conferences that
> > > >>>>>>> discuss the security and performance of the current solution.
> > > >>>>>>>
> > > >>>>>>>> On Sun, Nov 1, 2015, 08:55 Adam Fuchs <afuchs@apache.org> wrote:
> > > >>>>>>>>> There's another way to look at the state of Accumulo's encryption at
> > > >>>>>>>>> rest:
> > > >>>>>>>>> 1. Encryption at rest works great for what it does, and the code
> > > >>>>>>>>> being "at rest" isn't necessarily a problem
> > > >>>>>>>>> 2. Several organizations are using Accumulo's encryption at rest
> > > >>>>>>>>> effectively in operations
> > > >>>>>>>>> 3. Encryption at rest has been a supported configuration option for
> > > >>>>>>>>> over two years with established plugin interfaces, and therefore it
> > > >>>>>>>>> should be considered part of the public API
> > > >>>>>>>>> 4. Upstream alternatives (to my knowledge) have not been analyzed for
> > > >>>>>>>>> performance or security
> > > >>>>>>>>>
> > > >>>>>>>>> The given option #2 would at least require an analysis of
> > > >>>>>>>>> alternatives, and we would have to decide what to do about backwards
> > > >>>>>>>>> compatibility for users using custom key stores and encryption
> > > >>>>>>>>> strategies that may or may not be supported by upstream alternatives.
> > > >>>>>>>>>
> > > >>>>>>>>> As far as option #1 goes, I can get behind encouraging people to take
> > > >>>>>>>>> up projects to improve Accumulo's encryption. I think we're already
> > > >>>>>>>>> going down this path, but without having identified resources to do
> > > >>>>>>>>> the improvements. Any volunteers?
> > > >>>>>>>>>
> > > >>>>>>>>> Adam
> > >>>>>>>>>
> > >>>>>>>>>
> > > >>>>>>>>> On Fri, Oct 30, 2015 at 4:22 PM, William Slacum <wslacum@gmail.com> wrote:
> > > >>>>>>>>>> So I've been looking into options for providing encryption at rest,
> > > >>>>>>>>>> and it seems like what Accumulo has is abandonware from a project
> > > >>>>>>>>>> perspective. There is no official documentation on how to perform
> > > >>>>>>>>>> encryption at rest, and the best information on its status comes
> > > >>>>>>>>>> from year (or greater) old ticket comments about how the feature is
> > > >>>>>>>>>> still experimental. Recently there was a talk that described using
> > > >>>>>>>>>> HDFS encryption zones as an alternative.
> > > >>>>>>>>>>
> > > >>>>>>>>>> From my perspective, this is what I see as the current situation:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1- Encryption at rest in Accumulo isn't actively being worked on
> > > >>>>>>>>>> 2- Encryption at rest in Accumulo isn't part of the public API or
> > > >>>>>>>>>> marketed capabilities
> > > >>>>>>>>>> 3- Documentation for what does exist is scattered throughout Jira
> > > >>>>>>>>>> comments or presentations
> > > >>>>>>>>>> 4- A viable alternative exists that appears to have feature parity in
> > > >>>>>>>>>> HDFS encryption
> > > >>>>>>>>>> 5- HBase has finer grained encryption capabilities that extend beyond
> > > >>>>>>>>>> what HDFS provides
> > > >>>>>>>>>>
> > > >>>>>>>>>> Moving forward, what's the consensus for supporting this feature?
> > > >>>>>>>>>> Personally, I see two options:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1- Start going down a path to bring the feature into the forefront
> > > >>>>>>>>>> and start providing feature parity with HBase
> > > >>>>>>>>>>
> > > >>>>>>>>>> or
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2- Remove the feature and place emphasis on upstream encryption
> > > >>>>>>>>>> offerings
> > > >>>>>>>>>>
> > > >>>>>>>>>> Any input is welcomed & appreciated!
> > > >>>>>>>>>>
> > >>>>>
> > >
> >
>
