accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Unexpected aliasing from RFile getTopValue()
Date Wed, 15 Apr 2015 15:06:18 GMT
On Wed, Apr 15, 2015 at 10:20 AM, Keith Turner <keith@deenlo.com> wrote:
>
>
> Random thought on revamp.  Immutable key values with enough primitives to
> make most operations efficient (avoid constant alloc/copy) might be
> something to consider for the iterator API
>
>
So, is this a tradeoff in the performance vs. inter-iterator isolation
space? From a performance perspective we would do best if we just passed
around pointers to an underlying byte array (e.g. ByteBuffer-style), but
maximum isolation would require never reusing anything returned from an
iterator's getTopX methods. From a security perspective we need to be
careful with how we reuse data objects (hence the need for the
SynchronizedIterator at the top of the "system" iterators), but I would say
we can probably relax other isolation concerns in the iterators in favor of
performance.
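
To make the aliasing side of that tradeoff concrete, here is a minimal
sketch in plain Java. It does not use the Accumulo API: ReusingSource and
collect are hypothetical names invented for illustration, and the source
is assumed to hand back the same byte array from every getTopValue() call.
A consumer that holds on to the returned reference sees it overwritten,
while one that copies pays an allocation per entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class AliasingSketch {
    /** Toy source (hypothetical, not an Accumulo class) that reuses one buffer. */
    static final class ReusingSource {
        private final String[] data;               // single-character values only
        private final byte[] buffer = new byte[1]; // shared by every returned value
        private int pos = 0;

        ReusingSource(String... singleCharValues) { this.data = singleCharValues; }

        boolean hasTop() { return pos < data.length; }

        /** Cheap: no allocation, but the next call overwrites the same array. */
        byte[] getTopValue() {
            buffer[0] = (byte) data[pos].charAt(0);
            return buffer;
        }

        void next() { pos++; }
    }

    /** Holds every value until the scan ends, optionally copying each one. */
    static List<String> collect(ReusingSource src, boolean copy) {
        List<byte[]> held = new ArrayList<>();
        while (src.hasTop()) {
            byte[] v = src.getTopValue();
            held.add(copy ? v.clone() : v); // clone() isolates us from later reuse
            src.next();
        }
        List<String> out = new ArrayList<>();
        for (byte[] b : held) {
            out.add(new String(b, StandardCharsets.US_ASCII));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new ReusingSource("a", "b", "c"), false)); // [c, c, c]
        System.out.println(collect(new ReusingSource("a", "b", "c"), true));  // [a, b, c]
    }
}
```

Immutable key/values would rule out the first result by construction; the
open question is how much of the copy cost can be avoided while doing so.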

I think there's probably a bigger project here around minimizing the object
creation, data copying, serialization, and deserialization of keys. We did
some work that Chris McCubbin will be presenting at the upcoming Accumulo
Summit around pushing key comparisons down to a serialized form of the key,
and that made a huge impact on load performance. I think we could probably
achieve an order of magnitude more throughput in the iterator tree with a
major refactoring. Any thoughts on when we might have the appetite for such
a change? If we're thinking about making key/values immutable then we might
piggyback a bigger redesign on that already breaking change.
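
As a rough illustration of what comparing keys in serialized form can look
like, the sketch below compares two byte ranges in place, with no object
creation or copying. It is not the work mentioned above, whose details are
not given here. It assumes a hypothetical encoding in which unsigned
lexicographic byte order matches the desired key order; the class and
method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class SerializedCompareSketch {
    /**
     * Compares two byte ranges as unsigned lexicographic sequences.
     * Returns a negative, zero, or positive value, like Comparator.compare.
     */
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // mask so bytes >= 0x80 sort after 0x7f
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // a shorter range that is a prefix sorts first
    }

    public static void main(String[] args) {
        // Two "keys" stored back to back in one block, as they might be on disk.
        byte[] block = "applebanana".getBytes(StandardCharsets.US_ASCII);
        int result = compare(block, 0, 5, block, 5, 6);
        System.out.println(result < 0 ? "apple sorts first" : "banana sorts first");
    }
}
```

The caller passes offsets and lengths into a shared block, so a scan can
order entries without materializing a key object for each one.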

Adam
