accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Unexpected aliasing from RFile getTopValue()
Date Wed, 15 Apr 2015 15:52:33 GMT


Keith Turner wrote:
>
>
> On Wed, Apr 15, 2015 at 11:06 AM, Adam Fuchs <afuchs@apache.org
> <mailto:afuchs@apache.org>> wrote:
>
>     On Wed, Apr 15, 2015 at 10:20 AM, Keith Turner <keith@deenlo.com
>     <mailto:keith@deenlo.com>> wrote:
>
>
>         Random thought on revamp.  Immutable key values with enough
>         primitives to make most operations efficient (avoid constant
>         alloc/copy) might be something to consider for the iterator API
>
>     So, is this a tradeoff in the performance vs. inter-iterator
>     isolation space? From a performance perspective we would do best if
>     we just passed around pointers to an underlying byte array (e.g.
>     ByteBuffer-style), but maximum
>
>
> There are performance implications to consider when key/vals are not
> immutable.  Currently, if an iterator wants to keep a key/val to compare
> against later key/vals, it has to copy it. I think some iterators do
> this frequently.  I am not asserting that immutable would perform
> better; I don't know.
>
>     isolation would require never reusing anything returned from an
>     iterator's getTopX methods. From a security perspective we need to
>     be careful with how we reuse data objects (hence the need for the
>     SynchronizedIterator at the top of the "system" iterators), but I
>     would say we can probably relax other isolation concerns in the
>     iterators in favor of performance.
>
>     I think there's probably a bigger project here around minimizing the
>     object creation, data copying, serialization, and deserialization of
>     keys. We did some work that Chris McCubbin will be presenting at the
>     upcoming Accumulo Summit around pushing key comparisons down to a
>     serialized form of the key, and that made a huge impact on load
>     performance. I think we could probably achieve an order of magnitude
>     more throughput in the iterator tree with a major refactoring. Any
>     thoughts on when we might have the appetite for such a change? If
>     we're thinking about making key/values immutable then we might
>     piggyback a bigger redesign on that already breaking change.
>
>
> If we were to introduce an improved iterator API, I would hope we could
> deprecate and still support the old API.

Strong +1

>
>
>     Adam
>
>
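
For readers following the thread, here is a minimal Java sketch of the aliasing problem in the subject line and of the copy-before-keeping workaround Keith describes. It deliberately avoids the real Accumulo classes: `MutableValue`, `ReusingReader`, and `collect` are hypothetical stand-ins invented for illustration, and the only assumption carried over from the thread is that a reader may hand back the same mutable object from every getTopValue() call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the Accumulo API: a mutable value holder and a
// reader that reuses one holder for every entry it returns.
class AliasingSketch {

    /** Mutable byte holder, loosely modeled on the idea of a reusable value object. */
    static final class MutableValue {
        private byte[] bytes = new byte[0];

        void set(byte[] newBytes) {
            this.bytes = newBytes;
        }

        /** Defensive copy: a new holder with its own byte array. */
        MutableValue copy() {
            MutableValue v = new MutableValue();
            v.bytes = Arrays.copyOf(bytes, bytes.length);
            return v;
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Reader that hands out the same MutableValue instance on every call. */
    static final class ReusingReader {
        private final String[] data;
        private int pos = 0;
        private final MutableValue top = new MutableValue();

        ReusingReader(String... data) {
            this.data = data;
            load();
        }

        boolean hasTop() {
            return pos < data.length;
        }

        MutableValue getTopValue() {
            return top; // same object every time; only its contents change
        }

        void next() {
            pos++;
            load();
        }

        private void load() {
            if (hasTop()) {
                top.set(data[pos].getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    /** Collects values, either keeping the returned reference or copying it first. */
    static List<String> collect(boolean copyFirst, String... data) {
        ReusingReader reader = new ReusingReader(data);
        List<MutableValue> kept = new ArrayList<>();
        while (reader.hasTop()) {
            MutableValue v = reader.getTopValue();
            kept.add(copyFirst ? v.copy() : v);
            reader.next();
        }
        List<String> out = new ArrayList<>();
        for (MutableValue v : kept) {
            out.add(v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(false, "a", "b", "c")); // prints [c, c, c]
        System.out.println(collect(true, "a", "b", "c"));  // prints [a, b, c]
    }
}
```

Without the copy, every kept reference points at the one holder the reader keeps overwriting, so all of them show the last entry. With immutable key/values the copy would be unnecessary, which is the tradeoff being weighed above; whether that is a net performance win is left open in the thread.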
