accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4468), PartialKey) improvement
Date Wed, 21 Sep 2016 18:20:20 GMT


Josh Elser commented on ACCUMULO-4468:

bq.    Do you have other examples of where this might be used in a tight loop?

I think there are lots of other examples in Accumulo itself; for example, in iterators.system.DeletingIterator
and iterators.user.TransformingIterator, this method is called in a loop. In our use case,
we're doing a similar thing: building a larger object out of multiple rows, by finding groups
of rows which are equal under ROW_COLFAM. When each object is built from only a few rows,
the CF equality comparison returns false pretty often (which is to be expected), but only
after comparing row IDs, which are always the same in practice.

Perhaps my concern didn't come across. From my perspective: I am concerned with a performance
change that makes one case better. We need to understand if the case you outlined is "the
norm" or "the exception". I was hoping you had context on this.

bq.  How did you test this? What types of numbers did you see?

I haven't been able to install it on a cluster to test. The test suite does pass with this
patch applied. I think it's a minor change; in the "rows are equal" case the same amount of
work is done as with the existing code, although the parts are accessed in the opposite order.
They're still compared mostly-in-order, as isEqual does, but the comment in that function
was inspiration to try reversing the comparison order.

Aside from performance, the code seems cleaner to me: there's no more repetition of e.g. the
check of row equality. The bytecode (with Oracle javac 1.8.0_92) is substantially smaller:
389 bytes versus 167.

At risk of being anti-social, I am -1 on any change for performance without numbers coming
with it. There are many great tools out there to benchmark changes and don't necessarily require
use of a cluster (it might actually be harder to test on a cluster). [JMH|]
and [Google Caliper|] are two good tools for micro-benchmarking.

>, PartialKey) improvement
> ----------------------------------------------------------
>                 Key: ACCUMULO-4468
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: key_comparison.patch
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at
the beginning of the key, and works its way toward the end. This functions correctly, of course,
but one of the typical uses of this method is to compare adjacent rows to break them into
larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method
with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called
with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This
(marginally) improves the speed of comparisons in the relatively common case where only the
last part is changing, with less complex code.

This message was sent by Atlassian JIRA

View raw message