accumulo-notifications mailing list archives

From "Will Murnane (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement
Date Thu, 22 Sep 2016 14:28:21 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513435#comment-15513435 ]

Will Murnane commented on ACCUMULO-4468:
----------------------------------------

[~elserj] I'm not sure why customVanilla and standardEquals behave differently. The difference
is small, and perhaps it's due to variance in the JDK used to compile the standard Accumulo
JAR versus the one used to compile the benchmark code? Maybe there are effects from having
the code loaded from a small JAR versus a large one? Maybe the custom WillKey class gets laid
out in memory differently, and it hits the instruction cache differently? This is the problem
with benchmarking...

RE: generation of data, yeah, the current test data... leaves something to be desired. This
was basically the least-worst mechanism I could come up with in 5 minutes to generate some
test data that kinda-sorta resembles our production data. If anyone has a better strategy,
I'm willing to do a little legwork testing other data sets.

[~kturner] The parts of the key are stored on the heap somewhere, so the problem of row equality
is somewhat different than the problem of comparing two contiguous byte arrays. That said,
maybe there would be benefits to storing all the pieces of the Key in a single byte array,
and maintaining indices into it to track the individual parts, rather than several smaller
arrays... That's a big refactor, though, for an unknown change in performance.
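
A minimal sketch of the single-array idea, to make the trade-off concrete. FlatKey, its fields,
and equalsPrefix are hypothetical names for illustration; this is not how Accumulo's Key is
laid out, and timestamp and delete flag are left out.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical flat layout, not Accumulo's Key: four parts in one backing array. */
final class FlatKey {
    private final byte[] data; // row | colFam | colQual | colVis, concatenated
    private final int[] ends;  // ends[i] = exclusive end offset of part i in data

    FlatKey(byte[]... parts) {
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 parts");
        }
        ends = new int[4];
        int total = 0;
        for (int i = 0; i < 4; i++) {
            total += parts[i].length;
            ends[i] = total;
        }
        data = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, data, pos, p.length);
            pos += p.length;
        }
    }

    /** Convenience factory for examples; encodes each part as UTF-8. */
    static FlatKey of(String row, String cf, String cq, String cv) {
        return new FlatKey(
            row.getBytes(StandardCharsets.UTF_8),
            cf.getBytes(StandardCharsets.UTF_8),
            cq.getBytes(StandardCharsets.UTF_8),
            cv.getBytes(StandardCharsets.UTF_8));
    }

    /** Compares the first nParts parts (1 = row only, 4 = through visibility). */
    boolean equalsPrefix(FlatKey other, int nParts) {
        if (nParts < 1 || nParts > 4) {
            throw new IllegalArgumentException("nParts must be 1..4");
        }
        // Part boundaries must match, otherwise "ab"+"c" would equal "a"+"bc".
        for (int i = 0; i < nParts; i++) {
            if (ends[i] != other.ends[i]) {
                return false;
            }
        }
        // One contiguous scan over the shared prefix, last byte first.
        for (int i = ends[nParts - 1] - 1; i >= 0; i--) {
            if (data[i] != other.data[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this layout a partial comparison becomes one pass over a contiguous range plus a few
offset checks. Whether that beats several small arrays would need to be measured.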

I think it would be worth revisiting the comparison mechanism in isEqual, too, doing something
like the Unsafe method used in Hadoop's FastByteComparisons class but going in reverse. The
CPU's speculative prefetch should work in either direction, but doing the comparison byte-at-a-time
is going to be more expensive than the 64-bit comparisons that FastByteComparisons does. But
that's a topic for another ticket ;)
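
For illustration, here is a rough sketch of an end-first equality check that compares eight
bytes per step. It uses java.nio.ByteBuffer absolute reads as a stand-in for the Unsafe calls;
it is not Hadoop's FastByteComparisons code, and ReverseBytes and equalsFromEnd are made-up
names. Whether it is actually faster than the byte loop is untested here.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: equality check that scans from the end of the arrays. */
final class ReverseBytes {
    static boolean equalsFromEnd(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int i = a.length;
        // Whole 8-byte words, last word first.
        while (i >= 8) {
            i -= 8;
            if (ba.getLong(i) != bb.getLong(i)) {
                return false;
            }
        }
        // Remaining 0 to 7 leading bytes, one at a time.
        while (i > 0) {
            i--;
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```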

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at
> the beginning of the key, and works its way toward the end. This functions correctly, of course,
> but one of the typical uses of this method is to compare adjacent rows to break them into
> larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method
> with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called
> with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This
> (marginally) improves the speed of comparisons in the relatively common case where only the
> last part is changing, with less complex code.
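
The reversed order described in the issue can be sketched roughly as below. SimpleKey and its
Depth enum are hypothetical stand-ins for Key and PartialKey, not the attached patch; timestamp
and delete flag are omitted. Switch fall-through is one plausible way to start at the last
requested part and work back to the row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for Key, with only the four byte-array parts. */
final class SimpleKey {
    /** Hypothetical stand-in for PartialKey: how many parts to compare. */
    enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_COLQUAL, ROW_COLFAM_COLQUAL_COLVIS }

    final byte[] row;
    final byte[] colFam;
    final byte[] colQual;
    final byte[] colVis;

    SimpleKey(String row, String cf, String cq, String cv) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.colFam = cf.getBytes(StandardCharsets.UTF_8);
        this.colQual = cq.getBytes(StandardCharsets.UTF_8);
        this.colVis = cv.getBytes(StandardCharsets.UTF_8);
    }

    boolean equals(SimpleKey o, Depth depth) {
        // Deliberate fall-through: begin at the last requested part, end at the row.
        switch (depth) {
            case ROW_COLFAM_COLQUAL_COLVIS:
                if (!Arrays.equals(colVis, o.colVis)) return false;
            case ROW_COLFAM_COLQUAL:
                if (!Arrays.equals(colQual, o.colQual)) return false;
            case ROW_COLFAM:
                if (!Arrays.equals(colFam, o.colFam)) return false;
            case ROW:
                return Arrays.equals(row, o.row);
            default:
                throw new IllegalArgumentException("unknown depth: " + depth);
        }
    }
}
```

When adjacent keys share a row and differ only in a later part, the mismatch is found on the
first or second array comparison instead of after the row and family have been scanned.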



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
