accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement
Date Wed, 21 Sep 2016 22:28:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511398#comment-15511398
] 

Josh Elser commented on ACCUMULO-4468:
--------------------------------------

Great stuff, [~wmurnane]! A few thoughts:

bq. the original equals() copied to the new class is called customVanilla, and the original
equals() in the original Key class is called standardEquals. The numbers to compare are really
the two custom* ones; the standardEquals value is given just to show that it's in the same
ballpark. 

I'm not grok'ing the difference between customVanilla and standardEquals. Why the variance
in these two? Shouldn't they be essentially equivalent?

bq. As a user of the API, I'd rather not have to think about equalsForward() versus equalsBackward().

I concur with you here.

bq. I agree that any change should start with prejudice against it. However, I think the numbers
above prove my case: when keys are presented in sorted order, which happens often in Accumulo,
the proposed method of comparing is slightly but noticeably faster. The degree of improvement
depends on the data, but it doesn't perform worse than the current solution in any case that
I tested.

Great, I hoped you didn't take my initial prejudice badly :). This is a great start. I'm curious
trying to tweak a couple of other things. Sharing your project was super useful.

Ultimately, if these numbers are as they appear (better in some cases, no worse in others),
this is a great improvement. Expecting large contiguous blocks of keys where row or row+cf
change very infrequently makes sense to optimize. It appears that {{compareTo(Object}} is
also using a separate code path, so I don't think this would have a big affect on things like
creating RFiles for bulk imports. I need to search through usages though.

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at
the beginning of the key, and works its way toward the end. This functions correctly, of course,
but one of the typical uses of this method is to compare adjacent rows to break them into
larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method
with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called
with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This
(marginally) improves the speed of comparisons in the relatively common case where only the
last part is changing, with less complex code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message