accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement
Date Wed, 21 Sep 2016 18:18:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510736#comment-15510736
] 

Christopher Tubbs edited comment on ACCUMULO-4468 at 9/21/16 6:18 PM:
----------------------------------------------------------------------

Hi [~wmurnane]. Thanks for the patch!

After reviewing, I had a few comments:

# This may speed things up only if the probability of not being equal is greater in the lower
dimensions of the key than the higher ones. I'm not sure this is the case, as I don't have
a strong sense of when this method is called, or how frequently. What analysis have you done
to determine these relative probabilities?
# Have you run any experiments to determine the performance differences in various use cases?
# If this method is primarily used in user code, would a new method to compare in reverse
order be better, to account for both cases? Perhaps it'd be better to optimize the Combiner
code, rather than change the default behavior for all cases?
# I think relying on fall-through behavior in switch statements should be avoided. It's prone
to error, especially as code is refactored over time. I think it's better to avoid it than
to suppress the warning. This may be a style choice, but it's a preference that the java compiler
weighs in on (by making fallthrough a default compiler warning), and I'd prefer to avoid behavior
which results in compiler warnings whenever possible.



was (Author: ctubbsii):
Hi [~wmurnane]. Thanks for the patch!

After reviewing, I had a few comments:

1. This may speed things up only if the probability of not being equal is greater in the lower
dimensions of the key than the higher ones. I'm not sure this is the case, as I don't have
a strong sense of when this method is called, or how frequently. What analysis have you done
to determine these relative probabilities?
2. Have you run any experiments to determine the performance differences in various use cases?
3. If this method is primarily used in user code, would a new method to compare in reverse
order be better, to account for both cases? Perhaps it'd be better to optimize the Combiner
code, rather than change the default behavior for all cases?
4. I think relying on fall-through behavior in switch statements should be avoided. It's prone
to error, especially as code is refactored over time. I think it's better to avoid it than
to suppress the warning. This may be a style choice, but it's a preference that the java compiler
weighs in on (by making fallthrough a default compiler warning), and I'd prefer to avoid behavior
which results in compiler warnings whenever possible.


> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at
the beginning of the key, and works its way toward the end. This functions correctly, of course,
but one of the typical uses of this method is to compare adjacent rows to break them into
larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method
with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called
with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This
(marginally) improves the speed of comparisons in the relatively common case where only the
last part is changing, with less complex code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message