accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement
Date Thu, 22 Sep 2016 03:06:21 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511949#comment-15511949 ]

Josh Elser commented on ACCUMULO-4468:
--------------------------------------

Reran Will's original test and can vouch for the results. I also modified the benchmark to
do a full equality check (Row to Deletion flag) with the same data generation:

{noformat}
Benchmark                   Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla  thrpt  200  81.514 ± 3.989  ops/s
MyBenchmark.customWill     thrpt  200  96.185 ± 1.736  ops/s
{noformat}

I'm not convinced that the generated data is actually representative, but I am warming
up to these changes having a net positive effect.

The representation I commonly have in mind is the following. For each row (rows being
relatively close to each other):

* a few column families
* 10 to 15 qualifiers spread across the families
* a few timestamps spread across the keys in one row

This models attributes on some "object" which is stored in one row. There is some logical
partitioning of the attributes. Most attributes are written once, some are updated over time.
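
A minimal sketch of a generator for the row shape described above, in case it helps make the
idea concrete. The names (RowShapeSketch, Entry, generateRow), the choice of exactly three
families, and the one-in-five update rate are assumptions for illustration only; none of this
is taken from the attached benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator for one "object" row; not part of the attached benchmark. */
final class RowShapeSketch {

    /** Simplified stand-in for a key: row, family, qualifier, timestamp. */
    static final class Entry {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        Entry(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Generates the entries of a single row with the shape described in the text. */
    static List<Entry> generateRow(String row, Random rnd) {
        List<Entry> out = new ArrayList<>();
        int families = 3;                      // "a few column families" (assumed: 3)
        int qualifiers = 10 + rnd.nextInt(6);  // 10 to 15 qualifiers, inclusive
        for (int q = 0; q < qualifiers; q++) {
            String cf = "cf" + (q % families); // spread qualifiers round-robin over families
            String cq = String.format("cq%02d", q);
            // Assumed: about one attribute in five is updated over time (3 versions),
            // the rest are written once.
            int versions = rnd.nextInt(5) == 0 ? 3 : 1;
            for (int v = 0; v < versions; v++) {
                out.add(new Entry(row, cf, cq, 1000L + v));
            }
        }
        return out;
    }
}
```

Within one such row, consecutive keys mostly share the row and family and differ in the
qualifier or timestamp, which is the case where the order of comparison would matter.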

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at
> the beginning of the key, and works its way toward the end. This functions correctly, of course,
> but one of the typical uses of this method is to compare adjacent rows to break them into
> larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method
> with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called
> with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This
> (marginally) improves the speed of comparisons in the relatively common case where only the
> last part is changing, with less complex code.
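
The reversed comparison order described in the issue can be sketched roughly as follows. This
is neither the attached patch nor Accumulo's Key class: SketchKey and Depth are hypothetical
stand-ins, the key is reduced to four byte-array fields (timestamp and deletion flag are left
out), and the forward variant is included only for contrast.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, simplified key used only to illustrate comparison order. */
final class SketchKey {

    /** Hypothetical analogue of PartialKey: how much of the key takes part in equality. */
    enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_COLQUAL, ROW_COLFAM_COLQUAL_COLVIS }

    final byte[] row;
    final byte[] colFam;
    final byte[] colQual;
    final byte[] colVis;

    SketchKey(String row, String colFam, String colQual, String colVis) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.colFam = colFam.getBytes(StandardCharsets.UTF_8);
        this.colQual = colQual.getBytes(StandardCharsets.UTF_8);
        this.colVis = colVis.getBytes(StandardCharsets.UTF_8);
    }

    /** Forward order: row first, then family, qualifier, visibility. */
    boolean equalsForward(SketchKey o, Depth d) {
        if (!Arrays.equals(row, o.row)) return false;
        if (d == Depth.ROW) return true;
        if (!Arrays.equals(colFam, o.colFam)) return false;
        if (d == Depth.ROW_COLFAM) return true;
        if (!Arrays.equals(colQual, o.colQual)) return false;
        if (d == Depth.ROW_COLFAM_COLQUAL) return true;
        return Arrays.equals(colVis, o.colVis);
    }

    /** Reverse order: start at the deepest requested field and fall through toward the row. */
    boolean equalsReversed(SketchKey o, Depth d) {
        switch (d) {
            case ROW_COLFAM_COLQUAL_COLVIS:
                if (!Arrays.equals(colVis, o.colVis)) return false;
                // fall through
            case ROW_COLFAM_COLQUAL:
                if (!Arrays.equals(colQual, o.colQual)) return false;
                // fall through
            case ROW_COLFAM:
                if (!Arrays.equals(colFam, o.colFam)) return false;
                // fall through
            case ROW:
                return Arrays.equals(row, o.row);
            default:
                throw new IllegalArgumentException("Unknown depth: " + d);
        }
    }
}
```

Both methods return the same answer for any pair of keys. The difference is only in which
field is inspected first: for adjacent keys that share a row and family but differ in the
qualifier, the reversed variant can return false before touching the row bytes at all.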



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
