accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4066) Conditional mutation processing performance could be improved.
Date Thu, 28 Jan 2016 05:43:39 GMT


Josh Elser commented on ACCUMULO-4066:

This one slipped in under my radar. Thought I'd give your changes a glance. 3x speed up is

-    for (Condition cond : cm.getConditions()) {
+    // sort conditions inorder to get better lookup performance. Sort on client side so tserver does not have to do it.
+    Condition[] ca = cm.getConditions().toArray(new Condition[cm.getConditions().size()]);
+    Arrays.sort(ca, CONDITION_COMPARATOR);

To confirm, the server doesn't rely on the sorted order, just hopes for it for performance
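
For readers without the patch at hand, here is a minimal, self-contained sketch of the client-side sort shown in the diff above. The `Cond` class and the ordering used by `CONDITION_COMPARATOR` here (family, then qualifier, then visibility) are assumptions for illustration only; they are not the Accumulo `Condition` class or the comparator from the actual change.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConditionSortSketch {
    /** Hypothetical stand-in for a condition: just the column it checks. */
    static final class Cond {
        final String family;
        final String qualifier;
        final String visibility;

        Cond(String family, String qualifier, String visibility) {
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
        }
    }

    // Assumed ordering (not taken from the patch): family, qualifier, visibility.
    static final Comparator<Cond> CONDITION_COMPARATOR =
        Comparator.comparing((Cond c) -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(c -> c.visibility);

    /** Copies the conditions into an array and sorts the copy; the input is untouched. */
    static Cond[] sortedCopy(List<Cond> conditions) {
        Cond[] ca = conditions.toArray(new Cond[conditions.size()]);
        Arrays.sort(ca, CONDITION_COMPARATOR);
        return ca;
    }

    public static void main(String[] args) {
        List<Cond> conditions = Arrays.asList(
            new Cond("meta", "size", ""),
            new Cond("data", "b", ""),
            new Cond("data", "a", ""));
        for (Cond c : sortedCopy(conditions)) {
            System.out.println(c.family + ":" + c.qualifier);
        }
    }
}
```

Sorting a copy keeps the caller's collection unchanged, and a receiver that only benefits from the order (rather than depending on it) still behaves correctly if it is handed unsorted input.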

I see a lot of changes in IteratorUtil (I assume to your point about loading iterators from
the table config). How did this used to work? You had lots of new tests added for the other
cases -- do we have good coverage for IteratorUtil already?

> Conditional mutation processing performance could be improved.
> --------------------------------------------------------------
>                 Key: ACCUMULO-4066
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.6.4, 1.7.0
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> When processing conditional mutations, tablet reads are done. The way the current implementation
> does tablet reads has a lot of overhead. For each condition the following is done:
>  * Opens and reserves iterators files.
>  * Parse table iterators from table config (involves scanning and filtering entire table
>  * Merges condition iterators and table iterators
>  * Constructs iterator stack.
> I created a branch where these operations (except for constructing iterator stack) are
> done per tablet and/or per batch of conditional mutations. Doing this I am seeing a 3x speed
> up in conditional mutation processing rates when data is cached.
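
The idea in the description above (doing the expensive setup once per batch instead of once per condition) can be illustrated with a small standalone sketch. Everything here is hypothetical: `parseTableIterators`, `buildStack`, the string-based "iterators", and the simple append used for merging are simplifications for illustration and do not reflect the tserver code or the branch mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchSetupSketch {
    // Counts how many times the (pretend) expensive config parse runs.
    static int parseCount = 0;

    /** Hypothetical: scan the whole table config and keep only iterator settings. */
    static List<String> parseTableIterators(Map<String, String> tableConfig) {
        parseCount++;
        List<String> iters = new ArrayList<>();
        for (Map.Entry<String, String> e : tableConfig.entrySet()) {
            if (e.getKey().startsWith("table.iterator.")) {
                iters.add(e.getValue());
            }
        }
        return iters;
    }

    /** Hypothetical merge: table iterators first, then the condition's own. */
    static List<String> buildStack(List<String> tableIters, List<String> condIters) {
        List<String> stack = new ArrayList<>(tableIters);
        stack.addAll(condIters);
        return stack;
    }

    /** Before: the config is parsed again for every condition. */
    static List<List<String>> perCondition(Map<String, String> cfg, List<List<String>> conds) {
        List<List<String>> stacks = new ArrayList<>();
        for (List<String> condIters : conds) {
            stacks.add(buildStack(parseTableIterators(cfg), condIters));
        }
        return stacks;
    }

    /** After: the config is parsed once per batch; only the stack is built per condition. */
    static List<List<String>> perBatch(Map<String, String> cfg, List<List<String>> conds) {
        List<String> tableIters = parseTableIterators(cfg);
        List<List<String>> stacks = new ArrayList<>();
        for (List<String> condIters : conds) {
            stacks.add(buildStack(tableIters, condIters));
        }
        return stacks;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new TreeMap<>();
        cfg.put("table.iterator.scan.vers", "versioning");
        cfg.put("table.split.threshold", "1G");
        List<List<String>> conds = Arrays.asList(
            Arrays.asList("c1"), Arrays.asList("c2"), Arrays.asList("c3"));

        perCondition(cfg, conds);
        int before = parseCount;
        parseCount = 0;
        perBatch(cfg, conds);
        System.out.println("parses per condition: " + before + ", per batch: " + parseCount);
    }
}
```

Both methods produce the same stacks; the only difference is how often the shared setup runs, which is the kind of overhead the description says was moved out of the per-condition path.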

This message was sent by Atlassian JIRA
