accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4314) Use statistics to choose better keys for RFile index
Date Wed, 01 Jun 2016 21:36:59 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311175#comment-15311175 ]

Keith Turner commented on ACCUMULO-4314:
----------------------------------------

Another important change to note is the depth of the index tree.  In the original file the
tree had 4 levels.  After running with these changes it has only 2 levels.  Having fewer
levels is not just a function of the total index size.  Larger keys tend to make the index
tree deeper, so keeping larger keys out of the index avoids this problem.
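
To illustrate why key size matters beyond total index size, here is a rough toy model, not
Accumulo's RFile code.  It assumes fixed-size index blocks, that the last key of each block
is promoted to the parent level, and that every block holds at least two entries.  The class
name, method names, and block size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a multi-level index; an illustration only, not Accumulo's RFile code. */
final class IndexDepthSketch {

  /** Hypothetical helper: n index keys that all have the same size in bytes. */
  static List<Integer> uniform(int n, int keySize) {
    return Collections.nCopies(n, keySize);
  }

  /**
   * Packs keys into blocks of roughly blockBytes and promotes the last key of each
   * block to the parent level, repeating until a single root block remains.
   * Assumption: a block always holds at least two keys, so every level shrinks.
   */
  static int depth(List<Integer> keySizes, int blockBytes) {
    if (keySizes.isEmpty()) {
      throw new IllegalArgumentException("no index entries");
    }
    int levels = 1;
    List<Integer> level = keySizes;
    while (true) {
      List<Integer> parent = new ArrayList<>();
      int used = 0;
      int count = 0;
      int last = 0;
      for (int size : level) {
        if (count >= 2 && used + size > blockBytes) {
          parent.add(last); // block is full: its last key moves up one level
          used = 0;
          count = 0;
        }
        used += size;
        count++;
        last = size;
      }
      parent.add(last); // close the final block of this level
      if (parent.size() == 1) {
        return levels; // everything fit in one block, so this level is the root
      }
      level = parent;
      levels++;
    }
  }
}
```

In this model, `depth(uniform(10000, 20), 1000)` and `depth(uniform(1000, 200), 1000)` both
start from the same number of index bytes, yet the second call reports a deeper tree because
each block holds fewer of the large keys.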

> Use statistics to choose better keys for RFile index
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4314
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4314
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.6.6, 1.7.2, 1.8.0
>
>
> The commit for ACCUMULO-1124 makes two changes:
>   * Generates shorter keys, which may not exist in the data, to place in the RFile index.
>   * Uses statistics to make better choices about which keys to place in the index.  These
>     changes look for keys of average size or below and exclude large keys (keys that are
>     > 3 std dev).
> The change to generate shorter keys cannot be made in 1.7.X and 1.6.X because it would
> generate RFiles that may not work properly with older 1.6 and 1.7 versions.  However, the
> changes to use statistics to pick better keys could be made in 1.6 and 1.7.
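
The statistics-based selection described in the issue might be sketched as follows.  This is
a minimal illustration, not the code committed for ACCUMULO-1124.  It assumes "> 3 std dev"
means more than three standard deviations above the mean key size, and it uses Welford's
running mean and variance as a stand-in for whatever statistics the real change keeps.
`KeySizeStats` and its methods are hypothetical names.

```java
/** Running key-size statistics; a hypothetical sketch, not Accumulo's implementation. */
final class KeySizeStats {
  private long count;
  private double mean;
  private double m2; // sum of squared deviations from the running mean

  /** Records the size in bytes of one key seen while writing the file. */
  void observe(int keySize) {
    count++;
    double delta = keySize - mean;
    mean += delta / count;
    m2 += delta * (keySize - mean);
  }

  double mean() {
    return mean;
  }

  /** Sample standard deviation; zero until at least two keys have been observed. */
  double stdDev() {
    return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1));
  }

  /** Keys of average size or below are the preferred index candidates. */
  boolean isPreferred(int keySize) {
    return count == 0 || keySize <= mean;
  }

  /** Assumed reading of "> 3 std dev": more than three deviations above the mean. */
  boolean isExcluded(int keySize) {
    return count >= 2 && keySize > mean + 3 * stdDev();
  }
}
```

A writer using this sketch would call `observe` for each key it writes and, when it is time
to add an index entry, skip keys for which `isExcluded` is true and favor ones for which
`isPreferred` is true.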



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
