accumulo-dev mailing list archives

From "Keith Turner (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-501) RFile should store the key count in metadata
Date Thu, 29 Mar 2012 12:56:26 GMT


Keith Turner commented on ACCUMULO-501:

Actually, RFile already stores this information for each index entry; it's just a matter of
using it. It would be good to piggyback this computation on the index scan that bulk import
is already doing, or to have bulk import cache the index if multiple scans are done. If the
inner nodes of the index tree contained the sum of their children, the computation could be
made faster.
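
To illustrate the last point, here is a minimal sketch in Python of an index tree whose inner nodes carry the sum of their children's entry counts. It is not RFile's actual index format or API: the `Node` class and the `leaf`, `inner`, and `count_in_range` helpers are hypothetical names, and keys are assumed to be plain strings compared lexicographically. Each leaf stands for one index entry (the last key of a data block plus that block's entry count).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical index node; not the real RFile index layout."""
    min_key: str          # smallest block-last-key under this node
    max_key: str          # largest block-last-key under this node
    total: int            # sum of entry counts under this node
    children: List["Node"] = field(default_factory=list)


def leaf(last_key: str, count: int) -> Node:
    # One index entry: the last key of a data block and its entry count.
    return Node(last_key, last_key, count)


def inner(children: List[Node]) -> Node:
    # An inner node stores the sum of its children, so a fully covered
    # subtree can be counted without visiting its leaves.
    return Node(children[0].min_key, children[-1].max_key,
                sum(c.total for c in children), children)


def count_in_range(node: Node, lo: str, hi: str) -> int:
    """Sum the counts of index entries whose last key lies in [lo, hi]."""
    if node.max_key < lo or node.min_key > hi:
        return 0                      # subtree entirely outside the range
    if lo <= node.min_key and node.max_key <= hi:
        return node.total             # subtree entirely inside: use the stored sum
    return sum(count_in_range(c, lo, hi) for c in node.children)


root = inner([
    inner([leaf("c", 10), leaf("f", 20)]),
    inner([leaf("k", 30), leaf("p", 40), leaf("z", 50)]),
])
```

With this tree, `count_in_range(root, "a", "m")` sums the blocks ending at "c", "f", and "k". The first inner node is fully covered, so its stored total is used without descending into its leaves.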
> RFile should store the key count in metadata
> --------------------------------------------
>                 Key: ACCUMULO-501
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.5.0
> BulkImport estimates the number of keys in a file to be zero.  We store the largest and
> smallest key in metadata; I think we can afford to store the key count and use it to provide
> an estimate when we load it into the tablet.  Perhaps if we know the start key is "a" and the
> end key is "z" and the tablet's range is "a->m", we can just estimate 50% of the key count.
> When a bulk file fits completely in a range, the key count estimate will be accurate.
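
The proportional estimate described in the issue could be sketched as below. This is only an illustration under stated assumptions, not Accumulo code: `key_position` and `estimate_keys` are hypothetical helpers, keys are assumed to be ASCII strings spread uniformly over the key space, a key's position is approximated from its first four bytes, and both ends of the tablet range are treated as inclusive and non-null.

```python
def key_position(key: str, width: int = 4) -> int:
    """Map a key to an integer using its first `width` bytes (zero padded)."""
    raw = key.encode("ascii")[:width].ljust(width, b"\x00")
    return int.from_bytes(raw, "big")


def estimate_keys(file_first: str, file_last: str, key_count: int,
                  tablet_start: str, tablet_end: str) -> int:
    """Estimate how many of the file's keys fall inside the tablet's range.

    Assumes keys are uniformly distributed between file_first and file_last.
    """
    first, last = key_position(file_first), key_position(file_last)
    lo = max(first, key_position(tablet_start))
    hi = min(last, key_position(tablet_end))
    span = last - first
    if span <= 0:
        # Degenerate file range: all keys share one position.
        return key_count if lo <= hi else 0
    overlap = max(0, hi - lo)
    return round(key_count * overlap / span)


# File spans "a".."z", tablet covers "a".."m": roughly half of the keys.
half = estimate_keys("a", "z", 1000, "a", "m")
# File fits completely inside the tablet: the stored count is used as is.
full = estimate_keys("b", "c", 1000, "a", "m")
```

For the "a" to "z" file and the "a->m" tablet this gives 480 of 1000 keys, close to the 50% guess in the description. When the file fits completely in the tablet's range, the result equals the stored key count.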

