accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1124) optimize index size in RFile
Date Tue, 24 May 2016 04:12:12 GMT


Josh Elser commented on ACCUMULO-1124:

bq. I experimented with shortening keys in the index and that gave some nice improvements,
but not as much as I expected. I realized that even with those changes, bad keys were still
being placed in the index. I added code to keep statistics on key sizes and used those statistics
to try to select keys that were <= AVG(keySize). I also excluded keys that were too big
(more than 3 standard deviations above the mean).
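
The key-selection heuristic quoted above can be sketched roughly as follows. This is a minimal sketch, not the code from the ACCUMULO-1124 patch: the class name `KeySizeStats`, the use of Welford's online algorithm for the running mean and standard deviation, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration. One plausible use is for a writer to call `add` for every key it appends and, once a block has reached its target size, end the block (and so emit an index entry) only at a key for which `isGoodIndexCandidate` returns true.

```java
/**
 * Hypothetical helper (not Accumulo's actual class): keeps running key-size
 * statistics with Welford's online algorithm and decides whether a key is a
 * reasonable candidate to end a block and be written to the index.
 */
class KeySizeStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    /** Record the serialized size of one key. */
    public void add(int keySize) {
        count++;
        double delta = keySize - mean;
        mean += delta / count;
        m2 += delta * (keySize - mean);
    }

    public double mean() {
        return mean;
    }

    /** Population standard deviation; 0 until at least one key has been seen. */
    public double stdDev() {
        return count == 0 ? 0.0 : Math.sqrt(m2 / count);
    }

    /** True if the key is more than 3 standard deviations above the mean. */
    public boolean isTooBig(int keySize) {
        return count > 0 && keySize > mean + 3 * stdDev();
    }

    /** Prefer keys no larger than the average, and never outliers. */
    public boolean isGoodIndexCandidate(int keySize) {
        return (count == 0 || keySize <= mean) && !isTooBig(keySize);
    }

    public static void main(String[] args) {
        KeySizeStats stats = new KeySizeStats();
        for (int s : new int[] {40, 42, 38, 41, 39, 40, 43, 37}) {
            stats.add(s);
        }
        System.out.println(String.format("%.2f", stats.mean())); // 40.00
        System.out.println(stats.isGoodIndexCandidate(39));       // true
        System.out.println(stats.isGoodIndexCandidate(45));       // false
        System.out.println(stats.isTooBig(200));                  // true
    }
}
```

Tracking the statistics incrementally keeps the cost to O(1) per appended key, so the writer never has to buffer key sizes to decide where a block should end.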

I had the thought: how would we determine whether index size is efficient, both when evaluating the
success of this change and when identifying perf issues in the future? Did you give any thought to
how we could expose this information more easily? Maybe we include some extra information in the file
entry in metadata so that the master/monitor could easily aggregate/report on file statistics? Not
suggesting it needs to happen now, but I'm curious about your thoughts (since I assume you were doing
all this investigation by hand).

> optimize index size in RFile
> ----------------------------
>                 Key: ACCUMULO-1124
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Eric Newton
>            Assignee: Keith Turner
>             Fix For: 1.8.0
>          Time Spent: 1h
>  Remaining Estimate: 0h
> I noticed HBASE-7845 and it seems like something we could do in RFile, too.
> Instead of putting the whole key in the index, you put in enough of the key to get the
> reader to the beginning of the block.
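
The idea borrowed from HBASE-7845 can be illustrated with a short separator computation. This sketch assumes that an index entry only needs to sort at or after the last key of one block and strictly before the first key of the next block, and it works on a single raw byte array (for example, a row) compared as unsigned bytes. `IndexKeyShortener` and `shortSeparator` are hypothetical names, not Accumulo's or HBase's API, and a real implementation would also have to handle the other key fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of index-key shortening (not Accumulo's implementation).
 * Given prev (last key of a block) and next (first key of the following block)
 * with prev < next, returns a separator s with prev <= s < next under unsigned
 * lexicographic byte order, and s is never longer than prev.
 */
final class IndexKeyShortener {

    static byte[] shortSeparator(byte[] prev, byte[] next) {
        int min = Math.min(prev.length, next.length);
        int i = 0;
        // Find the length of the common prefix.
        while (i < min && prev[i] == next[i]) {
            i++;
        }
        if (i < min) {
            int p = prev[i] & 0xFF; // compare bytes as unsigned values
            int n = next[i] & 0xFF;
            if (p + 1 < n) {
                // Keep the shared prefix and bump one byte: prev < s < next.
                byte[] s = Arrays.copyOf(prev, i + 1);
                s[i] = (byte) (p + 1);
                return s;
            }
        }
        // This simple rule found nothing shorter; prev itself is always valid.
        return prev;
    }

    public static void main(String[] args) {
        byte[] a = "apple_zzzzzzzz".getBytes(StandardCharsets.UTF_8);
        byte[] b = "orange".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(shortSeparator(a, b), StandardCharsets.UTF_8)); // b

        byte[] c = "row_000123".getBytes(StandardCharsets.UTF_8);
        byte[] d = "row_000124".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(shortSeparator(c, d), StandardCharsets.UTF_8)); // row_000123
    }
}
```

For `apple_zzzzzzzz` and `orange` the sketch stores the one-byte entry `b`, which is still enough to route a seek to the right block; for adjacent keys such as `row_000123` and `row_000124` no shorter separator exists under this rule, so the full key is kept. That second case is one reason shortening alone may not shrink the index as much as expected when the block-ending keys themselves are large.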

This message was sent by Atlassian JIRA
