accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1124) optimize index size in RFile
Date Tue, 24 May 2016 14:39:13 GMT


Keith Turner commented on ACCUMULO-1124:

One thing I thought about but did not get to was making rfile-info print some stats about
the index. The average index key size can already be calculated from the info that rfile-info
prints (using num blocks and total index size). For the histogram option, we could print stats
and a histogram for both the index and all of the data. Having the histogram information plus
stats for all keys and for index keys would be really nice for comparing the index to all of
the data in the file.
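As a rough illustration of the arithmetic and of the kind of histogram discussed above, here is a minimal sketch. The function names, the bucket width, and the sample numbers are hypothetical and are not part of rfile-info. It also assumes one index entry per data block and ignores any per-entry overhead in the index, so the average is only an approximation of the key size.

```python
from collections import Counter


def average_index_entry_size(total_index_bytes, num_blocks):
    """Approximate bytes per index entry, assuming one entry per data block."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    return total_index_bytes / num_blocks


def size_histogram(sizes, bucket_width=10):
    """Count sizes per bucket; each bucket is keyed by its lower bound."""
    return dict(Counter((size // bucket_width) * bucket_width for size in sizes))


# Illustrative numbers only, not taken from a real RFile.
print(average_index_entry_size(total_index_bytes=50_000, num_blocks=1_000))
print(size_histogram([3, 12, 17, 25], bucket_width=10))
```

Running the same histogram once over the index key sizes and once over all key sizes would give the two distributions to compare.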

I suspect that before this change larger keys may have had a higher chance of ending up in
the index. Before this change, when a data block exceeded the size threshold, the last key
in the data block was put in the index. Larger keys are more likely to be the ones that push
a data block over the threshold. Making rfile-info print these index vs. data stats would
show this for older files. Maybe I can add that to rfile-info in the PR.
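The suspected bias can be checked with a small simulation. This is only a sketch of the behavior described above, not RFile code: it treats each entry as just its key (no values), uses a made-up mix of key sizes and a made-up block threshold, and closes a block as soon as the entry just added pushes it to or past the threshold.

```python
import random


def simulate_index_key_sizes(key_sizes, block_threshold):
    """Return the sizes of the keys that would land in the index.

    A block is closed when the entry just added brings it to or over
    block_threshold, and that last key is recorded as the index key.
    """
    index_key_sizes = []
    block_bytes = 0
    for size in key_sizes:
        block_bytes += size
        if block_bytes >= block_threshold:
            index_key_sizes.append(size)
            block_bytes = 0
    return index_key_sizes


def mean(values):
    return sum(values) / len(values)


# Hypothetical workload: 90% small keys (10 bytes), 10% large keys (1000 bytes).
rng = random.Random(42)
all_key_sizes = [1000 if rng.random() < 0.1 else 10 for _ in range(100_000)]
index_key_sizes = simulate_index_key_sizes(all_key_sizes, block_threshold=4000)

print("mean size of all keys:  ", mean(all_key_sizes))
print("mean size of index keys:", mean(index_key_sizes))
```

With this mix, most blocks are closed by a large key, so the mean index key size comes out well above the mean over all keys. Comparing those two means is the sort of index vs. data stat that would reveal the effect in older files.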

> optimize index size in RFile
> ----------------------------
>                 Key: ACCUMULO-1124
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Eric Newton
>            Assignee: Keith Turner
>             Fix For: 1.8.0
>          Time Spent: 2h
>  Remaining Estimate: 0h
> I noticed HBASE-7845 and it seems like something we could do in RFile, too.
> Instead of putting the whole key in the index, you put in enough of the key to get the
> reader to the beginning of the block.
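The idea in the description can be sketched on plain byte strings. This is an assumption-laden illustration, not how RFile or HBase implements it: real Accumulo keys have several fields, and the function name here is hypothetical. Given the last key of one block and the first key of the next, it returns the shortest byte string s with prev_last < s <= next_first, which is enough to separate the two blocks.

```python
def short_separator(prev_last: bytes, next_first: bytes) -> bytes:
    """Shortest s such that prev_last < s <= next_first (bytewise order)."""
    if not prev_last < next_first:
        raise ValueError("prev_last must sort strictly before next_first")
    # Length of the common prefix of the two keys.
    n = 0
    limit = min(len(prev_last), len(next_first))
    while n < limit and prev_last[n] == next_first[n]:
        n += 1
    # One byte past the common prefix is the first point where the
    # prefix of next_first sorts after prev_last.
    return next_first[: n + 1]


sep = short_separator(b"row_0199_colA", b"row_0200_colB")
print(sep)  # b'row_02'
```

Here a 6-byte index entry stands in for a 13-byte key while still sorting between the two blocks.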

