hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1071) Set index interval at flush time based off count of keys and key attributes
Date Sat, 20 Dec 2008 12:54:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658282#action_12658282 ]

Andrew Purtell commented on HBASE-1071:

One way to approach this is to estimate the heap size of the index from the key count and
key lengths. Then pick a limit and increase the index interval as necessary until the
estimated index size falls below that threshold. This is simple and gives only one knob --
easy enough to tweak -- that gets directly at the effect wanted. A suitable default can then
be found by testing some educated guesses with PE.
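
A rough sketch of that loop, for illustration only. The class name, the method names, the
per-entry overhead constant, and the doubling step are all assumptions made for this
example; none of them come from the HBase code base, and a real estimate would need to
account for the actual index entry layout.

```java
/** Hypothetical sketch, not HBase code: pick an index interval from key count and key length. */
final class IndexIntervalEstimator {

    // Assumed fixed cost per index entry (object headers, offset, references), in bytes.
    // A placeholder value; the real overhead would have to be measured.
    static final long PER_ENTRY_OVERHEAD_BYTES = 64;

    /** Estimated heap size, in bytes, of an index that keeps every interval-th key. */
    public static long estimateIndexBytes(long keyCount, double avgKeyLength, int interval) {
        long entries = (keyCount + interval - 1) / interval; // ceiling division
        return Math.round(entries * (avgKeyLength + PER_ENTRY_OVERHEAD_BYTES));
    }

    /**
     * Starts at startInterval and doubles the interval until the estimated index
     * size is at or below maxIndexBytes (the single knob), or until the index
     * would hold only one entry.
     */
    public static int chooseInterval(long keyCount, double avgKeyLength,
                                     int startInterval, long maxIndexBytes) {
        int interval = Math.max(1, startInterval);
        while (estimateIndexBytes(keyCount, avgKeyLength, interval) > maxIndexBytes
                && interval < keyCount
                && interval <= Integer.MAX_VALUE / 2) { // guard against int overflow
            interval *= 2;
        }
        return interval;
    }
}
```

With 1,000,000 keys averaging 36 bytes, a starting interval of 32, and a 1,000,000 byte
limit, this sketch settles on an interval of 128. Doubling is just one way to step the
interval; a finer-grained increase would land closer to the limit.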

> Set index interval at flush time based off count of keys and key attributes
> ---------------------------------------------------------------------------
>                 Key: HBASE-1071
>                 URL: https://issues.apache.org/jira/browse/HBASE-1071
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
> From Andrew Purtell's note up on the list:
> "Later, maybe it would make sense to dynamically set the index
> interval based on the distribution of cell sizes in the
> mapfile at some future time, according to some parameterized
> formula that could be adjusted with config variable(s). This
> could be done during compaction. Would make sense to also
> consider the distribution of key lengths. Or there could be
> other similar tricks implemented to keep index sizes down."

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
