hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-3551) Loaded hfile indexes occupy a good chunk of heap; look into shrinking the amount used and/or evicting unused indices
Date Thu, 10 Mar 2011 20:08:59 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3551.

    Resolution: Won't Fix

Ok.  Closing.  Will reference your comment, Marc, over in HBASE-25, etc.  I also added a section
to schema design on the size of rows and column family names, keeping them small.  Thanks for
digging in, boss.

  <section xml:id="keysize">
      <title>Try to minimize row and column sizes</title>
      <para>In HBase, values are always freighted with their coordinates; as a
          cell value passes through the system, it'll be accompanied by its
          row, column name, and timestamp.  Always.  If your rows and column names
          are large, especially compared to the size of the cell value, then
          you may run up against some interesting scenarios.  One such is
          the case described by Marc Limotte at the tail of
          <link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>.
          Therein, the indices that are kept on HBase storefiles (<link linkend="hfile">HFile</link>s)
          to facilitate random access may end up occupying large chunks of the HBase
          allotted RAM because the cell value coordinates are large.
          Marc, in the above cited comment, suggests upping the block size so
          entries in the store file index happen at a larger interval, or
          modifying the table schema so it makes for smaller rows and column

> Loaded hfile indexes occupy a good chunk of heap; look into shrinking the amount used and/or evicting unused indices
> --------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-3551
>                 URL: https://issues.apache.org/jira/browse/HBASE-3551
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
> I hung with a user Marc and we were looking over configs and his cluster profile up on
> ec2.  One thing we noticed was that his 100+ 1G regions of two families had ~2.5G of heap
> resident.  We did a bit of math and couldn't get to 2.5G so that needs looking into.  Even
> still, 2.5G is a bunch of heap to give over to indices (He actually OOME'd when he had his
> RS heap set to just 3G; we shouldn't OOME, we should just run slower).  It sounds like he
> needs the indices loaded but still, for some cases we should drop indices for unaccessed files.
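
For a rough sense of the kind of math mentioned in the report, here is a back-of-envelope sketch in Python. It assumes one index entry per HFile block, and that each entry costs roughly the block's first key plus a fixed overhead. The function name, the 100-byte average key, and the overhead parameter are hypothetical, picked only for illustration; 64 KB is HBase's default block size, and the 100 x 1G figure comes from the report above. This is not how HBase itself accounts for index heap.

```python
GIB = 1024 ** 3
KIB = 1024


def estimate_index_bytes(total_data_bytes, block_size_bytes,
                         avg_key_bytes, per_entry_overhead_bytes=0):
    """Rough index size: one entry per block, each holding one key.

    per_entry_overhead_bytes is a placeholder for offsets, sizes, and
    object overhead; its real value is not given in the report.
    """
    # Ceiling division: a partially filled last block still needs an entry.
    num_blocks = -(-total_data_bytes // block_size_bytes)
    return num_blocks * (avg_key_bytes + per_entry_overhead_bytes)


# 100 regions of about 1G each, as in the report.
total = 100 * GIB

# Hypothetical 100-byte keys (row + column name + timestamp).
default_blocks = estimate_index_bytes(total, 64 * KIB, avg_key_bytes=100)
larger_blocks = estimate_index_bytes(total, 256 * KIB, avg_key_bytes=100)
smaller_keys = estimate_index_bytes(total, 64 * KIB, avg_key_bytes=50)

print(default_blocks)  # 163840000
print(larger_blocks)   # 40960000
print(smaller_keys)    # 81920000
```

Under these assumptions the two remedies from the cited comment act on different factors: quadrupling the block size cuts the number of index entries to a quarter, while halving the key size halves the cost of each entry.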

