hbase-dev mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2251) PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
Date Tue, 23 Feb 2010 21:57:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837464#action_12837464 ]

ryan rawson commented on HBASE-2251:

Zipf is OK, but it may not accurately represent common data use patterns.

What I am trying to say here is that big cells represent one scaling challenge, and small
cells a different one.  Users often have one or the other, but not a whole lot in between.
Our systems use either small cells or huge ones (> 2k).  The small cells place a higher
load on the system, one specific example being the node objects in the memstore kvset.
This is what was causing the clone issues.

Hence we need to accurately simulate objects in the 1-50ish byte range and in the
1000-12000 (or larger) byte range.  Using a Zipf distribution within each of those ranges
would be reasonable, I think.
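
As a rough illustration of the idea above (not code from PerformanceEvaluation), here is a minimal sketch of a value-size generator that first picks the small or the large range and then draws a Zipf-distributed size within it. The class name, the constructor parameters, the split fraction, and the choice to map the most frequent Zipf rank to the smallest size in each range are all assumptions made for the example.

```java
import java.util.Random;

/** Hypothetical sketch: bimodal value sizes with a Zipf distribution inside each range. */
final class BimodalZipfValueSizes {
    private final ZipfRange small;
    private final ZipfRange large;
    private final double smallFraction; // assumed share of draws taken from the small range
    private final Random rng;

    BimodalZipfValueSizes(int smallMin, int smallMax, int largeMin, int largeMax,
                          double smallFraction, double exponent, Random rng) {
        this.small = new ZipfRange(smallMin, smallMax, exponent);
        this.large = new ZipfRange(largeMin, largeMax, exponent);
        this.smallFraction = smallFraction;
        this.rng = rng;
    }

    /** Picks a range, then draws a Zipf-distributed size within that range. */
    int nextSize() {
        ZipfRange range = rng.nextDouble() < smallFraction ? small : large;
        return range.sample(rng.nextDouble());
    }

    /** Zipf over ranks 1..n, where rank k maps to size (min + k - 1). */
    static final class ZipfRange {
        private final int min;
        private final double[] cdf;

        ZipfRange(int min, int max, double exponent) {
            if (min < 0 || max < min) {
                throw new IllegalArgumentException("bad range");
            }
            this.min = min;
            int n = max - min + 1;
            cdf = new double[n];
            double sum = 0.0;
            for (int k = 1; k <= n; k++) {
                sum += 1.0 / Math.pow(k, exponent); // unnormalized Zipf weight of rank k
                cdf[k - 1] = sum;
            }
            for (int i = 0; i < n; i++) {
                cdf[i] /= sum; // normalize so the last entry is exactly 1.0
            }
        }

        /** Maps a uniform draw u in [0, 1) to a size by binary search on the CDF. */
        int sample(double u) {
            int lo = 0;
            int hi = cdf.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] < u) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            return min + lo;
        }
    }
}
```

For example, new BimodalZipfValueSizes(1, 50, 1000, 12000, 0.5, 1.0, new Random()) would yield sizes that fall only in the two ranges named above, with nothing in between.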

> PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
> ----------------------------------------------------------------------
>                 Key: HBASE-2251
>                 URL: https://issues.apache.org/jira/browse/HBASE-2251
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>             Fix For: 0.20.4, 0.21.0
> The PerformanceEvaluation uses 1k rows, which I would argue is uncommon, and also provides
> an easy-to-hit performance goal.  Most of the harder performance issues happen at the low
> and high ends of cell size.  In our own application, our key sizes range from 4 bytes to maybe
> 100 bytes.  Very rarely 1000 bytes.  If we have large values, they are VERY large, like multiple
> kilobytes.
> Recently a change went into HBase that ran well with PE because the overhead of 1k rows
> is very low in memory, but with small rows the performance hit would be expected to be much
> larger.  This is because the per-value overhead (e.g. node objects of the skip list/memstore)
> is amortized better over 1k values.
> We should make this a tunable setting, and have a low default.  I would argue for a 10-30
> byte default.
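
To make the amortization point in the description concrete, here is a small sketch that computes what share of an entry's memory is fixed per-value overhead at different value sizes. The 100-byte overhead figure is purely an assumed placeholder for illustration, not a measured number for the memstore or its skip list nodes, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: share of an entry's memory taken by fixed per-value overhead. */
final class OverheadShare {
    /** Returns overhead / (overhead + value), the fraction of memory that is not payload. */
    static double overheadFraction(int overheadBytes, int valueBytes) {
        return (double) overheadBytes / (overheadBytes + valueBytes);
    }

    public static void main(String[] args) {
        int assumedOverheadBytes = 100; // assumption for illustration only, not measured
        for (int valueBytes : new int[] {10, 30, 1000, 12000}) {
            System.out.printf("%5d-byte value: %.1f%% overhead%n",
                    valueBytes, 100.0 * overheadFraction(assumedOverheadBytes, valueBytes));
        }
    }
}
```

Whatever the real overhead is, the fraction shrinks as values grow, which is why a benchmark run only with 1k values can hide a regression that small values would expose.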

