hbase-user mailing list archives

From Michael Ellery <mell...@opendns.com>
Subject Re: column count guidelines
Date Fri, 08 Feb 2013 04:34:26 GMT

Thanks for reminding me of the HBase version in CDH4 - that's something we'll definitely take
into consideration.

-Mike

On Feb 7, 2013, at 5:09 PM, Ted Yu wrote:

> Thanks Michael for this information.
> 
> FYI CDH4 (as of now) is based on HBase 0.92.x which doesn't have the two
> features I cited below.
> 
> On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <mellery@opendns.com> wrote:
> 
>> There is only one CF in this schema.
>> 
>> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
>> cannot have cluster downtime. Our current upgrade plan involves additional
>> hardware with side-by-side clusters until everything is exported/imported.
>> 
>> Thanks,
>> Mike
>> 
>> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>> 
>>> How many column families are involved?
>>> 
>>> Have you considered upgrading to 0.94.4 where you would be able to
>>> benefit from lazy seek, Data Block Encoding, etc.?
>>> 
>>> Thanks
>>> 
>>> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <mellery@opendns.com>
>>> wrote:
>>> 
>>>> I'm looking for some advice about per-row CQ (column qualifier) count
>>>> guidelines. Our current schema design means we have a HIGHLY variable CQ
>>>> count per row -- some rows have one or two CQs and some rows have upwards
>>>> of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
>>>> the cell values are null. We see highly variable and too often
>>>> unacceptable read performance using this schema. I don't know for a fact
>>>> that the CQ count variability is the source of our problems, but I am
>>>> suspicious.
>>>> 
>>>> I'm curious about others' experience with CQ counts per row -- are there
>>>> some best practices/guidelines about how to optimally size the number of
>>>> CQs per row? The other obvious solution will involve breaking this data
>>>> into finer-grained rows, which means shifting from GETs to SCANs -- are
>>>> there performance trade-offs in such a change?
>>>> 
>>>> We are currently using CDH3u4, if that is relevant. All of our loading is
>>>> done via HFile loading (bulk), so we have not had to tune write
>>>> performance beyond using bulk loads. Any advice appreciated, including
>>>> what metrics we should be looking at to further diagnose our read
>>>> performance challenges.
>>>> 
>>>> Thanks,
>>>> Mike Ellery
>> 
>> 
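
To make the Data Block Encoding suggestion above concrete: after an upgrade to 0.94+, enabling FAST_DIFF on an existing family could look roughly like the sketch below. The table name "mytable" and family "cf" are placeholders, not anything from this thread; the shell equivalent is alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableFastDiff {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Fetch the existing descriptor so the family's other settings
    // (compression, TTL, ...) are preserved.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));

    // FAST_DIFF stores each key as a delta against the previous one,
    // which pays off when a row holds many long, similar qualifiers.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);

    admin.disableTable("mytable");     // offline schema change
    admin.modifyColumn("mytable", cf);
    admin.enableTable("mytable");
    admin.close();

    // Existing HFiles pick up the encoding only as compactions
    // rewrite them; a major compaction forces the conversion.
  }
}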

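On the GETs-to-SCANs question: if each (row, qualifier) pair becomes its own row -- say, the old row key plus a separator plus the old qualifier -- the read of one logical entity turns into a short scan over a key prefix. A minimal sketch of the read side, where the tall table name "items_tall", the 0x00 separator, and the entity/item naming are all assumptions rather than the schema from this thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TallRowScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "items_tall");

    // Old wide schema: row = entityId, one CQ per item, null value.
    // Tall schema here: row = entityId + 0x00 + item, one fixed CQ.
    // The single GET becomes a start/stop-row scan over the prefix;
    // 0x01 as the last stop-key byte bounds everything above 0x00.
    byte[] start = Bytes.add(Bytes.toBytes("entity42"), new byte[] { 0 });
    byte[] stop  = Bytes.add(Bytes.toBytes("entity42"), new byte[] { 1 });
    Scan scan = new Scan(start, stop);
    scan.setCaching(1000);  // rows fetched per RPC round trip

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // Each Result is now one small row instead of one of a
        // million qualifiers packed into a single row.
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

The visible trade-offs are one RPC per cached batch instead of a single GET, and the loss of single-row atomicity for the entity; in exchange, no single read has to materialize a 100 MB row.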

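And on diagnosing whether qualifier count is actually the culprit: before restructuring, a low-tech client-side probe is to time a GET against a known-narrow and a known-wide row. A rough sketch (the table and row keys are made up, and note that a ~1M-qualifier row at ~100 bytes per CQ is on the order of 100 MB, so the wide probe itself needs a generous client heap):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowProbe {
  static void probe(HTable table, String row) throws Exception {
    long t0 = System.nanoTime();
    Result r = table.get(new Get(Bytes.toBytes(row)));
    long ms = (System.nanoTime() - t0) / 1000000L;
    // Result.size() is the number of KeyValues (qualifiers) returned.
    System.out.println(row + ": " + r.size() + " cells in " + ms + " ms");
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    probe(table, "narrow-row-example");  // a row with a handful of CQs
    probe(table, "wide-row-example");    // a row with ~1M CQs
    table.close();
  }
}

Run each probe a few times so the first cold-cache read is visible separately from warm reads. Server-side, the region servers' blockCacheHitRatio and fsReadLatency metrics are the usual first stops; a low hit ratio combined with rows this wide would go a long way toward explaining highly variable GET latency.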