hbase-user mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: column count guidelines
Date Fri, 08 Feb 2013 05:38:20 GMT
My recommendation is to stay current with the latest HBase release, and to
wait for 0.96, which brings improvements in almost every area. I talked about
this in a blog post. [1]

I think Coprocessors can be very helpful in your use case. In Lars's book
"HBase: The Definitive Guide", Chapter 4 explains how to use Counters and
Coprocessors; you should read it.

A great introduction to Coprocessors was posted on HBase's blog, [2] and a
great example of HBase performance tuning, including the use of Coprocessors,
was posted by Hari Kumar from Ericsson Research on its Data and Knowledge
blog. [3]

Best wishes

[1] http://marcosluis2186.posterous.com/some-upcoming-features-in-hbase-096
[2] https://blogs.apache.org/hbase/entry/coprocessor_introduction
[3] http://labs.ericsson.com/blog/hbase-performance-tuners

On 02/07/2013 11:34 PM, Michael Ellery wrote:
> thanks for reminding me of the HBase version in CDH4 - that's something
> we'll definitely take into consideration.
>
> -Mike
> On Feb 7, 2013, at 5:09 PM, Ted Yu wrote:
>> Thanks Michael for this information.
>> FYI CDH4 (as of now) is based on HBase 0.92.x which doesn't have the two
>> features I cited below.
>> On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <mellery@opendns.com> wrote:
>>> There is only one CF in this schema.
>>> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
>>> cannot have cluster downtime. Our current upgrade plan involves additional
>>> hardware with side-by-side clusters until everything is exported/imported.
>>> Thanks,
>>> Mike
>>> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>>>> How many column families are involved?
>>>>
>>>> Have you considered upgrading to 0.94.4, where you would be able to
>>>> benefit from lazy seek, Data Block Encoding, etc.?
>>>>
>>>> Thanks
>>>>> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <mellery@opendns.com> wrote:
>>>>>
>>>>> I'm looking for some advice about per-row CQ (column qualifier) count
>>>>> guidelines. Our current schema design means we have a HIGHLY variable
>>>>> count per row -- some rows have one or two CQs and some rows have upwards
>>>>> of 1 million. Each CQ is on the order of 100 bytes (for round numbers)
>>>>> and the cell values are null. We see highly variable and too often
>>>>> unacceptable read performance using this schema. I don't know for a fact
>>>>> that the CQ count variability is the source of our problems, but I am
>>>>> suspicious.
>>>>>
>>>>> I'm curious about others' experience with CQ counts per row -- are there
>>>>> some best practices/guidelines about how to optimally size the number of
>>>>> CQs per row? The other obvious solution will involve breaking this data
>>>>> into finer-grained rows, which means shifting from GETs to SCANs - are
>>>>> there performance trade-offs in such a change?
>>>>>
>>>>> We are currently using CDH3u4, if that is relevant. All of our loading is
>>>>> done via HFILE loading (bulk), so we have not had to tune write
>>>>> performance beyond using bulk loads. Any advice appreciated, including
>>>>> what metrics we should be looking at to further diagnose our read
>>>>> performance challenges.
>>>>>
>>>>> Thanks,
>>>>> Mike Ellery
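The redesign Mike describes (breaking each wide row's qualifiers out into
finer-grained rows, then reading them back with a prefix SCAN instead of one
GET) comes down to composite row keys. Here is a minimal sketch of the key
construction in plain Java, with no HBase client dependency; the class and
method names are illustrative, not from any of the code discussed above:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompositeKeys {
    // 0x00 sorts before every printable byte, so it is a safe key separator
    private static final byte SEP = 0x00;

    // Narrow-row key: original wide-row key + SEP + the former column qualifier
    static byte[] narrowRowKey(byte[] row, byte[] qualifier) {
        byte[] key = new byte[row.length + 1 + qualifier.length];
        System.arraycopy(row, 0, key, 0, row.length);
        key[row.length] = SEP;
        System.arraycopy(qualifier, 0, key, row.length + 1, qualifier.length);
        return key;
    }

    // Inclusive start key for a SCAN over all narrow rows of one wide row
    static byte[] scanStart(byte[] row) {
        byte[] start = Arrays.copyOf(row, row.length + 1);
        start[row.length] = SEP;
        return start;
    }

    // Exclusive stop key: same prefix with the separator bumped by one
    static byte[] scanStop(byte[] row) {
        byte[] stop = Arrays.copyOf(row, row.length + 1);
        stop[row.length] = (byte) (SEP + 1);
        return stop;
    }

    // Unsigned lexicographic compare, the order HBase uses for row keys
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] row = "example.com".getBytes(StandardCharsets.UTF_8);
        byte[] q = "cq-000123".getBytes(StandardCharsets.UTF_8);
        byte[] key = narrowRowKey(row, q);
        // every derived key falls inside [scanStart, scanStop)
        System.out.println(compare(scanStart(row), key) <= 0
                && compare(key, scanStop(row)) < 0);   // prints: true
    }
}
```

Whether a SCAN over many small rows actually beats a single GET of a
million-qualifier row depends on block cache behavior and key distribution,
so the trade-off Mike asks about is real and worth benchmarking before
migrating.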

Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
