hbase-user mailing list archives

From " Marcos Ortiz Valmaseda" <mlor...@uci.cu>
Subject Re: column count guidelines
Date Fri, 08 Feb 2013 01:08:25 GMT
I have the same advice that Ted Yu gave you. 
You should upgrade to 0.94.4. There are a lot of good things there which can be very 
beneficial for your use case. 
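
In case it is useful, here is a minimal, untested sketch of turning on data block encoding 
with the 0.94 Java client once you are on 0.94.4. The table name "mytable" and family "d" 
are placeholders for your own schema, and the new encoding only takes effect as HFiles are 
rewritten by compaction: 

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableBlockEncoding {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String table = "mytable";                                   // placeholder table name

        // Fetch the existing family descriptor so its other settings are preserved.
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes(table));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d"));  // placeholder family name

        // FAST_DIFF tends to help when adjacent keys share long prefixes,
        // as with many ~100-byte qualifiers in the same row.
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);

        admin.disableTable(table);
        admin.modifyColumn(table, cf);   // applied as HFiles are rewritten by compaction
        admin.enableTable(table);
        admin.close();
    }
}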

----- Original Message -----

From: "Michael Ellery" <mellery@opendns.com> 
To: user@hbase.apache.org 
Sent: Thursday, February 7, 2013 20:02:18 
Subject: Re: column count guidelines 

There is only one CF in this schema. 

Yes, we are looking at upgrading to CDH4, but it is not trivial since we cannot have cluster
downtime. Our current upgrade plan involves additional hardware with side-by-side clusters
until everything is exported/imported. 


On Feb 7, 2013, at 4:34 PM, Ted Yu wrote: 

> How many column families are involved? 
> Have you considered upgrading to 0.94.4 where you would be able to benefit 
> from lazy seek, Data Block Encoding, etc.? 
> Thanks 
> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <mellery@opendns.com> wrote: 
>> I'm looking for some advice about per row CQ (column qualifier) count 
>> guidelines. Our current schema design means we have a HIGHLY variable CQ 
>> count per row -- some rows have one or two CQs and some rows have upwards 
>> of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and 
>> the cell values are null. We see highly variable and too often 
>> unacceptable read performance using this schema. I don't know for a fact 
>> that the CQ count variability is the source of our problems, but I am 
>> suspicious. 
>> I'm curious about others' experience with CQ counts per row -- are there 
>> some best practices/guidelines about how to optimally size the number of 
>> CQs per row? The other obvious solution will involve breaking this data 
>> into finer-grained rows, which means shifting from GETs to SCANs - are 
>> there performance trade-offs in such a change? 
>> We are currently using CDH3u4, if that is relevant. All of our loading is 
>> done via HFILE loading (bulk), so we have not had to tune write performance 
>> beyond using bulk loads. Any advice appreciated, including what metrics we 
>> should be looking at to further diagnose our read performance challenges. 
>> Thanks, 
>> Mike Ellery 
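
On the GET-versus-SCAN question above: if each of today's qualifiers became its own row 
keyed by something like "entity|qualifier", the group could be read back with a bounded 
scan instead of one huge Get. A rough sketch against the 0.94 client, with the table name 
and key layout purely hypothetical: 

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FinerRowsScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");        // placeholder table name

        byte[] startRow = Bytes.toBytes("entity123|");     // placeholder key prefix
        byte[] stopRow  = Bytes.toBytes("entity123}");     // '}' is the byte after '|', so the scan stays within the prefix

        Scan scan = new Scan(startRow, stopRow);
        scan.setCaching(500);                              // rows per RPC; tune to amortize round trips

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // each result is one former qualifier, now a narrow row of its own
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

The main trade-off is the per-row and scanner-RPC overhead of a scan, but it avoids 
assembling a million-cell row on a single region server for every read. 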


Marcos Ortiz Valmaseda, 
Product Manager && Data Scientist at UCI 
Blog : http://marcosluis2186.posterous.com 
LinkedIn: http://www.linkedin.com/in/marcosluis2186 
Twitter : @marcosluis2186 
