hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: column count guidelines
Date Fri, 08 Feb 2013 00:34:24 GMT
How many column families are involved?

Have you considered upgrading to 0.94.4, where you would be able to benefit
from lazy seek, Data Block Encoding, etc.?
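
For example, enabling Data Block Encoding on an existing column family
would look roughly like this with the 0.94 client API (a sketch only; the
"mytable" and "cf" names are placeholders for your own schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableFastDiff {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Fetch the existing family descriptor so its other settings are kept.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
    // FAST_DIFF tends to help when qualifiers share long common prefixes
    // and values are small or empty.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    admin.disableTable("mytable");
    admin.modifyColumn("mytable", cf);
    admin.enableTable("mytable");
    admin.close();
  }
}

FAST_DIFF is usually a reasonable starting point for long, repetitive
qualifiers like yours.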

Thanks

On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <mellery@opendns.com> wrote:

> I'm looking for some advice about per-row CQ (column qualifier) count
> guidelines. Our current schema design means we have a HIGHLY variable CQ
> count per row -- some rows have one or two CQs and some rows have upwards
> of 1 million. Each CQ is on the order of 100 bytes (in round numbers) and
> the cell values are null. We see highly variable and, too often,
> unacceptable read performance with this schema. I don't know for a fact
> that the CQ count variability is the source of our problems, but I am
> suspicious.
>
> I'm curious about others' experience with CQ counts per row -- are there
> any best practices/guidelines on how to optimally size the number of
> CQs per row? The other obvious solution would involve breaking this data
> into finer-grained rows, which means shifting from GETs to SCANs -- are
> there performance trade-offs in such a change?
>
> We are currently using CDH3u4, if that is relevant. All of our loading is
> done via bulk HFile loads, so we have not had to tune write performance
> beyond that. Any advice is appreciated, including which metrics we
> should be looking at to further diagnose our read performance challenges.
>
> Thanks,
> Mike Ellery
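
On the GETs-vs-SCANs question in the quoted message: if each wide row is
split into many narrow rows sharing a key prefix, the old single GET
becomes a short prefix-bounded SCAN. A rough sketch (the "entity|qualifier"
row-key layout and all names here are made up for illustration):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    // One former wide row becomes all rows keyed "entity123|<qualifier>".
    byte[] prefix = Bytes.toBytes("entity123|");
    // Stop row: the prefix with its last byte incremented, so the scan
    // covers exactly the keys that start with the prefix.
    byte[] stop = Arrays.copyOf(prefix, prefix.length);
    stop[stop.length - 1]++;
    Scan scan = new Scan();
    scan.setStartRow(prefix);
    scan.setStopRow(stop);
    scan.setCaching(1000); // batch many small rows per RPC round trip
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process one narrow row (formerly one CQ of the wide row)
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

The main added costs are scanner setup and more rows per request, which
setCaching amortizes; in exchange you no longer materialize a
million-qualifier row in a single Result, though you do give up single-row
atomicity across what used to be one row.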
