hbase-user mailing list archives

From Michael Ellery <mell...@opendns.com>
Subject column count guidelines
Date Thu, 07 Feb 2013 23:47:57 GMT
I'm looking for some advice about per row CQ (column qualifier) count guidelines. Our current
schema design means we have a HIGHLY variable CQ count per row -- some rows have one or two
CQs and some rows have upwards of 1 million. Each CQ is on the order of 100 bytes (for round
numbers) and the cell values are null. We see highly variable, and too often unacceptable,
read performance with this schema. I don't know for a fact that the CQ count variability
is the source of our problems, but I suspect it is.
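
For a sense of scale, here is a rough back-of-the-envelope sketch of how large a single row
gets at these CQ counts. The 100-byte qualifier and empty value come from the figures above;
the 16-byte row key and 1-byte column family name are assumptions picked only for illustration.
The 20 bytes of fixed per-cell overhead follow the KeyValue layout (key length, value length,
row length, family length, timestamp, type). The result is uncompressed size and ignores any
block compression, so treat it as an estimate only.

```python
# Rough, uncompressed size of one row, using the figures from the post.
# ASSUMED (not from the post): 16-byte row key, 1-byte column family name.

# Fixed bytes stored with every cell in the KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def row_size_bytes(cq_count, cq_bytes=100, row_key_bytes=16,
                   family_bytes=1, value_bytes=0):
    """Estimate the stored size of a row holding cq_count cells."""
    # Every cell repeats the row key and family name next to its qualifier.
    per_cell = (KEYVALUE_FIXED_OVERHEAD + row_key_bytes + family_bytes
                + cq_bytes + value_bytes)
    return cq_count * per_cell


if __name__ == "__main__":
    for count in (2, 1_000, 1_000_000):
        print(f"{count:>9} CQs -> {row_size_bytes(count) / 1e6:.3f} MB")
```

Under these assumptions the widest rows are on the order of 100 MB or more each, while the
narrowest are a few hundred bytes, so a single GET can differ in cost by several orders of
magnitude.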

I'm curious about others' experience with CQ counts per row -- are there best practices or
guidelines for how to size the number of CQs per row? The other obvious solution would involve
breaking this data into finer-grained rows, which means shifting from GETs to SCANs -- are
there performance trade-offs in such a change?
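
To make the finer-grained alternative concrete, here is a minimal sketch of one possible key
layout, not a settled design: the old qualifier moves into the row key, and a prefix range
scan stands in for the old GET. The names split_key and prefix_scan_range, the 0x00 separator,
and the assumption that the original row keys never contain that separator byte are all
hypothetical. The sketch only computes keys and scan bounds and does not call any HBase client
API.

```python
# Sketch of a "tall" key layout: one fine-grained row per former column qualifier.
# ASSUMED: a one-byte separator (0x00) that never occurs in the original row keys.
SEP = b"\x00"


def split_key(row_key: bytes, qualifier: bytes) -> bytes:
    """New row key: the old row key, the separator, then the old qualifier."""
    return row_key + SEP + qualifier


def prefix_scan_range(row_key: bytes):
    """Start (inclusive) and stop (exclusive) row keys covering every
    fine-grained row that came from the given original row key."""
    start = row_key + SEP
    # Incrementing the separator byte gives the smallest key past the prefix.
    stop = row_key + bytes([SEP[0] + 1])
    return start, stop


if __name__ == "__main__":
    # Simulate a lexicographically sorted table with a sorted list of keys.
    table = sorted(
        split_key(r, q)
        for r, q in [(b"a", b"x"), (b"a", b"y"), (b"ab", b"x"), (b"b", b"z")]
    )
    start, stop = prefix_scan_range(b"a")
    print([k for k in table if start <= k < stop])
```

With a layout like this, the cost of a read tracks the number of fine-grained rows actually
scanned, and a scan can be bounded or paged rather than pulling one very large row in a single
call.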

We are currently using CDH3u4, if that is relevant. All of our loading is done via HFILE loading
(bulk), so we have not had to tune write performance beyond using bulk loads. Any advice appreciated,
including what metrics we should be looking at to further diagnose our read performance challenges.

Mike Ellery