hbase-user mailing list archives

From Sammy Yu <...@brightedge.com>
Subject Performance of reading rows with a large number of columns
Date Sat, 03 Apr 2010 07:41:16 GMT
We've been doing some performance comparisons between different schemas
on HBase 0.20.3. I have a schema defined as follows:

table:row1: {
   columnfamily:cf1, column:value0001-0100: <cell value>,
   columnfamily:cf1, column:value0101-0200: <cell value>,
   columnfamily:cf1, column:value0201-0300: <cell value>,
   ...
}

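The schema above puts every column of row1 into a single wide row. The toy Python model below is an illustration only, not HBase's storage format or read path. It shows why a single-column lookup can get slower as a row gets wider: if a row's cells are kept sorted by qualifier, a reader that examines every cell does work proportional to the column count, while a reader that seeks by binary search does not. Whether HBase 0.20.3 behaves like the first reader is exactly the question asked below; the qualifier names (`cf1:value00000` and so on) and both reader functions are hypothetical.

```python
# Toy model only: one wide row held as cells sorted by column qualifier.
# The qualifier names are hypothetical; this is NOT HBase's read path.

def make_wide_row(num_columns):
    """Build a sorted list of (qualifier, value) cells for one row.
    Zero-padding keeps lexical order equal to numeric order."""
    return [(f"cf1:value{i:05d}", f"<cell {i}>") for i in range(num_columns)]

def get_linear(cells, qualifier):
    """Examine every cell and keep the match; returns (value, cells_examined)."""
    found = None
    examined = 0
    for q, v in cells:
        examined += 1
        if q == qualifier:
            found = v
    return found, examined

def get_seek(cells, qualifier):
    """Binary-search the sorted qualifiers; returns (value, cells_examined)."""
    lo, hi = 0, len(cells)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if cells[mid][0] < qualifier:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(cells) and cells[lo][0] == qualifier:
        return cells[lo][1], probes
    return None, probes

if __name__ == "__main__":
    # Compare work done for a 1-column row and a 5,000-column row.
    for n in (1, 5000):
        row = make_wide_row(n)
        target = row[n // 2][0]
        print(n, "columns:",
              "linear examined", get_linear(row, target)[1],
              "| seek examined", get_seek(row, target)[1])
```

In this model the linear reader examines all 5,000 cells of the wide row to return one value, while the seeking reader examines at most 13.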
Using the Thrift protocol, we call scannerOpen and restrict it to a single
column such as cf1:value0101-0200. This works really well when row1 has
just a single column (0.040 seconds). However, when a row contains 5,000
columns, the query time jumps to 1.8 seconds.
Is HBase deserializing the entire row when it reads the data from disk, so
that limiting the column has no effect? Also, is the solution then to move
the column so that it becomes part of the key? I think this would work, but
it doesn't feel right, as there could be cases where I want, say,
value0001-0100 and value0101-0200 to come back in one row.
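Moving the column into the key does not necessarily rule out getting several former columns back together. The toy Python model below (an illustration only, not HBase code) builds hypothetical composite row keys of the form `<row>/<former column>` and reassembles one logical row with a start/stop range scan over the sorted keys. The `/` separator and all function names are assumptions. The separator matters: a bare prefix of `row1` would also match keys for `row10`.

```python
import bisect

# Toy model of the "move the column into the key" design.
# Composite keys are hypothetical: "<original row>/<former column name>".
# A sorted list of keys stands in for the table's sorted row order.

def build_tall_table(rows):
    """rows: dict of original row -> dict of column -> value.
    Returns a list of (composite_key, value) sorted by key."""
    return sorted(
        (f"{row}/{col}", val)
        for row, cols in rows.items()
        for col, val in cols.items()
    )

def prefix_stop(prefix):
    """Exclusive stop key: smallest string greater than every key that starts
    with prefix. Assumes prefix's last character is not the max code point."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def scan(table, start, stop):
    """Return entries with start <= key < stop, like a start/stop-row scan."""
    keys = [k for k, _ in table]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return table[lo:hi]

def get_columns(table, row, columns):
    """Reassemble selected former columns of one logical row via a range scan."""
    prefix = row + "/"
    wanted = set(columns)
    return {
        key[len(prefix):]: val
        for key, val in scan(table, prefix, prefix_stop(prefix))
        if key[len(prefix):] in wanted
    }

if __name__ == "__main__":
    table = build_tall_table({
        "row1": {"value0001-0100": "a", "value0101-0200": "b"},
        "row10": {"value0001-0100": "x"},
    })
    print(get_columns(table, "row1", ["value0001-0100", "value0101-0200"]))
```

With a real table, the same start and stop keys would be passed to a scanner. The 0.20 Thrift interface has a `scannerOpenWithStop` call for this, but check the `Hbase.thrift` shipped with your version before relying on it.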

