hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Row+Col Range Read/Scan
Date Wed, 10 Aug 2011 09:39:20 GMT
As we load more and more data into HBase, we are finding "millions of
columns" to be a challenge. We have some very wide rows, and it takes
12-15 seconds to read them. Since HBase does not sort columns
and thereby cannot support a scan of columns, we are seeing some
serious limitations in how we can model data in HBase. We always need to
read the entire row, and so we take a 15-second hit.

Has there been any talk about building in some support for sorted columns
and the ability to read/scan across columns? Millions of columns are
challenging if you can only read a single column, a list of columns, or the
entire row. How does Bigtable support this? It seems that HBase is limited
as a column-based data store unless it can support this. Our columns are
truly dynamic, so we do not necessarily even know what they are to request
them by name in a list. We want to be able to read/scan them just like rows.

We would love the ability to support the following read method (through
Thrift). We can of course do this on our own from the entire row, but that
requires reading the 2-million-column row into memory first.

getRowWithColumnRange(tableName, row, startColumn, stopColumn)
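A minimal sketch of the semantics being asked for, written in Python against
an in-memory stand-in rather than HBase or Thrift. The function name, the
parallel-list layout, and the inclusive-start/exclusive-stop convention are
all assumptions made for illustration; nothing here is an existing HBase call.
The point is only that, when column names are kept sorted, a range read needs
two binary searches and touches just the requested slice.

```python
from bisect import bisect_left

def get_row_with_column_range(names, values, start_column, stop_column):
    """Return (column, value) pairs with start_column <= column < stop_column.

    `names` is a list of column names already in sorted order and `values`
    is the parallel list of cell values (a hypothetical stand-in for one
    wide row).
    """
    lo = bisect_left(names, start_column)   # first column >= start_column
    hi = bisect_left(names, stop_column)    # first column >= stop_column
    return list(zip(names[lo:hi], values[lo:hi]))

# Hypothetical wide row with sortable, zero-padded column names.
names = ["col:%04d" % i for i in range(10)]
values = ["v%d" % i for i in range(10)]
print(get_row_with_column_range(names, values, "col:0003", "col:0006"))
```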

The above would be even better if it could be set up like a scanner where we
could stop at any point. Basically instead of scanning rows we would scan
columns for a given row. This would be the best way to support an offset,
limit pattern.

colScanID = colScannerOpenWithStop(tableName, row, startColumn, stopColumn)
colScannerGetList(colScanID, 1000)
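The scanner pattern can be sketched the same way. This is a Python
illustration over an in-memory sorted row, not a Thrift or HBase
implementation; the class name, method names, and the exclusive stop column
are assumptions. It shows how a cursor that remembers its position gives the
offset/limit behaviour: each call returns at most `limit` columns and the
caller can stop at any point.

```python
from bisect import bisect_left

class ColumnScanner:
    """Cursor over the columns of one row (hypothetical, in-memory)."""

    def __init__(self, names, values, start_column, stop_column):
        # `names` must be sorted; `values` is the parallel list of cells.
        self._names = names
        self._values = values
        self._pos = bisect_left(names, start_column)  # inclusive start
        self._end = bisect_left(names, stop_column)   # exclusive stop

    def get_list(self, limit):
        """Return up to `limit` (column, value) pairs and advance the cursor."""
        stop = min(self._pos + limit, self._end)
        batch = list(zip(self._names[self._pos:stop],
                         self._values[self._pos:stop]))
        self._pos = stop
        return batch

names = ["c%03d" % i for i in range(10)]
values = list(range(10))
scanner = ColumnScanner(names, values, "c002", "c008")
first = scanner.get_list(4)   # columns c002 through c005
second = scanner.get_list(4)  # the remaining columns, c006 and c007
third = scanner.get_list(4)   # empty list: the range is exhausted
```

An empty batch signals the end of the range, so the caller never needs to know
the number of columns in advance.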

Of course, once these changes were in place, people would push the size of
rows even further. We have seen somewhere around 20+ million columns cause OOM
errors. One row per region should be the theoretical limit on row size,
but I am sure more work is needed to ensure that this holds.

