hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: 0.92 Max Row Size
Date Sat, 21 Jan 2012 22:22:00 GMT
On Sat, Jan 21, 2012 at 5:34 AM, Wayne <wav100@gmail.com> wrote:

> Sorry but it would be too hard for us to be able to provide enough info in
> a Jira to accurately reproduce. Our read problem is through thrift and has
> everything to do with the row simply being too big to bring back in its
> entirety (a row with 13 million columns times out a third of the time).
> Filters in .92 and
> thrift should help us there. I just closed
> https://issues.apache.org/jira/browse/HBASE-4187 as filters now support
> offset, limit patterns for the get. Of course we would all prefer a
> streaming model to avoid any of these issues and having to build our
> own pseudo streaming model. Is Thrift still the best option for high
> performance python based reads? From Hadoop World it seems some people are
> pushing thrift and others are pushing Avro. Does .92 bundle/work with
> Thrift .8 and are the memory leaks fixed in .8?
For what you are doing, python client, I'd say yes.
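For readers hitting the same wall, the offset/limit pattern Wayne mentions maps to HBase's ColumnPaginationFilter(limit, offset), which a Thrift-based Python client can pass as a server-side filter string. A minimal sketch of paginating one very wide row follows; `fetch_page` is a hypothetical hook standing in for whatever Thrift call your client uses to run a single filtered get/scan, since the exact call depends on your client library:

```python
def fetch_wide_row(fetch_page, page_size=1000):
    """Retrieve all columns of a very wide row in bounded pages.

    fetch_page(filter_string) is assumed to return one page of columns as a
    dict {column: value}, e.g. by issuing a Thrift get/scan for the row with
    the given HBase filter string applied.
    """
    offset = 0
    columns = {}
    while True:
        # ColumnPaginationFilter(limit, offset) slices a row's columns
        # server-side, so the client never materializes the full row.
        page = fetch_page("ColumnPaginationFilter(%d, %d)" % (page_size, offset))
        columns.update(page)
        if len(page) < page_size:
            break  # short page means we've reached the last column
        offset += page_size
    return columns
```

This is a pull-based stand-in for the streaming model the thread wishes for: each round trip is small enough not to time out, at the cost of one RPC per page.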

> As far as the write bottleneck it has a lot to do with memory, and other
> low level config issues. I would hope that the automated tests of hbase can
> eventually include patterns for large col counts. In order for hbase to
> truly be a column-based storage system it needs to scale columns into the
> hundreds of millions and beyond. This is the pattern we have the hardest
> time modeling in HBase because there is an unknown "limit" here we have to
> watch out for. There is a known limit that a row must be stored within one
> and only one region, but that should not be a problem. One single large region storing
> one large row should still "work".
We don't have such a test in our suite currently.  It would be a good idea
to add it.  I filed HBASE-5244.
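The shape of such a test is straightforward: write a single row with millions of columns, flushing in batches so the client side stays bounded. A sketch, with `send_batch` as a hypothetical hook for whatever mutation call your client exposes (e.g. a Thrift mutateRow):

```python
def write_wide_row(send_batch, row_key, n_columns, batch_size=10000):
    """Write n_columns cells into one row in fixed-size batches.

    send_batch(row_key, mutations) is assumed to flush one batch of
    {column: value} mutations to the server; batching keeps client
    memory flat no matter how many columns the row ends up with.
    """
    batch = {}
    for i in range(n_columns):
        batch["f:c%d" % i] = b"x"
        if len(batch) >= batch_size:
            send_batch(row_key, batch)
            batch = {}
    if batch:  # flush the final partial batch
        send_batch(row_key, batch)
```

Run with n_columns in the hundreds of millions, this is exactly the pattern the thread says has no coverage: one row, one region, columns well past what any single RPC could return whole.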

