hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Blob storage
Date Wed, 09 Mar 2011 22:13:47 GMT
Yeah there's definitely something better we could do there, see "Too
easy to OOME a RS" https://issues.apache.org/jira/browse/HBASE-2506


On Wed, Mar 9, 2011 at 11:09 AM, Chris Tarnas <cft@email.com> wrote:
> When I get a chance to catch my breath I'll see about writing up something on our experiences.
> One thing I will say: don't skimp on the nodes, because you do not want to run out of RAM
> when using large values. When running my dev environment in pseudo-distributed mode on a
> laptop, the system can run into trouble (nothing unrecoverable, though) when the
> regionserver runs out of memory dealing with the large values.
> -chris
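[A minimal sketch of the "chunk large blobs client-side" pattern Chris's warning hints at; this is not code from the thread, and the key-suffix scheme is an illustrative assumption. Splitting an oversized value into fixed-size cells under suffixed row keys keeps any single cell well under the regionserver's heap budget.]

```python
# Hypothetical sketch: split a large blob into fixed-size chunks so no
# single cell is huge. Each (key, chunk) pair would become one normal Put.
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB per cell; an assumed budget, tune to your heap

def chunk_blob(row_key: str, blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (row_key_with_suffix, chunk) pairs; zero-padded suffix keeps scan order."""
    for i, off in enumerate(range(0, len(blob), chunk_size)):
        yield f"{row_key}:{i:08d}", blob[off:off + chunk_size]

def reassemble(chunks):
    """Concatenate chunks back into the original blob, in key order."""
    return b"".join(data for _, data in sorted(chunks))
```

A scan over the `row_key:` prefix retrieves the pieces in order for reassembly.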
> On Mar 8, 2011, at 1:54 PM, Jean-Daniel Cryans wrote:
>> That's pretty good stuff Chris! You know, you could be my new BFF if
>> you wrote a blog post about your current HBase setup, experiences, etc
>> :)
>> J-D
>> On Tue, Mar 8, 2011 at 11:25 AM, Chris Tarnas <cft@email.com> wrote:
>>> Yes, HBASE-3483 fixed the majority of our pauses, but not all - as J-D points out, we do
>>> experience issues related to inserting into several column families. Luckily, inserts with
>>> really imbalanced column family sizes (MB vs. KB) are few and far between, relatively
>>> speaking. We are also "throttled" by going through Thrift, but even then I can push our
>>> 10-node cluster to over 200k requests a second.
>>> -chris
>>> On Mar 8, 2011, at 11:16 AM, Ryan Rawson wrote:
>>>> Probably the soft limit flushes, eh?
>>>> On Mar 8, 2011 11:15 AM, "Jean-Daniel Cryans" <jdcryans@apache.org> wrote:
>>>>> On Tue, Mar 8, 2011 at 11:04 AM, Chris Tarnas <cft@email.com> wrote:
>>>>>> Just as a point of reference, in one of our systems we have 500+ million
>>>>>> rows that have a cell in its own column family that is usually about
>>>>>> 100 bytes, but in about 10,000 rows the cell can get to 300mb (averaging
>>>>>> probably about 30mb for the larger data). The jumbo-sized data gets loaded
>>>>>> in separately from the smaller data, although it all goes through the same
>>>>>> pipeline. We are using cdh3b45 (0.90.1) with GZ compression, a region size
>>>>>> of 1GB, and a max value size of 500mb. So far we have had no problems with
>>>>>> the larger values.
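[For reference, settings like the ones described above would live in hbase-site.xml. The property names below are the standard HBase ones; the values are only illustrative of the setup described (1 GB regions, 500 MB max value) and were not posted in the thread.]

```xml
<!-- hbase-site.xml: illustrative values matching the setup described above -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- split regions at ~1 GB -->
</property>
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>524288000</value> <!-- allow values up to 500 MB -->
</property>
```

GZ compression is set per column family rather than in hbase-site.xml, e.g. with `COMPRESSION => 'GZ'` on the family at table creation in the shell.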
>>>>>> Our largest problem was performance when inserting into several column
>>>>>> families for the small-value loads, and pauses when flushing memstores.
>>>>>> 0.90.1 helped quite a bit with that.
>>>>> Flushing is done without blocking; were the pauses you were seeing
>>>>> related to the "too many stores" issue or to the global memstore
>>>>> size?
>>>>> In general, inserting into many families is a bad idea unless the sizes
>>>>> are the same. The worst case is inserting a few KB in one and a few
>>>>> MB in the other. The reason being:
>>>>> https://issues.apache.org/jira/browse/HBASE-3149
>>>>> J-D
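[A toy model of the behavior HBASE-3149 is about; this is my sketch, not HBase code. When the region's combined memstore crosses the flush threshold, every family's memstore is flushed together, so a family receiving tiny cells is written out as a stream of near-empty store files alongside the big one.]

```python
# Toy simulation: per-region flushing with one big and one small column family.
def simulate_flushes(writes, flush_threshold):
    memstores = {"big": 0, "small": 0}       # bytes currently buffered per family
    store_files = {"big": [], "small": []}   # sizes of files flushed so far
    for family, size in writes:
        memstores[family] += size
        if sum(memstores.values()) >= flush_threshold:
            # The flush is region-wide: every non-empty family is written out,
            # however little it has buffered.
            for f, buffered in memstores.items():
                if buffered:
                    store_files[f].append(buffered)
                    memstores[f] = 0
    return store_files

# Interleave 5 MB cells in "big" with 2 KB cells in "small", 64 MB threshold.
writes = [("big", 5 * 1024 * 1024), ("small", 2 * 1024)] * 100
files = simulate_flushes(writes, 64 * 1024 * 1024)
```

Every file flushed for "small" is only tens of KB while each "big" file is ~64 MB, which is the kind of imbalance a per-family flush decision would avoid.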
