hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Slow Inserts on EC2 Cluster
Date Thu, 02 Sep 2010 00:54:44 GMT
There are a couple of things happening here, and some possible solutions:

- don't flush based on region size, only on family/store size.
- do what the Bigtable paper says and merge the smallest file with the
memstore while flushing, thus keeping the net number of files low.

The latter would probably benefit from the use of the block cache in
some situations as well.
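
To make the two points above concrete, here is a toy Python model. It is
a sketch only, not HBase code: the function names, the 13-family count,
and the file sizes are all made up for illustration (13 is chosen only so
that 64 MB split evenly comes out near the ~5 MB files seen in the log).

```python
# Toy model of the two flush behaviours discussed above.
# Nothing here is HBase API; names and numbers are illustrative assumptions.

REGION_FLUSH_MB = 64.0  # default region flush size mentioned in the thread


def region_triggered_flush(family_memstores_mb):
    """Region-level flush: every family's memstore is written out at once,
    so each non-empty family produces one file of whatever size it reached."""
    return [size for size in family_memstores_mb if size > 0]


def flush_merging_smallest(store_files_mb, memstore_mb):
    """Flush a single store, folding its smallest existing file into the
    newly written file so the store's file count does not grow."""
    if not store_files_mb:
        return [memstore_mb]
    files = sorted(store_files_mb)
    smallest, rest = files[0], files[1:]
    return sorted(rest + [smallest + memstore_mb])


# Assumed: 13 families loaded at the same rate share the 64 MB budget.
n_families = 13
per_family = REGION_FLUSH_MB / n_families
small_files = region_triggered_flush([per_family] * n_families)
print(len(small_files), "files of about", round(per_family, 1), "MB each")

# Merging the smallest file during a flush keeps the count at 3.
print(flush_merging_smallest([5, 20, 40], 5))
```

With a region-level trigger the number of new files per flush scales with
the number of families, while the merge-on-flush variant leaves each
store's file count unchanged at the cost of rewriting its smallest file.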



On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens
<bradfordstephens@gmail.com> wrote:
> Yeah, those families are all needed -- but I didn't realize the files
> were so small. That's odd -- and you're right, that'd certainly throw
> it off. I'll merge them all and see if that helps.
>
> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> Took a quick look at your RS log, it looks like you are using a lot of
>> families and loading them pretty much at the same rate. Look at lines
>> that start with:
>>
>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>
>> And you will see that you are dumping very small files on the
>> filesystem, on average 5MB, that together account for ~64MB which is
>> the default flush size (and then it generates tons of compactions
>> which makes it even worse). Do you really need all those families? Try
>> merging them and see the difference.
>>
>> J-D
>>
>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
>> <bradfordstephens@gmail.com> wrote:
>>> 'allo,
>>>
>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>> but nowhere near what I'm used to, even from vague memories of old
>>> clusters on EC2.
>>>
>>> I also stripped all the Cascading from my code and have a very basic
>>> raw MR job -- we're basically reading raw text, splitting it into
>>> fields, and adding those rows to HBase. About the simplest task you
>>> could do.
>>>
>>> Ideas for next steps? What other info could I share?
>>>
>>> Cheers,
>>> B
>>>
>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>> From: Gary Helmling
>>>>>
>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>> there's a known kernel issue that seems to be causing
>>>>> high loads while idle.  More info here:
>>>>>
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>
>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>
>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI
>>>> (with updates). See http://github.com/apurtell/hbase-ec2
>>>>
>>>>  - Andy
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Bradford Stephens,
>>> Founder, Drawn to Scale
>>> drawntoscalehq.com
>>> 727.697.7528
>>>
>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>> solution. Process, store, query, search, and serve all your data.
>>>
>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>> Media, and Computer Science
>>>
>>
>
