hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: Slow Inserts on EC2 Cluster
Date Thu, 02 Sep 2010 00:56:14 GMT
Good call, J-D! We've gone from 20k inserts/minute to 200k. Much
better! I still think it's slower than I'd want by about an order of
magnitude, but it's progress.

Since we're populating 12 families, I guess we're writing 12 small files
on every flush. Not pretty. I'll look at the customer's data and see if
any of it is really sparse enough to benefit from its own
ColumnFamily. Probably not.
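Here is a back-of-the-envelope check of J-D's numbers. It assumes, as his explanation implies, that a region flushes all of its families together once its memstore reaches the 64MB default flush size, and that the 12 families fill at roughly the same rate. The class and method names are hypothetical. This is plain arithmetic, not HBase code.

```java
public class FlushMath {
    // Assumption: the 64MB default region flush size quoted in the thread.
    static final long FLUSH_SIZE_BYTES = 64L * 1024 * 1024;

    // Average size (MB) of each store file written by one flush when the
    // memstore is split evenly across `families` column families.
    static double avgFileMb(long flushSizeBytes, int families) {
        return (double) flushSizeBytes / families / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Compare the current 12-family layout with a single merged family.
        for (int families : new int[] {12, 1}) {
            System.out.printf("%2d families -> %d files per flush, ~%.1f MB each%n",
                    families, families, avgFileMb(FLUSH_SIZE_BYTES, families));
        }
    }
}
```

With 12 families each file comes out at about 5.3MB, which matches the ~5MB average in the RS log. A single merged family writes one ~64MB file per flush instead.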

Cheers,
B

On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens
<bradfordstephens@gmail.com> wrote:
> Yeah, those families are all needed -- but I didn't realize the files
> were so small. That's odd -- and you're right, that'd certainly throw
> it off. I'll merge them all and see if that helps.
>
> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> Took a quick look at your RS log, it looks like you are using a lot of
>> families and loading them pretty much at the same rate. Look at lines
>> that start with:
>>
>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>
>> And you will see that you are dumping very small files on the
>> filesystem, on average 5MB each, which together account for ~64MB, the
>> default flush size (and that generates tons of compactions, which makes
>> it even worse). Do you really need all those families? Try merging them
>> and see the difference.
>>
>> J-D
>>
>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
>> <bradfordstephens@gmail.com> wrote:
>>> 'allo,
>>>
>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>> but nowhere near what I'm used to, even from vague memories of old
>>> clusters on EC2.
>>>
>>> I also stripped all the Cascading from my code and have a very basic
>>> raw MR job -- we're basically reading raw text, splitting it into
>>> fields, and adding those rows to HBase. About the simplest task you
>>> could do.
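If the families are merged as J-D suggests, the row-building step of a job like this might look like the following sketch. Every parsed field becomes a qualifier in one column family instead of getting a family of its own. This is plain Java with no HBase dependency. The tab delimiter, the family name `d`, the field names, and the use of the first field as the row key are all hypothetical. The map stands in for an HBase `Put`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowBuilder {
    // Hypothetical single column family that holds every field.
    static final String FAMILY = "d";

    // Hypothetical field names; the real input schema isn't given in the thread.
    static final String[] QUALIFIERS = {"id", "name", "city"};

    // Splits one tab-delimited line and maps each field to "family:qualifier".
    // In this sketch the first field doubles as the row key.
    static Map<String, String> toRow(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != QUALIFIERS.length) {
            throw new IllegalArgumentException(
                    "expected " + QUALIFIERS.length + " fields, got " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        row.put("rowkey", fields[0]);
        for (int i = 0; i < fields.length; i++) {
            row.put(FAMILY + ":" + QUALIFIERS[i], fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("42\tAlice\tTampa"));
    }
}
```

Running it prints `{rowkey=42, d:id=42, d:name=Alice, d:city=Tampa}`. Because all cells land in one family, each region flush writes one store file rather than one per family.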
>>>
>>> Ideas for next steps? What other info could I share?
>>>
>>> Cheers,
>>> B
>>>
>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>> From: Gary Helmling
>>>>>
>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>> there's a known kernel issue that seems to be causing
>>>>> high loads while idle.  More info here:
>>>>>
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>
>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>
>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI
>>>> (with updates). See http://github.com/apurtell/hbase-ec2
>>>>
>>>>  - Andy
>>>>
>>>
>>>
>>>
>>> --
>>> Bradford Stephens,
>>> Founder, Drawn to Scale
>>> drawntoscalehq.com
>>> 727.697.7528
>>>
>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>> solution. Process, store, query, search, and serve all your data.
>>>
>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>> Media, and Computer Science
>>>
>>
>


