hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@apache.org>
Subject Re: Need some help diagnosing poor performance (long)
Date Mon, 15 Sep 2008 18:30:11 GMT

Yes, MR jobs with a table as input will spawn as many map tasks as there are
regions in your table (you can see that number in the HBase web UI). Your
assumption is also correct.
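
As a rough illustration of why a small table ends up with a single map, and why lowering the split threshold helps, here is a minimal sketch. It assumes the region count is approximately the table size divided by hbase.hregion.max.filesize, which is only a crude estimate: the real count depends on when splits and compactions actually happen. The function name and the 200 MB table size are hypothetical, chosen only for the example.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(table_bytes: int, max_filesize_bytes: int) -> int:
    """Crude estimate of regions, and therefore map tasks, for a table.

    Assumption: a region splits roughly once it grows past
    hbase.hregion.max.filesize, so regions ~= size / threshold.
    A table always has at least one region.
    """
    if table_bytes <= 0:
        return 1
    return max(1, math.ceil(table_bytes / max_filesize_bytes))


# Hypothetical 200 MB table, compared at two split thresholds.
table_size = 200 * MB
for threshold_mb in (256, 64):
    maps = estimate_map_tasks(table_size, threshold_mb * MB)
    print(f"{threshold_mb}M threshold: about {maps} map task(s)")
```

With one region there is one map, so a single tasktracker does all the work no matter how many nodes are in the cluster.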


On Mon, Sep 15, 2008 at 11:24 AM, Alex Vollmer <alex@evri.com> wrote:

> On Sep 14, 2008, at 8:36 AM, Jean-Daniel Cryans wrote:
>> Alex,
>> How is your cluster configured? Is it like 1 node for
>> namenode/jobtracker/master and x nodes as
>> datanodes/tasktrackers/regionservers?
> We have one machine doing the namenode/jobtracker/master work and four
> nodes performing the datanode/tasktracker/regionserver role.
>> You said your table was served by only one regionserver. How many regions
>> do you have? From what I understand, since you only have one map, that would
>> mean only 1 region. This is the least distributed setup HBase can offer, so
>> you seem to suffer from the full overhead of Hadoop.
> That appears to be the case. From my brief surfing through the code it
> looks like the M/R code for HBase won't split mappers at any finer grain
> than by region. I would assume that this is because getting finer-grained
> row-splits would involve actually scanning through all of the rows, by which
> point you might as well be mapping in the process.
>> Hadoop 0.17 is slower than 0.18. Maybe you should try the 0.18.0 HBase
>> release candidate with Hadoop 0.18.0.
> Yes, we started with 0.17.x somewhat intentionally as we were trying to get
> our heads wrapped around EC2 and Hadoop/HBase. We've since built our own
> custom image so upgrading shouldn't be too tough. I'll look into
> re-running that test.
>> Since your dataset is so small, maybe you should try to lower the split
>> threshold by changing the hbase.hregion.max.filesize value (default is
>> 256M, look at the hbase-default to know how to configure it). Maybe try a
>> value of 64M?
> Interesting idea. I'll look into that.
> Thank you for your suggestions.
> Cheers!
> --
> Alex Vollmer
> Evri -- Helping users make sense of the world's information
> Senior Software Engineer
> alex@evri.com
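
For reference, the split-threshold change suggested in the quoted text could look roughly like the fragment below. This is a sketch that assumes the override goes in conf/hbase-site.xml and that the value is given in bytes (64M = 67108864), as the defaults file does for 256M (268435456). Check the description in your own hbase-default.xml before relying on it.

```xml
<!-- conf/hbase-site.xml: overrides the value from hbase-default.xml -->
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 64 * 1024 * 1024 bytes; the default is 268435456 (256M) -->
    <value>67108864</value>
  </property>
</configuration>
```

The lower threshold is only checked as regions grow and compact, so an already loaded table may not split right away.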
