hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@apache.org>
Subject Re: Need some help diagnosing poor performance (long)
Date Mon, 15 Sep 2008 18:30:11 GMT
Alex,

Yes, MR jobs with a table as input will spawn as many maps as there are
regions in your table (you can see that number in the HBase web UI). Your
assumption is also right.
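
To make the one-map-per-region relationship concrete, here is a rough sketch in Python. It assumes a table ends up with about one region per `hbase.hregion.max.filesize` bytes of data, which is only an approximation (real region counts depend on how and when splits happen). The function name and the 200 MB table size are hypothetical and not taken from this thread.

```python
import math

MB = 1024 * 1024


def estimated_map_tasks(table_bytes: int, max_region_bytes: int) -> int:
    """Rough estimate of map tasks for a table-input MR job.

    Assumes one map per region and roughly one region per
    max_region_bytes of data. A table always has at least one region.
    """
    if table_bytes <= 0:
        return 1
    return max(1, math.ceil(table_bytes / max_region_bytes))


# Hypothetical 200 MB table, compared under the 256M default threshold
# and the 64M value suggested in the quoted message.
table_size = 200 * MB
for threshold_mb in (256, 64):
    maps = estimated_map_tasks(table_size, threshold_mb * MB)
    print(f"threshold={threshold_mb}M -> about {maps} map task(s)")
```

Under these assumptions a small table stays in a single region with the default threshold, which would explain seeing only one map.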

J-D

On Mon, Sep 15, 2008 at 11:24 AM, Alex Vollmer <alex@evri.com> wrote:

> On Sep 14, 2008, at 8:36 AM, Jean-Daniel Cryans wrote:
>
>> Alex,
>>
>> How is your cluster configured? Is it 1 node for
>> namenode/jobtracker/master and x nodes as
>> datanodes/tasktrackers/regionservers?
>>
>>
>
> We have one machine doing the namenode/jobtracker/master work and four
> nodes performing the datanode/tasktracker/regionserver role.
>
>> You said your table was served by only one regionserver. How many regions
>> do you have? From what I understand, since you only have one map, that
>> would only make 1 region. This is the least distributed setup HBase can
>> offer, so you seem to suffer from the full overhead of Hadoop.
>>
>
> That appears to be the case. From my brief surfing through the code it
> looks like the M/R code for HBase won't split mappers at any finer grain
> than by region. I would assume that this is because getting finer-grained
> row-splits would involve actually surfing through all of the rows, by which
> point you might as well be mapping in the process.
>
>> Hadoop 0.17 is slower than 0.18. Maybe you should try the 0.18.0 HBase
>> release candidate with Hadoop 0.18.0.
>>
>
> Yes, we started with 0.17.x somewhat intentionally as we were trying to get
> our heads wrapped around EC2 and Hadoop/HBase. We've since built our own
> custom image, so upgrading shouldn't be too tough. I'll look into
> re-running that test.
>
>> Since your dataset is so small, maybe you should try to lower the split
>> threshold by changing the hbase.hregion.max.filesize value (default is
>> 256M; look at hbase-default.xml to see how to configure it). Maybe try a
>> value of 64M?
>>
>
> Interesting idea. I'll look into that.
>
> Thank you for your suggestions.
>
> Cheers!
>
>
> --
> Alex Vollmer
> Evri -- Helping users make sense of the world's information
> Senior Software Engineer
> alex@evri.com
>
>

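For the split-threshold suggestion quoted above, a minimal sketch of what the override could look like. It assumes the usual convention of putting overrides in `hbase-site.xml` rather than editing `hbase-default.xml`, and that the value is given in bytes. The helper name is hypothetical; the snippet only prints the property element to paste inside `<configuration>`.

```python
import xml.etree.ElementTree as ET


def split_threshold_property(megabytes: int) -> str:
    """Build a <property> element for hbase.hregion.max.filesize.

    The value is written in bytes (assumption: the setting is a byte count).
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.hregion.max.filesize"
    ET.SubElement(prop, "value").text = str(megabytes * 1024 * 1024)
    return ET.tostring(prop, encoding="unicode")


# 64M, as suggested in the quoted message.
print(split_threshold_property(64))
```

Existing regions would presumably only split once they are next evaluated against the new threshold, so the effect may not be immediate.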