hbase-user mailing list archives

From Alex Vollmer <a...@evri.com>
Subject Re: Need some help diagnosing poor performance (long)
Date Mon, 15 Sep 2008 15:24:24 GMT
On Sep 14, 2008, at 8:36 AM, Jean-Daniel Cryans wrote:

> Alex,
> How is your cluster configured? Is it like 1 node for
> namenode/jobtracker/master and x nodes as
> datanodes/tasktrackers/regionservers?

We have one machine doing the namenode/jobtracker/master work and four  
nodes performing the datanode/tasktracker/regionserver role.

> You said your table was served by only one regionserver. How many
> regions do you have? From what I understand, since you only have one
> map, that would only make 1 region. This is the least distributed
> setup HBase can offer, so you seem to suffer from the full overhead
> of Hadoop.

That appears to be the case. From a brief read through the code, it
looks like the M/R code for HBase won't split mappers at any finer
grain than by region. I assume this is because computing
finer-grained row splits would mean scanning through all of the rows,
by which point you might as well be mapping in the process.

> Hadoop 0.17 is slower than 0.18. Maybe you should try the 0.18.0 HBase
> release candidate with Hadoop 0.18.0.

Yes, we started with 0.17.x somewhat intentionally, as we were trying to
get our heads wrapped around EC2 and Hadoop/HBase. We've since built
our own custom image, so upgrading shouldn't be too tough. I'll look
into re-running that test.

> Since your dataset is so small, maybe you should try to lower the
> split threshold by changing the hbase.hregion.max.filesize value
> (default is 256M, look at the hbase-default to know how to configure
> it). Maybe try a value of 64M?

Interesting idea. I'll look into that.
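
For reference, a minimal sketch of what that override might look like.
It assumes the property is set in conf/hbase-site.xml (site-specific
overrides of hbase-default.xml usually live there) and that the value
is given in bytes, so 64M becomes 64 * 1024 * 1024 = 67108864:

```xml
<!-- conf/hbase-site.xml (assumed location for site overrides) -->
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 64 MB in bytes; the default mentioned above is 256 MB -->
    <value>67108864</value>
  </property>
</configuration>
```

With a smaller threshold a small table should split into more regions,
and since the M/R code assigns one map per region, that should in turn
give the job more maps to spread across the regionservers.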

Thank you for your suggestions.


Alex Vollmer
Evri -- Helping users make sense of the world's information
Senior Software Engineer
