hbase-user mailing list archives

From "Manish Katyal" <manish.kat...@gmail.com>
Subject Re: Writes - Poor performance on EC2
Date Thu, 21 Aug 2008 18:23:54 GMT
Looking at the iostat numbers, it appears the problem is that my data is
being inserted in the reduce step. As a result, only 2 of the region servers
(the same number as there are task trackers) are being used at any given time;
in fact, those 2 are getting slammed while the others sit idle.
I see two possible solutions:
- Randomize the order of the data so that consecutive writes go to
different region servers (load balancing). The downside is that the writes
will take longer.
- Increase the number of task trackers to match the number of region
servers so that, hopefully (given the way the input files are split), all
region servers are used concurrently.
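A toy model of the first option, in Java. It is not HBase client code and makes several assumptions: integer row keys, 5 regions that each own an equal contiguous slice of the key space (standing in for the 5 region servers), and 2 writers (standing in for the 2 reducers) that each receive every other key and issue one write per tick. It counts how many distinct regions a window of consecutive writes touches when each writer's keys arrive in sorted order versus shuffled order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Toy model only: sorted vs. shuffled write order against range-split regions. */
public class WriteSpreadSketch {

    /** Hypothetical region layout: equal contiguous key ranges. */
    static int regionFor(int key, int totalKeys, int regions) {
        return (int) ((long) key * regions / totalKeys);
    }

    /** Writer w gets keys w, w + writers, ... in ascending order (like a reducer's sorted input). */
    static List<List<Integer>> partition(int totalKeys, int writers) {
        List<List<Integer>> perWriter = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            perWriter.add(new ArrayList<>());
        }
        for (int key = 0; key < totalKeys; key++) {
            perWriter.get(key % writers).add(key);
        }
        return perWriter;
    }

    /** Writers run in lockstep; returns average distinct regions touched per window of ticks. */
    static double avgRegionsPerWindow(List<List<Integer>> perWriter,
                                      int totalKeys, int regions, int window) {
        int ticks = perWriter.get(0).size();
        int windows = 0;
        long total = 0;
        for (int start = 0; start + window <= ticks; start += window) {
            Set<Integer> busy = new HashSet<>();
            for (List<Integer> keys : perWriter) {
                for (int t = start; t < start + window; t++) {
                    busy.add(regionFor(keys.get(t), totalKeys, regions));
                }
            }
            total += busy.size();
            windows++;
        }
        return (double) total / windows;
    }

    public static void main(String[] args) {
        int totalKeys = 10_000, regions = 5, writers = 2, window = 50;

        List<List<Integer>> sorted = partition(totalKeys, writers);
        double sortedAvg = avgRegionsPerWindow(sorted, totalKeys, regions, window);

        List<List<Integer>> shuffled = partition(totalKeys, writers);
        Random rnd = new Random(42);
        for (List<Integer> keys : shuffled) {
            Collections.shuffle(keys, rnd); // randomize each writer's order
        }
        double shuffledAvg = avgRegionsPerWindow(shuffled, totalKeys, regions, window);

        System.out.printf("sorted:   %.2f regions busy per window%n", sortedAvg);
        System.out.printf("shuffled: %.2f regions busy per window%n", shuffledAvg);
    }
}
```

With these parameters the sorted run touches exactly 1 region per 50-tick window, while the shuffled run touches close to all 5. At any single instant only 2 writers are active, so shuffling spreads the load over time rather than adding simultaneous writers; that is what the second option addresses.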

Any ideas?

- Manish

On Thu, Aug 21, 2008 at 10:56 AM, Manish Katyal <manish.katyal@gmail.com> wrote:

> I'm running an experiment on EC2 (a 10-node cluster) that involves inserting
> 12 million records (about 1.6 GB) of data into HBase. The data is in HDFS and
> I'm running M/R jobs to write to HBase.
> The performance has been very poor: my M/R jobs have been timing out even
> though the timeout is set to 1800 seconds. Were it not for the
> timeouts, I estimate it would have taken 10 to 12 hours to insert the data.
>
> Is this expected performance? Am I doing something wrong here?
>
> Configuration of the 10 small nodes on EC2:
> - 5 region servers, each also running a datanode
> - 1 dedicated HBase Master server
> - 1 JobTracker server, also running a datanode
> - 1 server for the Namenode and Secondary Namenode
> - 2 servers running the TaskTrackers and datanodes
>
> Any help or directions would be appreciated.
>
> Thanks,
> - Manish Katyal
>
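For scale, the 10 to 12 hour estimate in the quoted message corresponds to the rates below, assuming 1 GB = 10^9 bytes and a constant rate over the whole run (both are assumptions, not measurements):

```latex
\begin{aligned}
\text{rows/s} &\approx \frac{12 \times 10^{6}}{10 \times 3600\,\text{s}} \approx 333
  \quad\text{to}\quad \frac{12 \times 10^{6}}{12 \times 3600\,\text{s}} \approx 278 \\
\text{bytes/row} &\approx \frac{1.6 \times 10^{9}\,\text{B}}{12 \times 10^{6}} \approx 133\,\text{B} \\
\text{bytes/s} &\approx \frac{1.6 \times 10^{9}\,\text{B}}{36\,000\,\text{s}} \approx 44\,\text{KB/s}
  \quad\text{to}\quad \frac{1.6 \times 10^{9}\,\text{B}}{43\,200\,\text{s}} \approx 37\,\text{KB/s}
\end{aligned}
```

Split across the 2 task trackers doing the inserts, that is only about 140 to 170 rows per second per writer.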
