hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@gmail.com>
Subject Re: Writes - Poor performance on EC2
Date Thu, 21 Aug 2008 18:32:18 GMT
Manish,

Some questions:

- Which version of Hadoop/HBase?

- Which type of EC2 instance?

- How many regions does your table have at the beginning of the experiment?

Thx,

J-D

On Thu, Aug 21, 2008 at 2:23 PM, Manish Katyal <manish.katyal@gmail.com> wrote:

> Judging by the iostat numbers, the problem appears to be that my data is
> inserted in the reduce step. As a result, only 2 of the region servers (the
> same number as there are tasktrackers) are in use at any given time; in
> fact, those 2 are getting slammed while the others sit idle.
> I guess the solution is one of:
> - Randomly sort the data so the writes are spread across different region
> servers (load balancing). The downside is that the writes will take longer.
> - Increase the number of tasktrackers to match the number of region servers
> so that (hopefully, given the way the input files are split) all region
> servers are used concurrently.
>
> Any ideas?
>
> - Manish
>
> On Thu, Aug 21, 2008 at 10:56 AM, Manish Katyal <manish.katyal@gmail.com>
> wrote:
>
> > I'm running an experiment on EC2 (10-node cluster) that involves inserting
> > 12 million records (about 1.6 GB) of data into HBase. The data is in HDFS,
> > and I'm running M/R jobs to write it to HBase.
> > The performance has been very poor: my M/R jobs have been timing out even
> > though the timeout is set to 1800 seconds. Were it not for the timeouts, I
> > estimate it would have taken 10 to 12 hours to insert the data.
> >
> > Is this expected performance? Am I doing something wrong here?
> >
> > Configuration of the 10 small nodes on EC2:
> > - 5 region servers, each also running a datanode
> > - 1 dedicated HBase Master Server
> > - 1 JobTracker server + datanode
> > - 1 server for Namenode and Secondary namenode
> > - 2 servers running the Task Trackers and Datanodes
> >
> > Any help or directions would be appreciated.
> >
> > Thanks,
> > - Manish Katyal
> >
>
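The first option Manish proposes (randomly reordering the input so that consecutive writes land on different region servers) can be illustrated with a small, self-contained simulation. This is a minimal sketch under stated assumptions, not HBase code: it assumes a hypothetical table pre-split into 5 equal key ranges (one per region server) and groups writes into fixed-size batches; `WriteSpreadSketch`, `regionOf` and `avgRegionsPerBatch` are hypothetical names, and no HBase or Hadoop API is used. It only models how key order affects which regions a batch of writes touches, not the real region assignment, client buffering, or the cost of the extra shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class WriteSpreadSketch {

    // Hypothetical layout: keys in [0, numKeys) split into numRegions equal ranges.
    static int regionOf(int key, int numKeys, int numRegions) {
        return (int) ((long) key * numRegions / numKeys);
    }

    // Average number of distinct regions touched by each batch of batchSize consecutive writes.
    static double avgRegionsPerBatch(List<Integer> keys, int numKeys, int numRegions, int batchSize) {
        int batches = 0;
        long total = 0;
        for (int start = 0; start < keys.size(); start += batchSize) {
            Set<Integer> touched = new HashSet<>();
            int end = Math.min(start + batchSize, keys.size());
            for (int i = start; i < end; i++) {
                touched.add(regionOf(keys.get(i), numKeys, numRegions));
            }
            total += touched.size();
            batches++;
        }
        return (double) total / batches;
    }

    public static void main(String[] args) {
        int numKeys = 100_000;
        int numRegions = 5;   // assumption: one region per region server
        int batchSize = 1_000;

        // Sorted order, like the key-sorted input a reducer receives.
        List<Integer> sorted = new ArrayList<>();
        for (int k = 0; k < numKeys; k++) sorted.add(k);

        // Randomly reordered copy of the same keys.
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        System.out.printf("sorted:   %.2f regions per batch%n",
                avgRegionsPerBatch(sorted, numKeys, numRegions, batchSize));
        System.out.printf("shuffled: %.2f regions per batch%n",
                avgRegionsPerBatch(shuffled, numKeys, numRegions, batchSize));
    }
}
```

With these parameters every sorted batch stays inside a single key range (1.00 regions per batch), while each shuffled batch touches close to all five ranges, which is the load-spreading effect described in the thread.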
