hbase-user mailing list archives

From "Manish Katyal" <manish.kat...@gmail.com>
Subject Re: Writes - Poor performance on EC2
Date Thu, 21 Aug 2008 18:44:23 GMT
Please see inline:

On Thu, Aug 21, 2008 at 1:32 PM, Jean-Daniel Cryans <jdcryans@gmail.com> wrote:

> Manish,
>
> Some questions:
>
> - Which version of Hadoop/HBase?

Hadoop 0.17.1 and HBase 0.2.0

>
>
> - Which type of EC2 instance?

small

>
>
> - How many regions does your table have at the beginning of the experiment?

I'm not sure I understand this question. The table is empty at the start of
the experiment, so I don't know how many regions it has.



>
> Thx,
>
> J-D
>
> On Thu, Aug 21, 2008 at 2:23 PM, Manish Katyal <manish.katyal@gmail.com> wrote:
>
> > By looking at the iostat numbers, it appears the problem is that my data is
> > being inserted in the reduce step. As a result, only 2 of the region servers
> > (the same number as the task trackers) are being used at any given time; in
> > fact, they are getting slammed while the others are idle.
> > I guess the solution is:
> > - either randomly sort the data so the writes will be performed against
> > different region servers (load balancing). The downside is that the writes
> > will take longer.
> > - or increase the number of task trackers to equal the number of region
> > servers and (hopefully, because of the way the input files are split)
> > effectively use all region servers concurrently.
> >
> > Any ideas?
> >
> > - Manish
> >
> > On Thu, Aug 21, 2008 at 10:56 AM, Manish Katyal <manish.katyal@gmail.com> wrote:
> >
> > > I'm running an experiment on EC2 (10 node cluster) that involves inserting
> > > 12 million records (about 1.6GB) of data into HBase. The data is in HDFS and
> > > I'm running M/R jobs to write to HBase.
> > > The performance has been very poor - my M/R jobs have been timing out even
> > > though the timeout has been set to 1800 seconds. Were it not for the
> > > timeouts, I estimate it would have taken 10 or 12 hours to insert the data.
> > >
> > > Is this expected performance? Am I doing something wrong here?
> > >
> > > Configuration of the 10 small nodes on EC2:
> > > - 5 Region servers - each running a data node
> > > - 1 dedicated HBase Master Server
> > > - 1 JobTracker server + datanode
> > > - 1 server for Namenode and Secondary namenode
> > > - 2 servers running the Task Trackers and Datanodes
> > >
> > > Any help or directions would be appreciated.
> > >
> > > Thanks,
> > > - Manish Katyal
> > >
> >
>
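
To illustrate the hotspot described in the quoted message (sorted reduce output hitting only a couple of region servers at a time), here is a rough simulation in Python. It is only a sketch under assumptions that are not from the thread: the table is taken to already have 5 regions with the split points shown, row keys are plain integers, and region_for and regions_touched_per_window are hypothetical helper names. It uses no HBase API.

```python
import random
from bisect import bisect_right

def region_for(key, split_points):
    """Index of the range-partitioned region that holds `key`."""
    return bisect_right(split_points, key)

def regions_touched_per_window(keys, split_points, window):
    """Count the distinct regions hit by each consecutive batch of writes."""
    counts = []
    for start in range(0, len(keys), window):
        chunk = keys[start:start + window]
        counts.append(len({region_for(k, split_points) for k in chunk}))
    return counts

# Assumed setup (not from the thread): 5 regions over integer keys 0..9999.
NUM_KEYS = 10_000
SPLIT_POINTS = [2000, 4000, 6000, 8000]
WINDOW = 500  # number of writes observed at a time

# Reducers receive their input sorted by key, so writes arrive in key order.
sorted_keys = list(range(NUM_KEYS))

# The "randomly sort the data" option: same keys, shuffled write order.
shuffled_keys = sorted_keys[:]
random.Random(42).shuffle(shuffled_keys)

sorted_counts = regions_touched_per_window(sorted_keys, SPLIT_POINTS, WINDOW)
shuffled_counts = regions_touched_per_window(shuffled_keys, SPLIT_POINTS, WINDOW)

print("sorted order:   max regions hit per window =", max(sorted_counts))
print("shuffled order: min regions hit per window =", min(shuffled_counts))
```

In this toy model each batch of sorted writes lands in a single region, while shuffled writes spread over all of them. It ignores the fact that an empty table has to split before more than one region exists at all.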

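For scale, the figures in the original message (12 million records, about 1.6GB, an estimated 10 to 12 hours) can be turned into a rough write rate. The sketch below assumes decimal gigabytes and a steady rate over the whole run; both are assumptions, not measurements from the thread.

```python
RECORDS = 12_000_000
DATA_BYTES = 1.6e9  # "about 1.6GB"; decimal gigabytes assumed

def records_per_second(hours):
    """Average insert rate if the whole load takes `hours` hours."""
    return RECORDS / (hours * 3600)

for hours in (10, 12):
    print(f"{hours} h -> about {records_per_second(hours):.0f} records/s")

print(f"average record size: about {DATA_BYTES / RECORDS:.0f} bytes")
```

That works out to a few hundred small records per second for the cluster as a whole.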