hbase-user mailing list archives

From "Manish Katyal" <manish.kat...@gmail.com>
Subject Re: Writes - Poor performance on EC2
Date Thu, 21 Aug 2008 19:45:51 GMT
In the FAQ (Advice for smaller clusters in write-heavy environments,
<http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235@mail.gmail.com%3E>)
it states:
"..so is that until 0.2, when there will be better load balancing on
regionservers, it's always possible that a single region server can be
called on to shoulder the full load of all tasktrackers"
So has load balancing of region servers not been solved in 0.2.0?
Is it scheduled for a future release?
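
A toy model of the FAQ's concern and of the "randomly sort the data" idea further down in the thread. This is plain Python, not HBase code. It assumes a table that has already split into five regions, with hypothetical boundaries in `SPLITS` and one region per region server, and counts where a short window of consecutive writes lands. Sorted reducer output sends the whole window to a single region, while shuffled input spreads it across all five. The boundaries, key space and window size are illustrative assumptions. A freshly created table has only one region, so shuffling cannot help until the table has split.

```python
import bisect
import random
from collections import Counter

# Hypothetical region boundaries (an assumption for illustration):
# region i holds keys in [SPLITS[i-1], SPLITS[i]), giving 5 regions in total.
SPLITS = [20_000, 40_000, 60_000, 80_000]

def region_of(key: int) -> int:
    """Return the index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)

def writes_per_region(keys) -> Counter:
    """Count how many of the given writes land on each region."""
    return Counter(region_of(k) for k in keys)

rows = list(range(100_000))   # row keys 0..99,999, as sorted reducer output
window = 1_000                # writes issued during one short time slice

sorted_slice = rows[:window]  # sorted input: one contiguous key range
shuffled = rows[:]
random.Random(42).shuffle(shuffled)  # seeded so the run is repeatable
shuffled_slice = shuffled[:window]

print("sorted:  ", dict(writes_per_region(sorted_slice)))  # {0: 1000}
print("shuffled:", dict(writes_per_region(shuffled_slice)))
```

With sorted keys, every write in the window hits region 0; with shuffled keys, roughly a fifth of the window reaches each region.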


On Thu, Aug 21, 2008 at 1:50 PM, Jean-Daniel Cryans <jdcryans@gmail.com> wrote:

> You have at least 1 region in a freshly created table. You can see this in
> the web UI.
>
> Performance is very poor when inserting data into a fresh table, since there
> is only one region. Try doing incremental batches of updates (starting with,
> let's say, 100 rows) while watching your number of regions.
>
> Having small instances means having only 1 CPU, which also means poor
> performance. As I said in another thread, Hadoop and HBase are heavily
> multithreaded.
>
> J-D
>
> On Thu, Aug 21, 2008 at 2:44 PM, Manish Katyal <manish.katyal@gmail.com> wrote:
>
> > Please see inline:
> >
> > On Thu, Aug 21, 2008 at 1:32 PM, Jean-Daniel Cryans <jdcryans@gmail.com> wrote:
> >
> > > Manish,
> > >
> > > Some questions:
> > >
> > > - Which version of Hadoop/HBase?
> >
> > 0.17.1 and 0.2.0
> >
> > >
> > >
> > > - Which type of EC2 instance?
> >
> > small
> >
> > >
> > >
> > > - How many regions does your table have at the beginning of the
> > > experiment?
> >
> > I'm not clear about this question: there is no data in the table, so I
> > don't know how many regions the table has.
> >
> >
> >
> > >
> > > Thx,
> > >
> > > J-D
> > >
> > > On Thu, Aug 21, 2008 at 2:23 PM, Manish Katyal <manish.katyal@gmail.com> wrote:
> > >
> > > > By looking at the iostat numbers, it appears the problem is that my
> > > > data is being inserted in the reduce step; as a result, only 2 of the
> > > > region servers (the same number as the tasktrackers) are being used at
> > > > any given time (in fact, they are getting slammed while the others are
> > > > idle).
> > > > I guess the solution is either:
> > > > - randomly shuffle the data so the writes are performed against
> > > >   different region servers (load balancing). The downside: the writes
> > > >   will take longer.
> > > > - or increase the number of tasktrackers to equal the number of region
> > > >   servers and (hopefully, because of the way the input files are
> > > >   split) effectively use all region servers concurrently.
> > > >
> > > > Any ideas?
> > > >
> > > > - Manish
> > > >
> > > > On Thu, Aug 21, 2008 at 10:56 AM, Manish Katyal <manish.katyal@gmail.com> wrote:
> > > >
> > > > > I'm running an experiment on EC2 (10-node cluster) that involves
> > > > > inserting 12 million records (about 1.6GB) of data into HBase. The
> > > > > data is in HDFS and I'm running M/R jobs to write it to HBase.
> > > > > The performance has been very poor: my M/R jobs have been timing out
> > > > > even though the timeout has been set to 1800 seconds. Were it not for
> > > > > the timeouts, I estimate it would have taken 10 to 12 hours to insert
> > > > > the data.
> > > > >
> > > > > Is this expected performance? Am I doing something wrong here?
> > > > >
> > > > > Configuration of the 10 small nodes on EC2:
> > > > > - 5 region servers, each also running a datanode
> > > > > - 1 dedicated HBase master server
> > > > > - 1 JobTracker server, also running a datanode
> > > > > - 1 server for the namenode and secondary namenode
> > > > > - 2 servers running the tasktrackers and datanodes
> > > > >
> > > > > Any help or directions would be appreciated.
> > > > >
> > > > > Thanks,
> > > > > - Manish Katyal
> > > > >
> > > >
> > >
> >
>
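
The figures in the original post (12 million records, about 1.6GB, an estimated 10 to 12 hours) imply a load rate that a few lines of arithmetic can make concrete. Reading "1.6GB" as decimal gigabytes is an assumption. The split across two TaskTrackers comes from the posted configuration.

```python
RECORDS = 12_000_000  # "12 million records"
DATA_BYTES = 1.6e9    # "about 1.6GB", read as decimal gigabytes (an assumption)
TASKTRACKERS = 2      # the two TaskTracker nodes in the posted configuration

def throughput(hours: float) -> tuple[float, float]:
    """Return (rows/s, KB/s) needed to load the whole data set in `hours`."""
    seconds = hours * 3600
    return RECORDS / seconds, DATA_BYTES / seconds / 1e3

print(f"average record size: {DATA_BYTES / RECORDS:.0f} bytes")
for hours in (10, 12):
    rows_s, kb_s = throughput(hours)
    print(f"{hours} h: {rows_s:.0f} rows/s ({kb_s:.1f} KB/s) overall, "
          f"{rows_s / TASKTRACKERS:.0f} rows/s per TaskTracker")
```

This prints an average record size of about 133 bytes and an overall rate of roughly 278 to 333 rows/s (37.0 to 44.4 KB/s), or about 139 to 167 rows/s per TaskTracker.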

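J-D's advice to load in incremental batches, starting with about 100 rows and watching the region count, can be sketched as a small driver. The doubling growth factor and the `write_batch` / `count_regions` hooks are hypothetical. In practice the writes would go through your HBase client, and the region count would be read from the web UI. The demo hooks below fake a table that gains a region every 500 rows, purely for illustration.

```python
from typing import Callable, Iterator, List, Sequence

def incremental_batches(rows: Sequence, first: int = 100,
                        growth: int = 2) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows`, starting with `first` rows and
    multiplying the batch size by `growth` (an assumed policy) each time."""
    if first < 1 or growth < 1:
        raise ValueError("first and growth must be at least 1")
    start, size = 0, first
    while start < len(rows):
        yield rows[start:start + size]
        start += size
        size *= growth

def load(rows: Sequence,
         write_batch: Callable[[Sequence], None],
         count_regions: Callable[[], int]) -> List[int]:
    """Write `rows` batch by batch, recording the region count after each."""
    history = []
    for batch in incremental_batches(rows):
        write_batch(batch)               # hypothetical hook: send rows to the table
        history.append(count_regions())  # hypothetical hook: check region count
    return history

if __name__ == "__main__":
    written: List[int] = []
    # Fake hooks for illustration only: "writing" appends to a list, and the
    # fake table is pretended to gain one region per 500 rows written.
    history = load(list(range(1000)), written.extend,
                   lambda: 1 + len(written) // 500)
    print(history)  # [1, 1, 2, 3]
```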