hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Problems with write performance (25kb rows)
Date Fri, 15 Jan 2010 18:48:51 GMT
Your pngs of traffic didn't come across.  Please put them somewhere I can
pull.

On Fri, Jan 15, 2010 at 5:40 AM, Dmitriy Lyfar <dlyfar@gmail.com> wrote:

>
> After some overnight tests I have the log of one regionserver in debug mode.
> I've uploaded it here: http://slil.ru/28491882 (downloading begins after 10
> seconds)
>

That's an interesting site, Dmitriy (smile).



> But there are some problems I see after these tests; I regularly have the
> following exception in the client logs:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=179, listsize=883,
> region=4,\x00\x00F\x16,1263403845332 for region 4,\x00\x00E5,1263403845332,
> row '\x00\x00E\xA2', but failed after 10 attempts.
>
>
Your log is interesting.  I see a bunch of this:

2010-01-15 14:17:39,064 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
0,\x00\x01T\xE5,1263450046617 because global memstore limit of 1.9g
exceeded; currently 1.8g and flushing till 1.2g

Which would explain some of your slowness.

That you have 800 regions per server likely makes the above happen more
frequently than it should... this and your randomized keys.  The latter are
probably putting little pieces into each of the regions, making it harder for
a good fat flush to happen and free up the above.
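
If you want to play with the thresholds behind that message, they live in the
regionserver's hbase-site.xml.  A minimal sketch, going from memory on the
0.20-era property names and with illustrative values (check your
hbase-default.xml to confirm; both are fractions of the regionserver heap):

  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
    <!-- When the sum of all memstores crosses this fraction of heap, the
         regionserver forces flushes (the "global memstore limit" above). -->
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.25</value>
    <!-- Forced flushing keeps going until usage drops back under this
         fraction (the "flushing till" figure in the log line). -->
  </property>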

I also see forced flushing happening because you have "too many log files".
My guess is this latter is a new phenomenon brought on by the randomized keys.
Are you running hbase 0.20.2 or the head of the 0.20 branch?  The latter
might help with the log issue.  This issue shouldn't be what is slowing your
servers though; that'd be the former issue, the global memstore limit.
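
For reference, the "too many log files" flushes come from the cap on the
number of write-ahead logs a regionserver keeps before it forces flushes so
old logs can be let go.  If I have the property name right it is
hbase.regionserver.maxlogs; a sketch for hbase-site.xml (the value here is
just an example):

  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>
    <!-- Once this many HLogs are outstanding, memstores holding edits from
         the oldest log are force-flushed so that log can be archived. -->
  </property>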


>
> But I see that all servers are online. I can only suppose that sometimes
> there is an insufficient number of RPC handlers. Also I would like to ask how
> replication in hadoop works. You can see in the pictures from the previous
> post that inbound traffic = outbound traffic for a server under load. Does
> that mean that hadoop replicates a block to another server as we write the
> block on the current server? Is there any influence of replication on
> read/write speed (I mean, is there any case where replication impacts
> network throughput and read/write operations become slower)?
>


Yes.  Hadoop replicates as you write.
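
A datanode in the middle of a write pipeline forwards each block it receives
on to the next replica, which is why a loaded node shows outbound traffic
roughly equal to inbound.  The replica count is the standard HDFS setting in
hdfs-site.xml (shown here with the usual default of 3):

  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Number of copies of each block.  Writes are pipelined: the first
         datanode streams the block to the second, the second to the third,
         so write traffic is multiplied across the cluster. -->
  </property>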


Do you need 800 regions per server?  You might want to up the size of your
regions... make them 1G regions rather than 256M.  It would depend on your
write rate.
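
Upping the region size is one property in hbase-site.xml; a sketch for 1G
regions, assuming the usual property name (the value is in bytes, and it only
affects regions created or split after the change):

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
    <!-- Maximum store size before a region splits: 1G here instead of the
         256M default, so each server carries fewer, fatter regions. -->
  </property>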

Let me get back to you.  I have to go at the moment.  This log is
interesting.  I want to look at it more.

St.Ack


>
> 2010/1/14 Dmitriy Lyfar <dlyfar@gmail.com>
>
> > Hi,
> >
> >> > Speed still the same (about 1K rows per second).
> >> >
> >>
> >> This seems low for your 6 node cluster.
> >>
> >> If you look at the servers, are they cpu or io bound-up in any way?
> >>
> >> How many clients do you have running now?
> >>
> >
> > Now I'm running 1-2 clients in parallel. If I run more, timings grow.
> > Also I do not use the namenode as a datanode or as a regionserver. There is
> > only namenode/secondarynn/master/zk.
> >
> >
> >>
> >> This is not a new table right?  (I see there is an existing table in your
> >> cluster looking at the regionserver log).  It's an existing table of many
> >> regions?
> >>
> >
> > Yes. I have 7 test tables. The client randomly selects the table which will
> > be used at start.
> > Now after some tests I have about 800 regions per region server and 7
> > tables.
> >
> >
> >>
> >> You have upped the handlers in hbase.  Have you done the same for the
> >> datanodes (in case we are bottlenecking there)?
> >>
> >
> > I've updated this setting for hadoop also. As I understand it, if something
> > is wrong with the number of handlers, I will get a TooManyOpenFiles
> > exception and the datanode will stop working.
> > All works fine for now. I've attached metrics from one of the datanodes. On
> > the other nodes we have almost the same picture. Please look at the
> > throughput picture. It seems illogical to me that the node has almost equal
> > inbound and outbound traffic (render.png). These pictures were snapped while
> > running two clients and then, after some break, while running one client.
> >
> >
> >> > Random ints play the role of row keys now (i.e. uniform random
> >> > distribution on (0, 100 * 1000)).
> >> > What do you think, is 5GB for hbase and 2GB for hdfs enough?
> >> >
> >> Yes, that should be good.  Writing, you are not using that memory in the
> >> regionserver though; maybe you should go with bigger regions if you have
> >> 25k cells.  Are you using compression?
> >>
> >
> > Yes, 25Kb is important, but I think in the production system we will have
> > 70-80% of rows at 5-10Kb, about 20% at 25Kb and 10% above 25Kb. I'm not
> > using any compression for columns because I was thinking about throughput.
> > But I was planning to use compression once I can achieve 80-90 Mb/sec for
> > this test.
> >
> >
> >>
> >> I took a look at your regionserver log.  It's just after an open of the
> >> regionserver.  I see no activity other than the opening of a few regions.
> >> These regions do happen to have a lot of store files so we're starting up
> >> compactions, but that all should be fine.  I'd be interested in seeing a
> >> log snippet from a regionserver under load.
> >>
> >
> > Ok, there are some tests running now which I think will be interesting;
> > I'll provide regionserver logs a bit later.
> > Thank you for your help!
> >
> > --
> > Regards, Lyfar Dmitriy
> >
> >
>
>
> --
> Regards, Lyfar Dmitriy
> mailto: dlyfar@crystalnix.com
> jabber: dlyfar@gmail.com
>
