hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hundreds of reads of a metafile
Date Fri, 09 Apr 2010 18:41:26 GMT
On Thu, Apr 8, 2010 at 2:26 PM, Raghu Angadi <rangadi@apache.org> wrote:
> Regionserver log on node 72 is at : http://bit.ly/cd9acU  (160K gzipped). To
> give a scale of reads, the local datanode had 3.5K reads spread across about
> 20 blocks. Pretty much all the reads were from the same DFSClient ID. will
> watch if this happens again.
>

Thanks for digging in, Raghu.

Looking at the logs, there's a lot of churn -- other regionservers
shedding regions, restarting? -- and the write load would probably do
better if given more RAM.  You keep hitting the ceiling where we put down
a gate blocking writes until flushes finish.
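For example, the heap is set in conf/hbase-env.sh; the 6G below is just a
for-instance (matching the bump you mention later), and the value is in MB:

  # conf/hbase-env.sh
  # Max heap for the HBase daemons, in MB (default is 1000).
  export HBASE_HEAPSIZE=6000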

What time interval are we talking about for the 3.5k reads across 20
blocks?  Were the 20 blocks under the ${HBASE_ROOTDIR}/.META. directory?
This regionserver was carrying the .META. region, it looks like, so it's
going to be popular.  I'd think the cache should be running interference
on the i/o, but maybe it's not doing a good job of it.  The write
load/churn might be blowing the cache.  Yeah, log at DEBUG and we'll get
a better idea.

You're doing a big upload?

The block cache size is a config where you set how much of the heap to
allocate to it.  Default is 0.2 IIRC.
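Something like the below in hbase-site.xml; the 0.3 is just an example,
not a recommendation:

  <property>
    <name>hfile.block.cache.size</name>
    <!-- Fraction of regionserver heap given to the block cache; default 0.2 -->
    <value>0.3</value>
  </property>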
St.Ack





> We are increasing memory from 3GB to 6GB. Any pointers about how to set size
> of block cache will be helpful.
> will enable DEBUG for LruBlockCache.
>
> Raghu.
>
> On Thu, Apr 8, 2010 at 12:46 AM, Stack <stack@duboce.net> wrote:
>
>> On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi <rangadi@apache.org> wrote:
>> > Thanks Stack. will move to 20.3 or 20 trunk very soon. more responses inline
>> > below.
>> >
>> Do.  A 0.20.4 should be around in next week or so which will be better
>> still, FYI.
>>
>> > On Wed, Apr 7, 2010 at 8:52 PM, Stack <stack@duboce.net> wrote:
>> >
>> >> > On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi <rangadi@apache.org> wrote:
>> >> > We are working with a small HBase cluster (5 nodes) with fairly beefy nodes.
>> >> > While looking at why all the regionservers died at one time, noticed that
>> >> > these servers read some files 100s of times a second. This may not be cause
>> >> > of the error... but do you think this is odd?
>> >> >
>> >> Check end of regionserver log.  Should say why RegionServer went away.
>> >>  The usual reason is long GC pause, one that is longer than zk session
>> >> timeout.
>> >
>> >
>> > This seems to be the case... There were CMS GC failures (promotion failed,
>> > Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or
>> > so. Is that enough to kill ZK session? We are increasing the memory and will
>> > go through tuning tips on wiki.
>> >
>>
>> ZK session in your 0.20.1 is probably 40 seconds IIRC but yeah, this
>> is common enough until a bit of tuning is done.  If you update to
>> 0.20.3 at least, the zk session is 60 seconds but you should try and
>> avoid the promotion failures if you can.
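For the archives: the timeout is set via zookeeper.session.timeout in
hbase-site.xml, in milliseconds -- assuming the 0.20 property name.
Raising it only buys slack; it doesn't fix the pauses.

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>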
>>
>> > There are various other errors in the log over couple of hours of RS run.
>> > will post a link to the full log.
>> >
>> > --- failure on RS-72 ---
>> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x127d58da4e70002 to sun.nio.ch.SelectionKeyImpl@426295eb
>> > java.io.IOException: TIMED OUT
>> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x27d58da6de0088 to sun.nio.ch.SelectionKeyImpl@283f4633
>> > java.io.IOException: TIMED OUT
>> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> > handler 27 on 60020, call put([B@20a192c7,
>> > [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211:
>> > error: java.io.IOException: Server not running, aborting
>> > java.io.IOException: Server not running, aborting
>> > at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
>> > ------------
>>
>>
>> These look like the zk session time out sequence.
>>
>>
>> >
>> > --- failure on RS-73 after a few minutes ---
>> >
>> > 2010-04-06 22:21:41,867 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> > -4957903368956265878 lease expired
>> > 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x127d58da4e7002a to sun.nio.ch.SelectionKeyImpl@15ef1241
>> > java.io.IOException: TIMED OUT
>> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:21:47,806 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > java.lang.OutOfMemoryError: Java heap space
>> > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>> > at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>> > -------
>> >
>>
>> This is zk session timeout and an OOME.  The GC couldn't succeed.
>>
>> How much memory you giving these puppies?  CMS is kinda sloppy so you
>> need to give it a bit more space to work in.
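A sketch of the sort of thing that helps, via HBASE_OPTS in
conf/hbase-env.sh -- the 70% figure is an example, tune it to your heap:

  # conf/hbase-env.sh
  # Start CMS well before the old gen fills so promotion failures are less likely.
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"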
>>
>> >  [...]
>> >
>> >> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
>> >> > obtain block blk_-7610953303919156937_1089667 from any node:
>> >> >  java.io.IOException: No live nodes contain current block
>> >> > [...]
>> >> > ----
>> >>
>> >> Are you accessing from mapreduce?  If so, does your hadoop have hdfs-127?
>> >>
>> >> Then there are the usual suspects.  Xceivers count -- up it to 2k or
>> >> so -- and ulimit should be much greater than the default 1024.
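Concretely -- and note the datanode property name really is spelled
'xcievers' -- something like:

  <!-- hdfs-site.xml on the datanodes -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

and for the fd limit, an entry for whatever user runs hadoop/hbase, e.g.
in /etc/security/limits.conf:

  hadoop  -  nofile  32768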
>> >
>> >
>> > yes. Most of the traffic now is puts from reducers. I think HDFS is a recent
>> > Cloudera release. I will check. Most likely it wont have hdfs-127.
>> >
>>
>> My guess is that it does.. but yeah, check (You should remember that
>> one -- smile)
>>
>>
>> > yup.. we hit Xceivers limit very early. The limit is 2k and fd limit is also high.
>> >
>> > [...]
>> >
>> >> > There are thousands of repeated reads of many small files like this.
>> >> >
>> >> > --- From NN log, this block was created
>> >> > for /hbase/.META./1028785192/info/1728561479703335912
>> >> > 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>> >> > NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912.
>> >> > blk_8972126557191254374_1090962
>> >> > ----
>> >> >
>> >> > Btw, we had single replication set for this file by mistake.
>> >> >
>> >>
>> >> So, if any error getting a block, there is no recourse.  Was there
>> >> concurrent processes sucking i/o from HDFS running at same time?
>> >
>> >
>> >
>> >> Writing, clients need to figure where to write.  They'll do this by
>> >> doing lookup in .META.  They'll then cache the info.  If clients are
>> >> short-lived, then lots of .META. hits.
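To make that concrete, a hypothetical reducer-side sketch against the
0.20-era client API (table/column names are made up):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Create these once per task, not per record: the HTable does the .META.
  // lookups and the locations get cached on the client side.
  HBaseConfiguration conf = new HBaseConfiguration();
  HTable table = new HTable(conf, "mytable");

  Put p = new Put(Bytes.toBytes("rowkey"));
  p.add(Bytes.toBytes("info"), Bytes.toBytes("col"), Bytes.toBytes("value"));
  table.put(p);
  // A fresh conf + HTable per record throws the cached locations away and
  // sends the client back to .META. every time.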
>> >>
>> >
>> > Client here is HBase client (in our case reducers)?
>> >
>>
>> Your reducers run for a while?  So clients have chance to exploit
>> cache of meta data info?
>>
>> >
>> >> And as Ryan says, whats the caching stats look like for the .META.
>> >> region?  (See server it was hosted on and check its logs -- we dump
>> >> cache metrics every minute or so).
>> >>
>> >
>> > What should I look for in the log for these stats? I didn't notice stats. I
>> > will send link to full log.
>> >
>>
>> Looks like this in regionserver log: 2010-04-08 00:43:51,822 DEBUG
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>> Total=6.6769104MB (7001248), Free=807.12305MB (846329888), Max=813.8MB
>> (853331136), Counts: Blocks=0, Access=202589, Hit=0, Miss=202589,
>> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
>> Evicted/Run=NaN
>>
>> You need to enable DEBUG (I'd point you at FAQ for how to do this but
>> you know how).
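For the record, that's a one-liner in conf/log4j.properties, assuming the
stock logging setup:

  # DEBUG for all of hbase...
  log4j.logger.org.apache.hadoop.hbase=DEBUG
  # ...or just the block cache:
  log4j.logger.org.apache.hadoop.hbase.io.hfile.LruBlockCache=DEBUG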
>>
>> St.Ack
>>
>> > Raghu.
>> >
>> >
>> >> St.Ack
>> >>
>> >
>>
>
