From: Stack <saint.ack@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Fri, 9 Apr 2010 11:41:26 -0700 (PDT)
Subject: Re: hundreds of reads of a metafile

On Thu, Apr 8, 2010 at 2:26 PM, Raghu Angadi wrote:
> Regionserver log on node 72 is at: http://bit.ly/cd9acU (160K gzipped). To
> give a scale of reads, the local datanode had 3.5K reads spread across about
> 20 blocks. Pretty much all the reads were from the same DFSClient ID. Will
> watch if this happens again.
>

Thanks for digging in, Raghu.

Looking in the logs, there is a lot of churn -- other regionservers shedding
regions, restarting? -- and the write load would probably do better if given
more RAM. You keep hitting the ceiling where we put down a gate blocking
writes until the flushes finish.

What time interval are we talking about for the 3.5k reads across 20 blocks?
Were the 20 blocks under the ${HBASE_ROOTDIR}/.META. directory? This
regionserver was carrying the .META. region, it looks like, so it's going to
be popular. I'd think the cache should be running interference on the i/o,
but maybe it's not doing a good job of it. The write load/churn might be
blowing the cache. Yeah, log at DEBUG and we'll get a better idea. You're
doing a big upload? The cache is a config where you set how much of the heap
to allocate. Default is 0.2 IIRC.
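For example, you set it in hbase-site.xml (the 0.4 below is just an
illustration -- the value is the fraction of the regionserver heap given over
to the block cache):

```xml
<!-- hbase-site.xml: fraction of the regionserver heap used for the
     HFile block cache. Default is 0.2, i.e. 20% of heap. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
```

With your 6GB heap, 0.4 would put roughly 2.4GB under the cache.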
St.Ack

> We are increasing memory from 3GB to 6GB. Any pointers about how to set the
> size of the block cache will be helpful. Will enable DEBUG for LruBlockCache.
>
> Raghu.
>
> On Thu, Apr 8, 2010 at 12:46 AM, Stack wrote:
>
>> On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi wrote:
>> > Thanks Stack. Will move to 20.3 or 20 trunk very soon. More responses
>> > inline below.
>> >
>> Do. A 0.20.4 should be around in the next week or so, which will be better
>> still, FYI.
>>
>> > On Wed, Apr 7, 2010 at 8:52 PM, Stack wrote:
>> >
>> >> On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi wrote:
>> >> > We are working with a small HBase cluster (5 nodes) with fairly beefy
>> >> > nodes. While looking at why all the regionservers died at one time, we
>> >> > noticed that these servers read some files 100s of times a second. This
>> >> > may not be the cause of the error... but do you think this is odd?
>> >> >
>> >> Check the end of the regionserver log. It should say why the RegionServer
>> >> went away. The usual reason is a long GC pause, one that is longer than
>> >> the zk session timeout.
>> >
>> > This seems to be the case... There were CMS GC failures (promotion failed,
>> > Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or
>> > so. Is that enough to kill a ZK session? We are increasing the memory and
>> > will go through the tuning tips on the wiki.
>> >
>> The ZK session in your 0.20.1 is probably 40 seconds IIRC, but yeah, this
>> is common enough until a bit of tuning is done. If you update to
>> 0.20.3 at least, the zk session is 60 seconds, but you should try and
>> avoid the promotion failures if you can.
>>
>> > There are various other errors in the log over a couple of hours of the
>> > RS run. Will post a link to the full log.
>> >
>> > --- failure on RS-72 ---
>> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x127d58da4e70002 to sun.nio.ch.SelectionKeyImpl@426295eb
>> > java.io.IOException: TIMED OUT
>> >   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x27d58da6de0088 to sun.nio.ch.SelectionKeyImpl@283f4633
>> > java.io.IOException: TIMED OUT
>> >   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> > handler 27 on 60020, call put([B@20a192c7,
>> > [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211:
>> > error: java.io.IOException: Server not running, aborting
>> > java.io.IOException: Server not running, aborting
>> >   at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
>> > ------------
>>
>> These look like the zk session timeout sequence.
>>
>> > --- failure on RS-73 after a few minutes ---
>> >
>> > 2010-04-06 22:21:41,867 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> > -4957903368956265878 lease expired
>> > 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception
>> > closing session 0x127d58da4e7002a to sun.nio.ch.SelectionKeyImpl@15ef1241
>> > java.io.IOException: TIMED OUT
>> >   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>> > 2010-04-06 22:21:47,806 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > java.lang.OutOfMemoryError: Java heap space
>> >   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>> >   at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>> > -------
>> >
>>
>> This is the zk session timeout and an OOME. The GC couldn't succeed.
>>
>> How much memory are you giving these puppies?
>> CMS is kinda sloppy, so you
>> need to give it a bit more space to work in.
>>
>> > [...]
>> >
>> >> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
>> >> > obtain block blk_-7610953303919156937_1089667 from any node:
>> >> > java.io.IOException: No live nodes contain current block
>> >> > [...]
>> >> > ----
>> >>
>> >> Are you accessing from mapreduce? If so, does your hadoop have hdfs-127?
>> >>
>> >> Then there are the usual suspects. Xceivers count -- up it to 2k or
>> >> so -- and ulimit should be much greater than the default 1024.
>> >
>> > Yes. Most of the traffic now is puts from reducers. I think HDFS is a
>> > recent Cloudera release. I will check. Most likely it won't have hdfs-127.
>> >
>> My guess is that it does.. but yeah, check (You should remember that
>> one -- smile)
>>
>> > Yup.. we hit the Xceivers limit very early. The limit is 2k and the fd
>> > limit is also high.
>> >
>> > [...]
>> >
>> >> > There are thousands of repeated reads of many small files like this.
>> >> >
>> >> > --- From NN log, this block was created
>> >> > for /hbase/.META./1028785192/info/1728561479703335912
>> >> > 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>> >> > NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912.
>> >> > blk_8972126557191254374_1090962
>> >> > ----
>> >> >
>> >> > Btw, we had single replication set for this file by mistake.
>> >> >
>> >> So, if there is any error getting a block, there is no recourse. Were
>> >> there concurrent processes sucking i/o from HDFS running at the same time?
>> >
>> >> Writing, clients need to figure out where to write. They'll do this by
>> >> doing a lookup in .META. They'll then cache the info. If clients are
>> >> short-lived, then lots of .META. hits.
>> >
>> > The client here is the HBase client (in our case the reducers)?
>> >
>> Your reducers run for a while?
>> So clients have a chance to exploit the
>> cache of meta data info?
>>
>> >
>> >> And as Ryan says, what do the caching stats look like for the .META.
>> >> region? (See the server it was hosted on and check its logs -- we dump
>> >> cache metrics every minute or so).
>> >>
>> >
>> > What should I look for in the log for these stats? I didn't notice stats. I
>> > will send a link to the full log.
>> >
>> It looks like this in the regionserver log: 2010-04-08 00:43:51,822 DEBUG
>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>> Total=6.6769104MB (7001248), Free=807.12305MB (846329888), Max=813.8MB
>> (853331136), Counts: Blocks=0, Access=202589, Hit=0, Miss=202589,
>> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
>> Evicted/Run=NaN
>>
>> You need to enable DEBUG (I'd point you at the FAQ for how to do this,
>> but you know how).
>>
>> St.Ack
>>
>> > Raghu.
>> >
>> >
>> >> St.Ack
>> >>
>> >
>> >
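To enable the DEBUG logging mentioned above, a line like the following in
conf/log4j.properties should do it (logger name taken from the class in the
Cache Stats line; widen the scope, e.g. to org.apache.hadoop.hbase, if you
want DEBUG from all of HBase):

```
# conf/log4j.properties: turn on DEBUG for the block cache so the
# "Cache Stats" lines get logged every minute or so
log4j.logger.org.apache.hadoop.hbase.io.hfile.LruBlockCache=DEBUG
```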