From: lars hofhansl
Date: Fri, 10 Aug 2012 23:43:13 -0700 (PDT)
Subject: Re: Extremely long flush times
To: "user@hbase.apache.org" , hbase-dev
Ran your test code (thanks Carlos).

Found two things:

1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...)

Here's the relevant part of the jstack:

"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)

At the same time I find that the flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)

Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized, so the looping reader holds the StoreScanner monitor that the flusher is waiting for.

So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution; this always shows up.
Need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars

----- Original Message -----
From: "Espinoza,Carlos"
To: user@hbase.apache.org
Cc:
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed hbase (hbase-0.94.1) environment.
We wrote a test program that creates a test table "MyTestTable" and populates it with random rows, then it creates a row with 60,000 columns and repeatedly updates it. Each column has an 18-byte qualifier and a 50-byte value. In our tests, when we ran the program, we usually never got beyond 15 updates before it would flush for a really long time. The rows being updated are about 4 MB each (minus any HBase metadata).

It doesn't seem like it's caused by GC. I turned on GC logging and didn't see any long pauses. This is the GC log during the flush:
http://pastebin.com/vJKKXDx5

This is the region server log with debug on during the same flush:
http://pastebin.com/Fh5213mg

This is the test program we wrote:
http://pastebin.com/aZ0k5tx2

You should be able to just compile it and run it against a running HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there's 7 minutes between the last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 42 on 9009' on region Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.: memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~155.4m/162905016, currentsize=116.5m/122172880 for region Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.
in 501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also, did you set the zookeeper session timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron wrote:
> We ran into this situation today on our production instance, I'm
> wondering if anyone has seen or knows a way around it.
>
> We were repeatedly updating the same 60,000-column-wide row. With
> each update, we were updating nearly every column of the row. Each
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in
> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> of ~155.4m/162931400, currentsize=2.7k/2760 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 2610ms, sequenceid=1792263953, compaction requested=false
>
> However, sometimes we would see flush times around 8 - 10 minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.
> in 501337ms, sequenceid=1792264092, compaction requested=true
>
> During these long flushes, we would end up taking all the IPC handler
> slots into that region server that we had configured (50). It looked
> to our application like the region server was unavailable.
>
> After the long delay, the region server would continue running fine,
> but then do it again a few minutes later.
>
> I thought it was most likely something with garbage collection, so I
> got a stack trace during one of the events. The trace doesn't look
> like GC was running.
>
> http://pastebin.com/ANBiUzX4 - Stack Trace when the region server was
> unreachable
>
> http://pastebin.com/naTYQxZR - Region Server Log during the same
> interval
>
> Versions:
> HBase 0.92.1, r1298924
> Hadoop 0.20.2-cdh3u3
>
> ----------------------------------------------------------------------
> Ron Buckley
> x6365
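The monitor interaction in the two jstacks earlier in the thread can be reproduced in miniature with plain Java: when one thread loops forever inside a synchronized method, any thread calling a second synchronized method on the same object stays BLOCKED, exactly like the cacheFlusher. This is a hedged sketch with made-up stand-in names (Scanner, next, updateReaders, and the release escape hatch), not the real HBase classes:

```java
// Sketch of the contention pattern: next() and updateReaders() synchronize
// on the same object, so a next() that never returns keeps the "flusher"
// blocked indefinitely. Scanner is an illustrative stand-in, not HBase's
// StoreScanner.
class Scanner {
    volatile boolean inNext = false;   // set once next() holds the monitor
    volatile boolean release = false;  // escape hatch; the real bug has none

    synchronized void next() {
        inNext = true;
        while (!release) { }           // like the loop in MemStoreScanner.getNext(...)
    }

    synchronized void updateReaders() {
        // Cannot be entered until next() releases the monitor.
    }
}

public class FlushBlockDemo {
    public static void main(String[] args) throws InterruptedException {
        Scanner scanner = new Scanner();

        Thread reader = new Thread(scanner::next, "reader");
        reader.start();
        while (!scanner.inNext) { }    // spin until the reader holds the monitor

        Thread flusher = new Thread(scanner::updateReaders, "flusher");
        flusher.start();
        Thread.sleep(200);             // give the flusher time to hit the monitor

        // Expect BLOCKED (on object monitor), matching the cacheFlusher jstack.
        System.out.println("flusher while reader loops: " + flusher.getState());

        scanner.release = true;
        reader.join();
        flusher.join();
        System.out.println("flusher after monitor released: " + flusher.getState());
    }
}
```

A thread dump of this toy program shows the same two signatures as the real traces: one thread RUNNABLE inside the loop with the monitor locked, the other BLOCKED waiting to lock it.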
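The row sizes quoted in the thread are easy to sanity-check: 60,000 columns at an 18-byte qualifier plus a 50-byte value come to roughly 4 MB of user payload, matching Carlos's estimate. (Ron's ~9 MB per WAL entry would additionally include the row key and per-KeyValue metadata, whose sizes are not given in the thread, so that figure is not derivable from these numbers alone.)

```java
public class RowSizeEstimate {
    public static void main(String[] args) {
        long columns = 60_000;
        long qualifierBytes = 18; // per-column qualifier size from the test program
        long valueBytes = 50;     // per-column value size from the test program

        // User payload only; excludes the row key and HBase KeyValue overhead.
        long payloadBytes = columns * (qualifierBytes + valueBytes);
        System.out.printf("payload = %d bytes (~%.2f MB)%n", payloadBytes, payloadBytes / 1e6);
        // payload = 4080000 bytes (~4.08 MB)
    }
}
```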