Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 92B5ED6FA for ; Sun, 12 Aug 2012 21:41:46 +0000 (UTC) Received: (qmail 2333 invoked by uid 500); 12 Aug 2012 21:41:44 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 2279 invoked by uid 500); 12 Aug 2012 21:41:44 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 2271 invoked by uid 99); 12 Aug 2012 21:41:44 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 12 Aug 2012 21:41:44 +0000 X-ASF-Spam-Status: No, hits=0.0 required=5.0 tests=FSL_FREEMAIL_1,FSL_FREEMAIL_2,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [98.138.229.162] (HELO nm39-vm2.bullet.mail.ne1.yahoo.com) (98.138.229.162) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 12 Aug 2012 21:41:38 +0000 Received: from [98.138.90.49] by nm39.bullet.mail.ne1.yahoo.com with NNFMP; 12 Aug 2012 21:41:17 -0000 Received: from [98.138.89.162] by tm2.bullet.mail.ne1.yahoo.com with NNFMP; 12 Aug 2012 21:41:17 -0000 Received: from [127.0.0.1] by omp1018.mail.ne1.yahoo.com with NNFMP; 12 Aug 2012 21:41:16 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 990285.54868.bm@omp1018.mail.ne1.yahoo.com Received: (qmail 13402 invoked by uid 60001); 12 Aug 2012 21:41:16 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1344807676; bh=Cxyv2PO5x/b0oBO2UX18FU7fl+/61VlRwDeOWvWtUCw=; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=w3OhUzvbrGq7zD0GkhsTRx+4EkIis6bxoq85zJ4n3sD0SxJ7ABKnXQq/s8SIsKLU+iC6eO97KkczFerRSGeGXa6hX8EVXNeQPT88q9D59vnI11wqLxuNteqKi5wEmRZra1JTKrJKmjZWsVo4FSRb/uj2uH35m7xsEZRmqI7iRJQ= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=e3R0OCV+uuoe8/dRo7W93xBoMPdcn1um9Kjfr/Hg1iSAY0QkHBjGG6KG3NzCNNSTt465hGYghmMRx/5+o9nCaTEMkj+A8nLW2tgbB/E+fVC1byRsYX0oaTWGieWfYIjbu3kSgQ4lYELhzOafqYoSR57EOgpOt94VwU/W5UDY3aU=; X-YMail-OSG: AEXkvC0VM1kIHHkDveWvYpmUeSIfEgp2GwMCKsfbly531aw idRlaLMhTubV.H0j_VCnljZFJz8DHPoljdc.DRkSLH6.ea32NA9YSIF0_71t 1b9HmndAFN9iL8UYor39QkDlMO.0laozne5NJyzAJTJw1oKvHvoO8HZ_aTcD 9vKsE50kDqNT69T9b5IGZS954tF4P017WiuZ3VOyDeSIGHJJufYP46Fc320a yZwdUeyzh9ePLpQscD0ZXnfCNS_L_uejkCP_Bf9Thjo0jSDzxY_EDZtsTN4z UJG7uX0iPvAJgoglAGyx3l5CfFR8HyztTr_412n1sTYGJY8uutQK6SGySCA1 dOilXnyKcB2gpdUSvFbt2D3icuHMYmpFx5Mi3xEeJTqhNu.B8hReU.1K1TMv ceLIq1avj7j72hPwqKum3T1a34oWSP1WUsbMGJyTvJZpzn3hUZKUbNwFqJcg EgYK4OfCIoXFtGTGMaphpvzP4qci69mfJ03kq_CsWgEiyQBIiTbCBwwgoYj3 LZMJ8qDS_ynwADGIiU6QH95usuOtzZsx9WRUYjFLKbjwHQcPb Received: from [24.4.207.85] by web121703.mail.ne1.yahoo.com via HTTP; Sun, 12 Aug 2012 14:41:16 PDT X-Mailer: YahooMailWebService/0.8.120.356233 References: <52E301F960B30049ADEFBCCF1CCAEF59111989E3@OAEXCH4SERVER.oa.oclc.org> <6548F17059905B48B2A6F28CE3692BAA0CE29691@OAEXCH4SERVER.oa.oclc.org> <1344667393.56061.YahooMailNeo@web121706.mail.ne1.yahoo.com> <1344670001.62389.YahooMailNeo@web121704.mail.ne1.yahoo.com> <1344670928.71313.YahooMailNeo@web121701.mail.ne1.yahoo.com> Message-ID: <1344807676.13385.YahooMailNeo@web121703.mail.ne1.yahoo.com> Date: Sun, 12 Aug 2012 14:41:16 -0700 (PDT) From: lars hofhansl Reply-To: lars hofhansl Subject: Re: Extremely long flush times To: "user@hbase.apache.org" , "dev@hbase.apache.org" , lars hofhansl In-Reply-To: <1344670928.71313.YahooMailNeo@web121701.mail.ne1.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org I filed HBASE-6561 for this (Jira is back).=0A=0A=0A=0A----- Original Messa= ge -----=0AFrom: lars hofhansl =0ATo: "dev@hbase.apach= e.org" ; "user@hbase.apache.org" =0ACc: =0ASent: Saturday, August 11, 2012 12:42 AM=0ASubject: Re: Extrem= ely long flush times=0A=0AA possible solution is to have the MemStoreScanne= r reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so).= If that is not fruitful then issue the expensive reseek. I'll try that tomo= rrow.=0A=0A=0A=0A(In this case the tailset created from the reseek often 30= 0.000 or more entries in it. That is not necessarily a problem since it is = not recreated.)=0A=0A-- Lars=0A=0A=0A----- Original Message -----=0AFrom: l= ars hofhansl =0ATo: "user@hbase.apache.org" ; hbase-dev =0ACc: =0ASent: Saturday, Au= gust 11, 2012 12:26 AM=0ASubject: Re: Extremely long flush times=0A=0AIt tu= rns out the problem is not the loop in MemStoreScanner, but excessive calls= to StoreScanner.reseek.=0A=0AAs a test I changed ScanWildcardColumnTracker= .checkVersion to MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when th= e max number of versions is reached); and I do not see this behavior (I see= the loop that would not go past 15 or so, happily go on until I stop the c= lient).=0A=0ANot sure what the conclusion would be. Seeking the memstore se= ems to be expensive, so it should only be done when many KV can be skipped = with a seek, otherwise we should just iterate.=0AIt is not clear how to fin= d this out ahead of time.=0A=0AI'm open to suggestions.=0A=0A-- Lars=0A=0A= =0A=0A----- Original Message -----=0AFrom: lars hofhansl =0ATo: "user@hbase.apache.org" ; hbase-dev =0ACc: =0ASent: Friday, August 10, 2012 11:43 PM=0ASubject:= Re: Extremely long flush times=0A=0ARan your test code (thanks Carlos).=0A= =0AFound two things:=0A1. Store.internalFlushCache(...) should be calling S= toreScanner.next(List, int limit) - currently it does not set a l= imit.(But this is not the problem).=0A=0A2. With jstack I found that the co= de is stuck in a loop in Memstore.MemstoreScanner.getNext(...)=0A=0AHere's = the relevant part of the jstack:=0A"IPC Server handler 6 on 60020" daemon p= rio=3D10 tid=3D0x00007f0574625000 nid=3D0x720c runnable [0x00007f05669e7000= ]=0A=A0=A0 java.lang.Thread.State: RUNNABLE=0A=A0=A0=A0=A0=A0=A0=A0 at org.= apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.= java:726)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.M= emStore$MemStoreScanner.seekInSubLists(MemStore.java:761)=0A=A0=A0=A0=A0=A0= =A0=A0 - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserve= r.MemStore$MemStoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hb= ase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)=0A=A0= =A0=A0=A0=A0=A0=A0 - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase= .regionserver.MemStore$MemStoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apac= he.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyVa= lueScanner.java:54)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regi= onserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)=0A=A0=A0=A0=A0= =A0=A0=A0 at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyVa= lueHeap.java:244)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.region= server.StoreScanner.reseek(StoreScanner.java:522)=0A=A0=A0=A0=A0=A0=A0=A0 -= eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.St= oreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver= .StoreScanner.next(StoreScanner.java:403)=0A=A0=A0=A0=A0=A0=A0=A0 - locked = <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)= =0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.KeyValueHe= ap.next(KeyValueHeap.java:127)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop= .hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:345= 9)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.HRegion$= RegionScannerImpl.next(HRegion.java:3406)=0A=A0=A0=A0=A0=A0=A0=A0 - locked = <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$Region= ScannerImpl)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserve= r.HRegion$RegionScannerImpl.next(HRegion.java:3423)=0A=A0=A0=A0=A0=A0=A0=A0= - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HReg= ion$RegionScannerImpl)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.r= egionserver.HRegion.get(HRegion.java:4171)=0A=A0=A0=A0=A0=A0=A0=A0 at org.a= pache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)=0A=A0=A0=A0= =A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HReg= ionServer.java:1958)=0A=A0=A0=A0=A0=A0=A0=A0 at sun.reflect.GeneratedMethod= Accessor12.invoke(Unknown Source)=0A=A0=A0=A0=A0=A0=A0=A0 at sun.reflect.De= legatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)=0A= =A0=A0=A0=A0=A0=A0=A0 at java.lang.reflect.Method.invoke(Method.java:601)= =0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$S= erver.call(WritableRpcEngine.java:364)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apach= e.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)=0A=0A=0A= =0AAt the same time I find that flush cannot finish:=0A=0A"regionserver6002= 0.cacheFlusher" daemon prio=3D10 tid=3D0x00007f05749ab000 nid=3D0x71fe wait= ing for monitor entry [0x00007f05677f6000]=0A=A0=A0 java.lang.Thread.State:= BLOCKED (on object monitor)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.h= base.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)=0A=A0= =A0=A0=A0=A0=A0=A0 - waiting to lock <0x00000000ccb54860> (a org.apache.had= oop.hbase.regionserver.StoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.= hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:90= 4)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.Store.up= dateStorefiles(Store.java:893)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop= .hbase.regionserver.Store.access$600(Store.java:107)=0A=A0=A0=A0=A0=A0=A0= =A0 at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(S= tore.java:2291)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionse= rver.HRegion.internalFlushcache(HRegion.java:1455)=0A=A0=A0=A0=A0=A0=A0=A0 = at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.= java:1353)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.= HRegion.flushcache(HRegion.java:1294)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache= .hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java= :406)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.MemSt= oreFlusher.flushRegion(MemStoreFlusher.java:380)=0A=A0=A0=A0=A0=A0=A0=A0 at= org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.j= ava:243)=0A=A0=A0=A0=A0=A0=A0=A0 at java.lang.Thread.run(Thread.java:722)= =0A=0A=0ABoth StoreScanner.updateReaders and StoreScanner.reseek are synchr= onized.=0A=0A=0ASo the problem seems to be that MemStoreScanner loops forev= er in getNext(...). I took a jstack a bunch of times during execution, this= always shows up.=0ANeed to dig a bit more, I do not see a good way to deal= with this, yet.=0A=0A-- Lars=0A=0A=0A----- Original Message -----=0AFrom: = "Espinoza,Carlos" =0ATo: user@hbase.apache.org=0ACc: = =0ASent: Friday, August 10, 2012 3:12 PM=0ASubject: RE: Extremely long flus= h times=0A=0AWe were able to replicate this behavior in a pseudo-distribute= d hbase=0A(hbase-0.94.1) environment. We wrote a test program that creates = a test=0Atable "MyTestTable" and populates it with random rows, then it cre= ates a=0Arow with 60,000 columns and repeatedly updates it. Each column has= a 18=0Abyte qualifier and a 50 byte value. In our tests, when we ran the= =0Aprogram, we usually never got beyond 15 updates before it would flush=0A= for a really long time. The rows that are being updated are about 4MB=0Aeac= h (minues any hbase metadata).=0A=0AIt doesn't seem like it's caused by GC.= I turned on gc logging, and=0Adidn't see any long pauses. This is the gc l= og during the flush.=0Ahttp://pastebin.com/vJKKXDx5=0A=0AThis is the region= server log with debug on during the same flush=0Ahttp://pastebin.com/Fh5213= mg=0A=0AThis is the test program we wrote. =0Ahttp://pastebin.com/aZ0k5tx2= =0A=0AYou should be able to just compile it, and run it against a running= =0AHBase cluster.=0A$ java TestTable=0A=0ACarlos=0A=0A-----Original Message= -----=0AFrom: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of= =0AJean-Daniel Cryans=0ASent: Monday, August 06, 2012 8:04 PM=0ATo: user@hb= ase.apache.org=0ASubject: Re: Extremely long flush times=0A=0AHi Ron,=0A=0A= I think this is caused by GC, for example there's 7 minutes between the=0Al= ast blocking message and the end of the flush:=0A=0A2012-08-06 11:44:36,348= INFO=0Aorg.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for = 'IPC=0AServer handler 42 on 9009' on region=0AWorks,73658752:eng,1337915462= 169.1cfc92481a74ec278337871563d189d6.:=0Amemstore size 271.9m is >=3D than = blocking 256.0m size=0A2012-08-06 11:51:28,139 INFO=0Aorg.apache.hadoop.hba= se.regionserver.HRegion: Finished memstore flush of=0A~155.4m/162905016, cu= rrentsize=3D116.5m/122172880 for region=0AWorks,73658752:eng,1337915462169.= 1cfc92481a74ec278337871563d189d6. in=0A501337ms, sequenceid=3D1792264092, c= ompaction requested=3Dtrue=0A=0ADuring that time I would expect that client= s got socket timeouts, tried=0Ato write again, and data would just pile up = in the RPC queue.=0A=0ADo you have the gc log handy? Also did you set the z= ookeeper session=0Atimeout higher than the default of 3 minutes?=0A=0AThx,= =0A=0AJ-D=0A=0AOn Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron wrote:=0A> We ran into this situation today on our production instance,= I'm =0A> wondering if anyone has seen or knows a way around it.=0A>=0A>=0A= >=0A> We were repeatedly updating the same 60,000 column wide row.=A0 With = =0A> each update, we were updating nearly every column of the row. Each =0A= > update generated nearly 9 MB in the WAL.=0A>=0A> This worked OK most of t= he time. We saw frequent messages like this in=0A=0A> the region server log= , indicating a 2.6 second flush time:=0A>=0A> 2012-08-06 11:42:52,316 INFO= =0A> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush = =0A> of ~155.4m/162931400, currentsize=3D2.7k/2760 for region =0A> Works,73= 658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in =0A> 2610ms, = sequenceid=3D1792263953, compaction requested=3Dfalse=0A>=0A>=0A>=0A> Howev= er, sometimes we would see a flush times around 8 - 10 Minutes.=0A>=0A> 201= 2-08-06 11:51:28,139 INFO=0A> org.apache.hadoop.hbase.regionserver.HRegion:= Finished memstore flush =0A> of ~155.4m/162905016, currentsize=3D116.5m/12= 2172880 for region =0A> Works,73658752:eng,1337915462169.1cfc92481a74ec2783= 37871563d189d6. in =0A> 501337ms, sequenceid=3D1792264092, compaction reque= sted=3Dtrue=0A>=0A>=0A>=0A> During these long flushes, we would end up taki= ng all the ipc handler =0A> slots into that region server that we had confi= gured (50).=A0 It looked =0A> to our application like the region server was= unavailable.=0A>=0A>=0A>=0A> After the long delay, the region server would= continue running fine, =0A> but then do it again a few minutes later.=0A>= =0A>=0A>=0A> I thought it was most likely something with Garbage Collection= , so I =0A> got a stack trace during one of the events.=A0 The trace doesn'= t look =0A> like GC was running.=0A>=0A>=0A>=0A>=0A>=0A> http://pastebin.co= m/ANBiUzX4=A0 - Stack Trace when the region server was=0A=0A> unreachable= =0A>=0A> http://pastebin.com/naTYQxZR=A0 - Region Server Log during the sam= e =0A> interval=0A>=0A>=0A>=0A>=0A>=0A> Versions:=0A>=0A> HBase 0.92.1, r12= 98924=0A>=0A> Hadoop 0.20.2-cdh3u3=0A>=0A>=0A>=0A>=0A>=0A> ----------------= ------------------------------------------------------=0A>=0A> Ron Buckley= =0A>=0A> x6365=0A>=0A>=0A>=0A