From: "Espinoza,Carlos" <espinozca@oclc.org>
To: dev@hbase.apache.org; "lars hofhansl"
Date: Thu, 16 Aug 2012 15:12:59 -0400
Subject: RE: Extremely long flush times

Lars,

Glad I could help. It was cool to see how you approached the problem and
came to a solution. Thanks for being so quick in addressing this!

Carlos

-----Original Message-----
From: lars hofhansl [mailto:lhofhansl@yahoo.com]
Sent: Thursday, August 16, 2012 2:40 PM
To: user@hbase.apache.org; dev@hbase.apache.org; lars hofhansl
Subject: Re: Extremely long flush times

This is now committed to 0.94 (i.e. it will be in 0.94.2) and 0.96. The fix
turned out to be pretty simple (but in an intricate part of HBase).

Thanks for the program demonstrating the problem, Carlos, that was extremely
helpful!

-- Lars

________________________________
From: lars hofhansl
To: "user@hbase.apache.org"; "dev@hbase.apache.org"; lars hofhansl
Sent: Sunday, August 12, 2012 2:41 PM
Subject: Re: Extremely long flush times

I filed HBASE-6561 for this (Jira is back).
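
(For illustration, a minimal sketch of the iterate-before-seek heuristic
Lars describes in the message below. The class, the MAX_FORWARD_STEPS
threshold, and the NavigableSet<String> stand-in for the memstore are
assumptions of the sketch, not the actual MemStore code or the HBASE-6561
patch.)

import java.util.Iterator;
import java.util.NavigableSet;

// Sketch: step forward a bounded number of entries before falling back to
// an expensive tail-set based reseek.
public class EagerReseekSketch {
    // Threshold suggested in the thread ("say 100 KVs or so").
    private static final int MAX_FORWARD_STEPS = 100;

    private final NavigableSet<String> kvs;   // stand-in for the memstore's KV set
    private Iterator<String> iter;
    private String current;

    public EagerReseekSketch(NavigableSet<String> kvs) {
        this.kvs = kvs;
        this.iter = kvs.iterator();
        this.current = iter.hasNext() ? iter.next() : null;
    }

    /** Position the scanner at the first entry >= key. */
    public String reseek(String key) {
        // Cheap path: iterate forward a little; most reseeks only skip a few entries.
        for (int i = 0; i < MAX_FORWARD_STEPS && current != null; i++) {
            if (current.compareTo(key) >= 0) {
                return current;
            }
            current = iter.hasNext() ? iter.next() : null;
        }
        if (current != null && current.compareTo(key) < 0) {
            // Expensive path: rebuild the iterator from a tail set (the thread
            // notes these tail sets can contain 300,000 or more entries).
            iter = kvs.tailSet(key, true).iterator();
            current = iter.hasNext() ? iter.next() : null;
        }
        return current;
    }
}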
----- Original Message -----
From: lars hofhansl
To: "dev@hbase.apache.org"; "user@hbase.apache.org"
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just
iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then
issue the expensive reseek. I'll try that tomorrow.

(In this case the tailset created by the reseek often has 300,000 or more
entries in it. That is not necessarily a problem, since it is not recreated.)

-- Lars

----- Original Message -----
From: lars hofhansl
To: "user@hbase.apache.org"; hbase-dev
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive
calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return
MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of
versions is reached), and I do not see this behavior (the update loop that
previously would not get past 15 or so now happily goes on until I stop the
client).

Not sure what the conclusion would be. Seeking the memstore seems to be
expensive, so it should only be done when many KVs can be skipped with a
seek; otherwise we should just iterate. It is not clear how to find this out
ahead of time.

I'm open to suggestions.

-- Lars
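
(A simplified illustration of the experiment described above; this is not
the real ScanWildcardColumnTracker, just a sketch of the SKIP vs.
SEEK_NEXT_COL decision and the tradeoff it controls.)

// When the max number of versions for a column has been read, the matcher
// can either ask the scanners to SEEK to the next column (an expensive
// reseek of every scanner, including the MemStoreScanner) or simply SKIP
// and keep iterating.
public class VersionCheckSketch {

    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    private final int maxVersions;
    private int versionsSeen;

    VersionCheckSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    MatchCode checkVersion() {
        if (++versionsSeen <= maxVersions) {
            return MatchCode.INCLUDE;
        }
        // Original behavior: seek past the remaining versions of this column.
        // The test described above returns SKIP here instead, trading a
        // possibly long forward iteration for avoiding the expensive reseek.
        return MatchCode.SEEK_NEXT_COL;
    }
}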
----- Original Message -----
From: lars hofhansl
To: "user@hbase.apache.org"; hbase-dev
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos). Found two things:

1. Store.internalFlushCache(...) should be calling
   StoreScanner.next(List, int limit) - currently it does not set a limit.
   (But this is not the problem.)
2. With jstack I found that the code is stuck in a loop in
   Memstore.MemstoreScanner.getNext(...)

Here's the relevant part of the jstack:

"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)

At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)

Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.
So the problem seems to be that MemStoreScanner loops forever in
getNext(...). I took a jstack a bunch of times during execution; this
always shows up.

Need to dig a bit more, I do not see a good way to deal with this, yet.

-- Lars
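
(A toy, self-contained illustration of the blocking pattern visible in the
two stack traces above: two methods synchronized on the same scanner object,
with a handler thread holding the monitor for a long time while the flusher
waits. This is not HBase code; the names and timings are made up.)

public class ScannerLockSketch {

    synchronized void reseek() throws InterruptedException {
        Thread.sleep(5000);   // stand-in for a single reseek that loops a long time in getNext()
    }

    synchronized void updateReaders() {
        // flusher side: swap the memstore snapshot for the new store file
    }

    public static void main(String[] args) throws Exception {
        ScannerLockSketch scanner = new ScannerLockSketch();

        Thread handler = new Thread(() -> {
            try { scanner.reseek(); } catch (InterruptedException ignored) { }
        });
        handler.start();
        Thread.sleep(100);    // let the handler grab the scanner's monitor first

        long start = System.currentTimeMillis();
        scanner.updateReaders();   // blocks until reseek() releases the monitor
        System.out.println("flush blocked for ~" + (System.currentTimeMillis() - start) + " ms");
    }
}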
----- Original Message -----
From: "Espinoza,Carlos"
To: user@hbase.apache.org
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed HBase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then creates a row
with 60,000 columns and repeatedly updates it. Each column has an 18 byte
qualifier and a 50 byte value. In our tests, when we ran the program, we
usually never got beyond 15 updates before it would flush for a really long
time. The rows being updated are about 4 MB each (minus any HBase metadata).

It doesn't seem to be caused by GC. I turned on GC logging and didn't see
any long pauses.

This is the GC log during the flush:
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush:
http://pastebin.com/Fh5213mg

This is the test program we wrote:
http://pastebin.com/aZ0k5tx2

You should be able to just compile it and run it against a running HBase
cluster.

$ java TestTable

Carlos
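
(A compressed sketch along the lines of the TestTable program described
above, written against the 0.94-era client API; the actual program is at the
pastebin link. The table/family names, row key, and update count are
assumptions of the sketch, and the random-row pre-population step is
omitted.)

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("MyTestTable")) {
            HTableDescriptor desc = new HTableDescriptor("MyTestTable");
            desc.addFamily(new HColumnDescriptor("f"));
            admin.createTable(desc);
        }

        HTable table = new HTable(conf, "MyTestTable");
        Random rnd = new Random();
        byte[] family = Bytes.toBytes("f");
        byte[] row = Bytes.toBytes("wide-row");

        // Repeatedly rewrite the same 60,000-column row, with 18-byte
        // qualifiers and 50-byte values, as described in the thread.
        for (int update = 0; update < 100; update++) {
            Put put = new Put(row);
            for (int col = 0; col < 60000; col++) {
                byte[] qualifier = Bytes.toBytes(String.format("qualifier-%08d", col));
                byte[] value = new byte[50];
                rnd.nextBytes(value);
                put.add(family, qualifier, value);
            }
            long start = System.currentTimeMillis();
            table.put(put);
            System.out.println("update " + update + " took "
                + (System.currentTimeMillis() - start) + " ms");
        }
        table.close();
    }
}

Compile it against the HBase 0.94 client jars and run it against a running
cluster, similar to the $ java TestTable invocation above.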
-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size

2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried to
write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also, did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron wrote:
> We ran into this situation today on our production instance. I'm
> wondering if anyone has seen it or knows a way around it.
>
> We were repeatedly updating the same 60,000 column wide row. With each
> update, we were updating nearly every column of the row. Each update
> generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in
> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> of ~155.4m/162931400, currentsize=2.7k/2760 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 2610ms, sequenceid=1792263953, compaction requested=false
>
> However, sometimes we would see flush times of around 8 - 10 minutes:
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 501337ms, sequenceid=1792264092, compaction requested=true
>
> During these long flushes, we would end up taking all the IPC handler
> slots into that region server that we had configured (50). It looked
> to our application like the region server was unavailable.
>
> After the long delay, the region server would continue running fine,
> but then do it again a few minutes later.
>
> I thought it was most likely something with garbage collection, so I
> got a stack trace during one of the events. The trace doesn't look
> like GC was running.
>
> http://pastebin.com/ANBiUzX4 - Stack trace when the region server was
> unreachable
>
> http://pastebin.com/naTYQxZR - Region server log during the same
> interval
>
> Versions:
> HBase 0.92.1, r1298924
> Hadoop 0.20.2-cdh3u3
>
> ----------------------------------------------------------------------
> Ron Buckley
> x6365
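
(Since J-D asks about the ZooKeeper session timeout: a minimal hbase-site.xml
fragment for setting it is shown below. The 180000 ms value simply spells out
the 3-minute default he mentions; an appropriate value depends on the cluster
and is not a recommendation.)

<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value> <!-- milliseconds -->
</property>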