Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: local policy)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
  b=e3R0OCV+uuoe8/dRo7W93xBoMPdcn1um9Kjfr/Hg1iSAY0QkHBjGG6KG3NzCNNSTt465hGYghmMRx/5+o9nCaTEMkj+A8nLW2tgbB/E+fVC1byRsYX0oaTWGieWfYIjbu3kSgQ4lYELhzOafqYoSR57EOgpOt94VwU/W5UDY3aU=;
References: 
 <52E301F960B30049ADEFBCCF1CCAEF59111989E3@OAEXCH4SERVER.oa.oclc.org>
 <CAGpTDNeVU33LAchszMU_iE6SExX1eLT286kD3rudEQTwXCDVuA@mail.gmail.com>
 <6548F17059905B48B2A6F28CE3692BAA0CE29691@OAEXCH4SERVER.oa.oclc.org>
 <1344667393.56061.YahooMailNeo@web121706.mail.ne1.yahoo.com>
 <1344670001.62389.YahooMailNeo@web121704.mail.ne1.yahoo.com>
 <1344670928.71313.YahooMailNeo@web121701.mail.ne1.yahoo.com>
Message-ID: <1344807676.13385.YahooMailNeo@web121703.mail.ne1.yahoo.com>
Date: Sun, 12 Aug 2012 14:41:16 -0700 (PDT)
From: lars hofhansl <lhofhansl@yahoo.com>
Reply-To: lars hofhansl <lhofhansl@yahoo.com>
Subject: Re: Extremely long flush times
To: "user@hbase.apache.org" <user@hbase.apache.org>,
  "dev@hbase.apache.org" <dev@hbase.apache.org>,
  lars hofhansl <lhofhansl@yahoo.com>
In-Reply-To: <1344670928.71313.YahooMailNeo@web121701.mail.ne1.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

I filed HBASE-6561 for this (Jira is back).=0A=0A=0A=0A----- Original Messa=
ge -----=0AFrom: lars hofhansl <lhofhansl@yahoo.com>=0ATo: "dev@hbase.apach=
e.org" <dev@hbase.apache.org>; "user@hbase.apache.org" <user@hbase.apache.o=
rg>=0ACc: =0ASent: Saturday, August 11, 2012 12:42 AM=0ASubject: Re: Extrem=
ely long flush times=0A=0AA possible solution is to have the MemStoreScanne=
r reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so).=
If that is not fruitful then issue the expensive reseek. I'll try that tomo=
rrow.=0A=0A=0A=0A(In this case the tailset created from the reseek often 30=
0.000 or more entries in it. That is not necessarily a problem since it is =
not recreated.)=0A=0A-- Lars=0A=0A=0A----- Original Message -----=0AFrom: l=
ars hofhansl <lhofhansl@yahoo.com>=0ATo: "user@hbase.apache.org" <user@hbas=
e.apache.org>; hbase-dev <dev@hbase.apache.org>=0ACc: =0ASent: Saturday, Au=
gust 11, 2012 12:26 AM=0ASubject: Re: Extremely long flush times=0A=0AIt tu=
rns out the problem is not the loop in MemStoreScanner, but excessive calls=
 to StoreScanner.reseek.=0A=0AAs a test I changed ScanWildcardColumnTracker=
.checkVersion to MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when th=
e max number of versions is reached); and I do not see this behavior (I see=
 the loop that would not go past 15 or so, happily go on until I stop the c=
lient).=0A=0ANot sure what the conclusion would be. Seeking the memstore se=
ems to be expensive, so it should only be done when many KV can be skipped =
with a seek, otherwise we should just iterate.=0AIt is not clear how to fin=
d this out ahead of time.=0A=0AI'm open to suggestions.=0A=0A-- Lars=0A=0A=
=0A=0A----- Original Message -----=0AFrom: lars hofhansl <lhofhansl@yahoo.c=
om>=0ATo: "user@hbase.apache.org" <user@hbase.apache.org>; hbase-dev <dev@h=
base.apache.org>=0ACc: =0ASent: Friday, August 10, 2012 11:43 PM=0ASubject:=
 Re: Extremely long flush times=0A=0ARan your test code (thanks Carlos).=0A=
=0AFound two things:=0A1. Store.internalFlushCache(...) should be calling S=
toreScanner.next(List<KeyValue>, int limit) - currently it does not set a l=
imit.(But this is not the problem).=0A=0A2. With jstack I found that the co=
de is stuck in a loop in Memstore.MemstoreScanner.getNext(...)=0A=0AHere's =
the relevant part of the jstack:=0A"IPC Server handler 6 on 60020" daemon p=
rio=3D10 tid=3D0x00007f0574625000 nid=3D0x720c runnable [0x00007f05669e7000=
]=0A=A0=A0 java.lang.Thread.State: RUNNABLE=0A=A0=A0=A0=A0=A0=A0=A0 at org.=
apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.=
java:726)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.M=
emStore$MemStoreScanner.seekInSubLists(MemStore.java:761)=0A=A0=A0=A0=A0=A0=
=A0=A0 - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserve=
r.MemStore$MemStoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hb=
ase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)=0A=A0=
=A0=A0=A0=A0=A0=A0 - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase=
.regionserver.MemStore$MemStoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apac=
he.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyVa=
lueScanner.java:54)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regi=
onserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)=0A=A0=A0=A0=A0=
=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyVa=
lueHeap.java:244)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.region=
server.StoreScanner.reseek(StoreScanner.java:522)=0A=A0=A0=A0=A0=A0=A0=A0 -=
 eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.St=
oreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver=
.StoreScanner.next(StoreScanner.java:403)=0A=A0=A0=A0=A0=A0=A0=A0 - locked =
<0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)=
=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.KeyValueHe=
ap.next(KeyValueHeap.java:127)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop=
.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:345=
9)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.HRegion$=
RegionScannerImpl.next(HRegion.java:3406)=0A=A0=A0=A0=A0=A0=A0=A0 - locked =
<0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$Region=
ScannerImpl)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserve=
r.HRegion$RegionScannerImpl.next(HRegion.java:3423)=0A=A0=A0=A0=A0=A0=A0=A0=
 - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HReg=
ion$RegionScannerImpl)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.r=
egionserver.HRegion.get(HRegion.java:4171)=0A=A0=A0=A0=A0=A0=A0=A0 at org.a=
pache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)=0A=A0=A0=A0=
=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HReg=
ionServer.java:1958)=0A=A0=A0=A0=A0=A0=A0=A0 at sun.reflect.GeneratedMethod=
Accessor12.invoke(Unknown Source)=0A=A0=A0=A0=A0=A0=A0=A0 at sun.reflect.De=
legatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)=0A=
=A0=A0=A0=A0=A0=A0=A0 at java.lang.reflect.Method.invoke(Method.java:601)=
=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$S=
erver.call(WritableRpcEngine.java:364)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apach=
e.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)=0A=0A=0A=
=0AAt the same time I find that flush cannot finish:=0A=0A"regionserver6002=
0.cacheFlusher" daemon prio=3D10 tid=3D0x00007f05749ab000 nid=3D0x71fe wait=
ing for monitor entry [0x00007f05677f6000]=0A=A0=A0 java.lang.Thread.State:=
 BLOCKED (on object monitor)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.h=
base.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)=0A=A0=
=A0=A0=A0=A0=A0=A0 - waiting to lock <0x00000000ccb54860> (a org.apache.had=
oop.hbase.regionserver.StoreScanner)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.=
hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:90=
4)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.Store.up=
dateStorefiles(Store.java:893)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop=
.hbase.regionserver.Store.access$600(Store.java:107)=0A=A0=A0=A0=A0=A0=A0=
=A0 at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(S=
tore.java:2291)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionse=
rver.HRegion.internalFlushcache(HRegion.java:1455)=0A=A0=A0=A0=A0=A0=A0=A0 =
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.=
java:1353)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.=
HRegion.flushcache(HRegion.java:1294)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache=
.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java=
:406)=0A=A0=A0=A0=A0=A0=A0=A0 at org.apache.hadoop.hbase.regionserver.MemSt=
oreFlusher.flushRegion(MemStoreFlusher.java:380)=0A=A0=A0=A0=A0=A0=A0=A0 at=
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.j=
ava:243)=0A=A0=A0=A0=A0=A0=A0=A0 at java.lang.Thread.run(Thread.java:722)=
=0A=0A=0ABoth StoreScanner.updateReaders and StoreScanner.reseek are synchr=
onized.=0A=0A=0ASo the problem seems to be that MemStoreScanner loops forev=
er in getNext(...). I took a jstack a bunch of times during execution, this=
 always shows up.=0ANeed to dig a bit more, I do not see a good way to deal=
 with this, yet.=0A=0A-- Lars=0A=0A=0A----- Original Message -----=0AFrom: =
"Espinoza,Carlos" <espinozca@oclc.org>=0ATo: user@hbase.apache.org=0ACc: =
=0ASent: Friday, August 10, 2012 3:12 PM=0ASubject: RE: Extremely long flus=
h times=0A=0AWe were able to replicate this behavior in a pseudo-distribute=
d hbase=0A(hbase-0.94.1) environment. We wrote a test program that creates =
a test=0Atable "MyTestTable" and populates it with random rows, then it cre=
ates a=0Arow with 60,000 columns and repeatedly updates it. Each column has=
 a 18=0Abyte qualifier and a 50 byte value. In our tests, when we ran the=
=0Aprogram, we usually never got beyond 15 updates before it would flush=0A=
for a really long time. The rows that are being updated are about 4MB=0Aeac=
h (minues any hbase metadata).=0A=0AIt doesn't seem like it's caused by GC.=
 I turned on gc logging, and=0Adidn't see any long pauses. This is the gc l=
og during the flush.=0Ahttp://pastebin.com/vJKKXDx5=0A=0AThis is the region=
server log with debug on during the same flush=0Ahttp://pastebin.com/Fh5213=
mg=0A=0AThis is the test program we wrote. =0Ahttp://pastebin.com/aZ0k5tx2=
=0A=0AYou should be able to just compile it, and run it against a running=
=0AHBase cluster.=0A$ java TestTable=0A=0ACarlos=0A=0A-----Original Message=
-----=0AFrom: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of=
=0AJean-Daniel Cryans=0ASent: Monday, August 06, 2012 8:04 PM=0ATo: user@hb=
ase.apache.org=0ASubject: Re: Extremely long flush times=0A=0AHi Ron,=0A=0A=
I think this is caused by GC, for example there's 7 minutes between the=0Al=
ast blocking message and the end of the flush:=0A=0A2012-08-06 11:44:36,348=
 INFO=0Aorg.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for =
'IPC=0AServer handler 42 on 9009' on region=0AWorks,73658752:eng,1337915462=
169.1cfc92481a74ec278337871563d189d6.:=0Amemstore size 271.9m is >=3D than =
blocking 256.0m size=0A2012-08-06 11:51:28,139 INFO=0Aorg.apache.hadoop.hba=
se.regionserver.HRegion: Finished memstore flush of=0A~155.4m/162905016, cu=
rrentsize=3D116.5m/122172880 for region=0AWorks,73658752:eng,1337915462169.=
1cfc92481a74ec278337871563d189d6. in=0A501337ms, sequenceid=3D1792264092, c=
ompaction requested=3Dtrue=0A=0ADuring that time I would expect that client=
s got socket timeouts, tried=0Ato write again, and data would just pile up =
in the RPC queue.=0A=0ADo you have the gc log handy? Also did you set the z=
ookeeper session=0Atimeout higher than the default of 3 minutes?=0A=0AThx,=
=0A=0AJ-D=0A=0AOn Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <buckleyr@oclc.o=
rg> wrote:=0A> We ran into this situation today on our production instance,=
 I'm =0A> wondering if anyone has seen or knows a way around it.=0A>=0A>=0A=
>=0A> We were repeatedly updating the same 60,000 column wide row.=A0 With =
=0A> each update, we were updating nearly every column of the row. Each =0A=
> update generated nearly 9 MB in the WAL.=0A>=0A> This worked OK most of t=
he time. We saw frequent messages like this in=0A=0A> the region server log=
, indicating a 2.6 second flush time:=0A>=0A> 2012-08-06 11:42:52,316 INFO=
=0A> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush =
=0A> of ~155.4m/162931400, currentsize=3D2.7k/2760 for region =0A> Works,73=
658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in =0A> 2610ms, =
sequenceid=3D1792263953, compaction requested=3Dfalse=0A>=0A>=0A>=0A> Howev=
er, sometimes we would see a flush times around 8 - 10 Minutes.=0A>=0A> 201=
2-08-06 11:51:28,139 INFO=0A> org.apache.hadoop.hbase.regionserver.HRegion:=
 Finished memstore flush =0A> of ~155.4m/162905016, currentsize=3D116.5m/12=
2172880 for region =0A> Works,73658752:eng,1337915462169.1cfc92481a74ec2783=
37871563d189d6. in =0A> 501337ms, sequenceid=3D1792264092, compaction reque=
sted=3Dtrue=0A>=0A>=0A>=0A> During these long flushes, we would end up taki=
ng all the ipc handler =0A> slots into that region server that we had confi=
gured (50).=A0 It looked =0A> to our application like the region server was=
 unavailable.=0A>=0A>=0A>=0A> After the long delay, the region server would=
 continue running fine, =0A> but then do it again a few minutes later.=0A>=
=0A>=0A>=0A> I thought it was most likely something with Garbage Collection=
, so I =0A> got a stack trace during one of the events.=A0 The trace doesn'=
t look =0A> like GC was running.=0A>=0A>=0A>=0A>=0A>=0A> http://pastebin.co=
m/ANBiUzX4=A0 - Stack Trace when the region server was=0A=0A> unreachable=
=0A>=0A> http://pastebin.com/naTYQxZR=A0 - Region Server Log during the sam=
e =0A> interval=0A>=0A>=0A>=0A>=0A>=0A> Versions:=0A>=0A> HBase 0.92.1, r12=
98924=0A>=0A> Hadoop 0.20.2-cdh3u3=0A>=0A>=0A>=0A>=0A>=0A> ----------------=
------------------------------------------------------=0A>=0A> Ron Buckley=
=0A>=0A> x6365=0A>=0A>=0A>=0A