Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Received-SPF: pass (athena.apache.org: local policy)
References: 
 <CAAMYKhqSPcP0wOiG8RWZw1stTeHt-iULSuazkNmx+uLLVwPyxQ@mail.gmail.com>
	<CALte62zn50G=YC6BedMA4wZ8fHm4m4xD-Q10vHezkw+kja36PA@mail.gmail.com>
	<CAAMYKhodVTd25gW+m8MHmS1H22dgOF-2MfRBBZ8Q-1C9t=xzRQ@mail.gmail.com>
	<CADcMMgHxCor5156f7fMqx7cxS=+BfC9x=tCarWsXck+q7XTBRA@mail.gmail.com>
	<DC5EBE7F3610EB4CA5C7E92D76873E1518629B58B5@exchange2007.carrieriq.com>
	<CA+RK=_Cj4gc60DAcpF8+qS5yqF7SmLo_HwLbj-miRh8D7S-3YA@mail.gmail.com>
	<CA+RK=_A8i91becrEW7Pv8V476+_TLXKN5p+yUTTKVsfSNhGO_Q@mail.gmail.com>
	<CAAg3a2qVOppckqhD9c+2p_5zMtQcHyQK7H5HAfagMfmWJa-0Uw@mail.gmail.com>
	<1389849467.20702.YahooMailNeo@web140603.mail.bf1.yahoo.com>
	<CAAg3a2qmQ2BOGk9sjwefP24HMiQL0H6WPQnw8zNaT=XrpBG2HQ@mail.gmail.com>
	<DA8340397F7BAE41B8102757834CEB124F61C48C@ex-mbox1.xiaomi.net>
 <CAAg3a2pA7zwOoay3LDjtnQN4ktz7DE6aKG3DD+26JgPA-AN2WQ@mail.gmail.com>
Message-ID: <1389856402.45157.YahooMailNeo@web140602.mail.bf1.yahoo.com>
Date: Wed, 15 Jan 2014 23:13:22 -0800 (PST)
From: lars hofhansl <larsh@apache.org>
Reply-To: lars hofhansl <larsh@apache.org>
Subject: Re: Reply: HBase 0.94.15: writes stalls periodically even under
 moderate steady load (AWS EC2)
To: Vladimir Rodionov <vladrodionov@gmail.com>,
  "dev@hbase.apache.org" <dev@hbase.apache.org>
In-Reply-To: 
 <CAAg3a2pA7zwOoay3LDjtnQN4ktz7DE6aKG3DD+26JgPA-AN2WQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

You should also set dfs.replication to 2 in hbase-site.xml. The=
 replication factor can be overridden per file, and HBase does this in=
 some cases (the HLog, for example).=0A=0AI do not think this is the=
 issue here, though.=0A=0ASo is DN 10.38.106.234:50010 bad or not? It=
 looks like you have an HDFS problem, which is likely network related=
 (though you say it's not). What does Hadoop's fsck say? The log seems=
 to indicate that the HDFS NameNode thinks there are only two healthy=
 DataNodes.=0A=0AIf HBase cannot write its data to the selected file=
 system (normally HDFS), it naturally can't do anything but wait. It's=
 actually quite cool that it recovered.=0A=0A=0A-- Lars=0A=
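P.S. A minimal sketch of the hbase-site.xml override mentioned above,
assuming the stock Hadoop property name and a target factor of 2. Adjust
the value to whatever hdfs-site.xml uses on your cluster:

```xml
<property>
  <!-- client-side replication factor seen by HBase (assumed: 2) -->
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Only newly written files pick up the new factor, and the region servers
need a restart to read the changed file.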
=0A________________________________=0AFrom: Vladimir Rodionov <vladrodionov=
@gmail.com>=0ATo: "dev@hbase.apache.org" <dev@hbase.apache.org> =0ACc: lars=
 hofhansl <larsh@apache.org> =0ASent: Wednesday, January 15, 2014 10:45 PM=
=0ASubject: Re: =E7=AD=94=E5=A4=8D: HBase 0.94.15: writes stalls periodical=
ly even under moderate steady load (AWS EC2)=0A=0A=0A=0AThis is what I=
 found in an RS log:=0A2014-01-16 01:22:18,256 ResponseProcessor for block=
 blk_56193=
07008368309102_2603 WARN=C2=A0 [DFSClient] DFSOutputStream ResponseProcesso=
r exception=C2=A0 for block blk_5619307008368309102_2603java.io.IOException=
: Bad response 1 for block blk_5619307008368309102_2603 from datanode 10.38=
.106.234:50010=0A=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.apache.h=
adoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2=
977)=0A=0A2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-=
10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C6=
0020%2C1389843986689.1389853200626 WARN=C2=A0 [DFSClient] Error Recovery fo=
r block blk_5619307008368309102_2603 bad datanode[2] 10.38.106.234:50010=0A=
2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.=
ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C138=
9843986689.1389853200626 WARN=C2=A0 [DFSClient] Error Recovery for block bl=
k_5619307008368309102_2603 in pipeline 10.10.25.199:50010, 10.40.249.135:50=
010, 10.38.106.234:50010: bad datanode 10.38.106.234:50010=0A2014-01-16 01:=
22:22,800 IPC Server handler 10 on 60020 WARN=C2=A0 [HLog] HDFS pipeline er=
ror detected. Found 2 replicas but expecting no less than 3 replicas.=C2=A0=
 Requesting close of hlog.=0A2014-01-16 01:22:22,806 IPC Server handler 2 o=
n 60020 WARN=C2=A0 [HLog] HDFS pipeline error detected. Found 2 replicas bu=
t expecting no less than 3 replicas.=C2=A0 Requesting close of hlog.=0A2014=
-01-16 01:22:22,808 IPC Server handler 28 on 60020 WARN=C2=A0 [HLog] HDFS p=
ipeline error detected. Found 2 replicas but expecting no less than 3 repli=
cas.=C2=A0 Requesting close of hlog.=0A2014-01-16 01:22:22,808 IPC Server h=
andler 13 on 60020 WARN=C2=A0 [HLog] HDFS pipeline error detected. Found 2 =
replicas but expecting no less than 3 replicas.=C2=A0 Requesting close of h=
log.=0A2014-01-16 01:22:22,808 IPC Server handler 27 on 60020 WARN=C2=A0 [H=
Log] HDFS pipeline error detected. Found 2 replicas but expecting no less t=
han 3 replicas.=C2=A0 Requesting close of hlog.=0A2014-01-16 01:22:22,811 I=
PC Server handler 22 on 60020 WARN=C2=A0 [HLog] Too many consecutive RollWr=
iter requests, it's a sign of the total number of live datanodes is lower t=
han the tolerable replicas.=0A2014-01-16 01:22:22,911 IPC Server handler 8 =
on 60020 INFO=C2=A0 [HLog] LowReplication-Roller was enabled.=0A2014-01-16 =
01:22:22,930 regionserver60020.cacheFlusher INFO=C2=A0 [HRegion] Finished m=
emstore flush of ~128.3m/134538640, currentsize=3D3.0m/3113200 for region u=
sertable,,1389844429593.d4843a72f02a7396244930162fbecd06. in 68096ms, seque=
nceid=3D108753, compaction requested=3Dfalse=0A2014-01-16 01:22:22,930 regi=
onserver60020.logRoller INFO=C2=A0 [FSUtils] FileSystem doesn't support get=
DefaultReplication=0A2014-01-16 01:22:22,930 regionserver60020.logRoller IN=
FO=C2=A0 [FSUtils] FileSystem doesn't support getDefaultBlockSize=0A2014-01=
-16 01:22:23,027 regionserver60020.logRoller INFO=C2=A0 [HLog] Roll /hbase/=
.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.=
internal%2C60020%2C1389843986689.1389853200626, entries=3D1012, filesize=3D=
140440002.=C2=A0 for /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,138984=
3986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853342930=
=0A2014-01-16 01:22:23,194 IPC Server handler 23 on 60020 WARN=C2=A0 [HBase=
Server] (responseTooSlow): {"processingtimems":68410,"call":"multi(org.apac=
he.hadoop.hbase.client.MultiAction@51ff528e), rpc version=3D1, client versi=
on=3D29, methodsFingerPrint=3D-540141542","client":"10.38.163.32:51727","st=
arttimems":1389853274560,"queuetimems":0,"class":"HRegionServer","responses=
ize":0,"method":"multi"}=0A2014-01-16 01:22:23,401 IPC Server handler 13 on=
 60020 WARN=C2=A0 [HBaseServer] (responseTooSlow): {"processingtimems":6881=
3,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4e136610), rpc v=
ersion=3D1, client version=3D29, methodsFingerPrint=3D-540141542","client":=
"10.38.163.32:51727","starttimems":1389853274586,"queuetimems":0,"class":"H=
RegionServer","responsesize":0,"method":"multi"}=0A2014-01-16 01:22:23,609 =
IPC Server handler 1 on 60020 WARN=C2=A0 [HBaseServer] (responseTooSlow): {=
"processingtimems":69002,"call":"multi(org.apache.hadoop.hbase.client.Multi=
Action@51390a8), rpc version=3D1, client version=3D29, methodsFingerPrint=
=3D-540141542","client":"10.38.163.32:51727","starttimems":1389853274604,"q=
ueuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}=0A=
2014-01-16 01:22:23,629 IPC Server handler 20 on 60020 WARN=C2=A0 [HBaseSer=
ver] (responseTooSlow): {"processingtimems":68991,"call":"multi(org.apache.=
hadoop.hbase.client.MultiAction@5f125a0f), rpc version=3D1, client version=
=3D29, methodsFingerPrint=3D-540141542","client":"10.38.163.32:51727","star=
ttimems":1389853274635,"queuetimems":1,"class":"HRegionServer","responsesiz=
e":0,"method":"multi"}=0A2014-01-16 01:22:23,656 IPC Server handler 27 on 6=
0020 WARN=C2=A0 [HBaseServer] (responseTooSlow): {"processingtimems":68835,=
"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2dd6bf8c), rpc ver=
sion=3D1, client version=3D29, methodsFingerPrint=3D-540141542","client":"1=
0.38.163.32:51727","starttimems":1389853274818,"queuetimems":1,"class":"HRe=
gionServer","responsesize":0,"method":"multi"}=0A2014-01-16 01:22:23,657 IP=
C Server handler 19 on 60020 WARN=C2=A0 [HBaseServer] (responseTooSlow): {"=
processingtimems":68982,"call":"multi(org.apache.hadoop.hbase.client.MultiA=
ction@6db997d6), rpc version=3D1, client version=3D29, methodsFingerPrint=
=3D-540141542","client":"10.38.163.32:51727","starttimems":1389853274673,"q=
ueuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}=0A=
=0A=0AThere are 10 DNs and all of them are pretty much alive. Replication f=
actor is 2 (dfs.replication in hdfs-site.xml). =0A=0A=0A=0A=0A=0AOn Wed, Ja=
n 15, 2014 at 9:55 PM, =E8=B0=A2=E8=89=AF <xieliang@xiaomi.com> wrote:=0A=
=0AIt would be better if you could provide some thread dumps while the stal=
ls happened.=0A>=0A>Thanks,=0A>Liang=0A>___________________________________=
_____=0A>From: Vladimir Rodionov [vladrodionov@gmail.com]=0A>Sent: January=
 16, 2014 13:49=0A>To: dev@hbase.apache.org; lars hofhansl=0A>Subject: Re:=
 HBase 0.94.15: writes stalls periodically =
even under moderate steady load (AWS EC2)=0A>=0A>=0A>It's not IO, CPU or=
 Network - it's HBase. Stalls repeat periodically. Any=0A>particular=
 message in =
a Log file I should look for?=0A>=0A>=0A>On Wed, Jan 15, 2014 at 9:17 PM, l=
ars hofhansl <larsh@apache.org> wrote:=0A>=0A>> So where's the bottleneck? =
You say it's not IO, nor is it CPU, I presume.=0A>> Network? Are the writer=
s blocked because there are too many storefiles?=0A>> (in which case you ma=
xed out your storage IO)=0A>> Are you hotspotting a region server?=0A>>=0A>=
> From the stacktrace it looks like ycsb is doing single puts, each=0A>> in=
curring an RPC. You're testing AWS' network :)=0A>>=0A>>=0A>> I write 10-20=
k (small) rows per second in bulk on a single box for testing=0A>> all the =
 time.=0A>> With 3-way replication a 5-node cluster is pretty puny. Each bo=
x will get=0A>> 60% of each write on average, just to state the obvious.=0A=
>>=0A>> As I said, if it's slow, I'd love to see where the bottleneck is, s=
o that=0A>> we can fix it, if it is something we can fix in HBase.=0A>>=0A>=
> -- Lars=0A>>=0A>>=0A>>=0A>> ________________________________=0A>> =C2=A0F=
rom: Vladimir Rodionov <vladrodionov@gmail.com>=0A>> To: "dev@hbase.apache.=
org" <dev@hbase.apache.org>=0A>> Sent: Wednesday, January 15, 2014 5:32 PM=
=0A>> Subject: Re: HBase 0.94.15: writes stalls periodically even under mod=
erate=0A>> steady load (AWS EC2)=0A>>=0A>>=0A>> Yes, I am using ephemeral (=
local) storage. I found that iostat is most of=0A>> the time idle on 3K loa=
d with periodic bursts up to 10% iowait. 3-4K is=0A>> probably the maximum =
this skinny cluster can sustain w/o additional=0A>> configuration tweaking.=
 I will try more powerful instances, of course, but=0A>> the beauty of m1.x=
large is 0.05 price on the spot market. 5 nodes cluster=0A>> (+1) is ~ $7 p=
er day. Good for experiments, but, definitely, not for real=0A>> testing.=
=0A>>=0A>> -Vladimir Rodionov=0A>>=0A>>=0A>>=0A>> On Wed, Jan 15, 2014 at 3=
:27 PM, Andrew Purtell <apurtell@apache.org>=0A>> wrote:=0A>>=0A>> > Also I=
 assume your HDFS is provisioned on locally attached disk, aka=0A>> > insta=
nce store, and not EBS?=0A>> >=0A>> >=0A>> > On Wed, Jan 15, 2014 at 3:26 P=
M, Andrew Purtell <apurtell@apache.org>=0A>> > wrote:=0A>> >=0A>> > > m1.xl=
arge is a poorly provisioned instance type, with low PPS at the=0A>> > > ne=
twork layer. Can you try a type advertised to have "high" I/O=0A>> > > perf=
ormance?=0A>> > >=0A>> > >=0A>> > > On Wed, Jan 15, 2014 at 12:33 PM, Vladi=
mir Rodionov <=0A>> > > vrodionov@carrieriq.com> wrote:=0A>> > >=0A>> > >> =
This is something which needs to be definitely solved/fixed/resolved=0A>> >=
 >>=0A>> > >> I am running YCSB benchmark on aws ec2 on a small HBase clust=
er=0A>> > >>=0A>> > >> 5 (m1.xlarge) as RS=0A>> > >> 1 (m1.xlarge) hbase-ma=
ster, zookeeper=0A>> > >>=0A>> > >> Whirr 0.8.2 (with many hacks) is used=
 to=
 provision HBase.=0A>> > >>=0A>> > >> I am running 1 ycsb client (100% inse=
rt ops) throttled at 5K ops:=0A>> > >>=0A>> > >> ./bin/ycsb load hbase -P w=
orkloads/load20m -p columnfamily=3Dfamily -s=0A>> > >> -threads 10 -target =
5000=0A>> > >>=0A>> > >> OUTPUT:=0A>> > >>=0A>> > >> 1120 sec: 5602339 oper=
ations; 4999.7 current ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D225.=
53]=0A>> > >> =C2=A01130 sec: 5652117 operations; 4969.35 current ops/sec; =
[INSERT=0A>> > >> AverageLatency(us)=3D203.31]=0A>> > >> =C2=A01140 sec: 56=
65210 operations; 1309.04 current ops/sec; [INSERT=0A>> > >> AverageLatency=
(us)=3D17.13]=0A>> > >> =C2=A01150 sec: 5665210 operations; 0 current ops/s=
ec;=0A>> > >> =C2=A01160 sec: 5665210 operations; 0 current ops/sec;=0A>> >=
 >> =C2=A01170 sec: 5665210 operations; 0 current ops/sec;=0A>> > >> =C2=A0=
1180 sec: 5665210 operations; 0 current ops/sec;=0A>> > >> =C2=A01190 sec: =
5665210 operations; 0 current ops/sec;=0A>> > >> 2014-01-15 15:19:34,139 Th=
read-2 WARN=0A>> > >> =C2=A0[HConnectionManager$HConnectionImplementation] =
Failed all from=0A>> > >>=0A>> >=0A>> region=3Dusertable,user6039,138981185=
2201.40518862106856d23b883e5d543d0b89.,=0A>> > >> hostname=3Dip-10-45-174-1=
20.ec2.internal, port=3D60020=0A>> > >> java.util.concurrent.ExecutionExcep=
tion:=0A>> > java.net.SocketTimeoutException:=0A>> > >> Call to ip-10-45-17=
4-120.ec2.internal/10.45.174.120:60020 failed on=0A>> > >> socket timeout e=
xception: java.net.SocketTimeoutException: 60000=0A>> millis=0A>> > >> time=
out while waiting for channel to be ready for read. ch :=0A>> > >> java.nio=
.channels.SocketChannel[connected local=3D/10.180.211.173:42466=0A>> > remo=
te=3Dip-10-45-174-120.ec2.internal/=0A>> > >> 10.45.174.120:60020]=0A>> > >=
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >> java.util.concurrent.FutureTask$=
Sync.innerGet(FutureTask.java:252)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=
 java.util.concurrent.FutureTask.get(FutureTask.java:111)=0A>> > >> =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=0A>> org.apache.hadoop.hbase.clien=
t.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnec=
tionManager.java:1708)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=
=0A>> >=0A>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionI=
mplementation.processBatch(HConnectionManager.java:1560)=0A>> > >> =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 at=0A>> > >> org.apache.hadoop.hbase.client.HTable.flu=
shCommits(HTable.java:994)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> or=
g.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)=0A>> > >> =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 at org.apache.hadoop.hbase.client.HTable.put(HTable.j=
ava:826)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.yahoo.ycsb.db.HBaseCl=
ient.update(HBaseClient.java:328)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at =
com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)=0A>> > >> =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148=
)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >> com.yahoo.ycsb.workloa=
ds.CoreWorkload.doInsert(CoreWorkload.java:461)=0A>> > >> =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 at com.yahoo.ycsb.ClientThread.run(Client.java:269)=0A>> > >> Ca=
used by: java.net.SocketTimeoutException: Call to=0A>> > >> ip-10-45-174-12=
0.ec2.internal/10.45.174.120:60020 failed on socket=0A>> > >> timeout excep=
tion: java.net.SocketTimeoutException: 60000 millis=0A>> timeout=0A>> > >> =
while waiting for channel to be ready for read. ch :=0A>> > >> java.nio.cha=
nnels.SocketChannel[connected local=3D/10.180.211.173:42466=0A>> > remote=
=3Dip-10-45-174-120.ec2.internal/=0A>> > >> 10.45.174.120:60020]=0A>> > >> =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=0A>> org.apache.hadoop.hbas=
e.ipc.HBaseClient.wrapException(HBaseClient.java:1043)=0A>> > >> =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 at=0A>> > >> org.apache.hadoop.hbase.ipc.HBaseClient.call=
(HBaseClient.java:1016)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=
=0A>> >=0A>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(W=
ritableRpcEngine.java:87)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.sun.=
proxy.$Proxy5.multi(Unknown Source)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 a=
t=0A>> > >>=0A>> >=0A>> org.apache.hadoop.hbase.client.HConnectionManager$H=
ConnectionImplementation$3$1.call(HConnectionManager.java:1537)=0A>> > >> =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=0A>> org.apache.hadoop.hbas=
e.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionM=
anager.java:1535)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=
=0A>> org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCa=
llable.java:229)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=
=0A>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen=
tation$3.call(HConnectionManager.java:1544)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 at=0A>> > >>=0A>> >=0A>> org.apache.hadoop.hbase.client.HConnectionM=
anager$HConnectionImplementation$3.call(HConnectionManager.java:1532)=0A>> =
> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >> java.util.concurrent.FutureTa=
sk$Sync.innerRun(FutureTask.java:334)=0A>> > >> =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)=0A>> > >> =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=0A>> java.util.concurrent.Thre=
adPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)=0A>> > >> =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 at=0A>> > >>=0A>> >=0A>> java.util.concurrent.ThreadPoolE=
xecutor$Worker.run(ThreadPoolExecutor.java:615)=0A>> > >> =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 at java.lang.Thread.run(Thread.java:701)=0A>> > >>=0A>> > >>=0A>=
> > >> SKIPPED A LOT=0A>> > >>=0A>> > >>=0A>> > >> =C2=A01200 sec: 5674180 =
operations; 896.82 current ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D=
7506.37]=0A>> > >> =C2=A01210 sec: 6022326 operations; 34811.12 current ops=
/sec; [INSERT=0A>> > >> AverageLatency(us)=3D1998.26]=0A>> > >> =C2=A01220 =
sec: 6102627 operations; 8018.07 current ops/sec; [INSERT=0A>> > >> Average=
Latency(us)=3D395.11]=0A>> > >> =C2=A01230 sec: 6152632 operations; 5000 cu=
rrent ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D182.53]=0A>> > >> =C2=
=A01240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT=0A>> > >> =
AverageLatency(us)=3D201.76]=0A>> > >> =C2=A01250 sec: 6252642 operations; =
4999.6 current ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D190.46]=0A>>=
 > >> =C2=A01260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT=
=0A>> > >> AverageLatency(us)=3D212.31]=0A>> > >> =C2=A01270 sec: 6352660 o=
perations; 5000.2 current ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D2=
17.77]=0A>> > >> =C2=A01280 sec: 6402731 operations; 5000.1 current ops/sec=
; [INSERT=0A>> > >> AverageLatency(us)=3D195.83]=0A>> > >> =C2=A01290 sec: =
6452740 operations; 4999.9 current ops/sec; [INSERT=0A>> > >> AverageLatenc=
y(us)=3D232.43]=0A>> > >> =C2=A01300 sec: 6502743 operations; 4999.8 curren=
t ops/sec; [INSERT=0A>> > >> AverageLatency(us)=3D290.52]=0A>> > >> =C2=A01=
310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT=0A>> > >> Aver=
ageLatency(us)=3D259.49]=0A>> > >>=0A>> > >>=0A>> > >> As you can see here =
there is ~ 60 sec total write stall on a cluster=0A>> > >> which I suppose =
100% correlates with compactions started (minor)=0A>> > >>=0A>> > >> MAX_FI=
LESIZE =3D 5GB=0A>> > >> ## Regions of 'usertable' - 50=0A>> > >>=0A>> > >>=
 I would appreciate any advice on how to get rid of these stalls. 5K=0A>> =
per=0A>> > >> sec is quite a moderate load even for 5 lousy AWS servers. Or=
 is it not?=0A>> > >>=0A>> > >> Best regards,=0A>> > >> Vladimir Rodionov=
=0A>=
> > >> Principal Platform Engineer=0A>> > >> Carrier IQ, www.carrieriq.com=
=0A>> > >> e-mail: vrodionov@carrieriq.com=0A>> > >>=0A>> > >=0A>> > >=0A>>=
 > >=0A>> > > --=0A>> > >=
 Best regards,=0A>> > >=0A>> > > =C2=A0 =C2=A0- Andy=0A>> > >=0A>> > > Prob=
lems worthy of attack prove their worth by hitting back. - Piet=0A>> Hein=
=0A>> > > (via Tom White)=0A>> > >=0A>> >=0A>> >=0A>> >=0A>> > --=0A>> > Be=
st regards,=0A>> >=0A>> > =C2=A0 =C2=A0- Andy=0A>> >=0A>> > Problems worthy=
 of attack prove their worth by hitting back. - Piet Hein=0A>> > (via Tom W=
hite)=0A>> >=0A>>=0A>