From: lars hofhansl
Date: Fri, 17 Jan 2014 08:49:59 -0800 (PST)
To: dev@hbase.apache.org
Subject: Re: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

One other question: Did you give the RS the default 1000mb heap?
For real work you have to increase that. Maybe try with 4000 or 8000 on those boxes.

(In any case that will not solve any problem with unreachable or unavailable data nodes)

-- Lars
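
(For reference, on 0.94 the region server heap is usually raised in conf/hbase-env.sh. The values below are only an illustration of the 4000-8000 MB range mentioned above, not a tuned recommendation for m1.xlarge:)

    # conf/hbase-env.sh on each region server (illustrative values)
    export HBASE_HEAPSIZE=8000          # MB; the default is 1000
    # or size only the RS JVM and leave master/shell defaults alone:
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms8000m -Xmx8000m"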

----- Original Message -----
From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Cc: lars hofhansl
Sent: Thursday, January 16, 2014 6:46 AM
Subject: Re: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

1.0.4 (default).


On Thu, Jan 16, 2014 at 12:24 AM, 谢良 wrote:

> Just curious, what's your hadoop version, Vladimir?
> At least on hadoop 2.0+, the default ReplaceDatanode policy is expected to
> pick another dn to set up the pipeline, so if only 1 dn in the pipeline is
> broken you should still be able to write to 3 nodes successfully, and then
> HBase's "hbase.regionserver.hlog.tolerable.lowreplication" check will not
> jump out :)
>
> Thanks,
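
(For anyone following along: the two knobs referred to here live in hdfs-site.xml and hbase-site.xml. A minimal sketch using the Hadoop 2.x property names; on 1.0.4, as noted above, the replace-datanode-on-failure policy does not exist:)

    <!-- hdfs-site.xml, client side (Hadoop 2.x only; values illustrative) -->
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>DEFAULT</value>  <!-- pick a replacement DN when one in the write pipeline fails -->
    </property>

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.regionserver.hlog.tolerable.lowreplication</name>
      <value>2</value>  <!-- a WAL roll is requested when live replicas drop below this;
                             by default it is tied to the filesystem's default replication -->
    </property>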

> ________________________________________
> From: Vladimir Rodionov [vladrodionov@gmail.com]
> Sent: January 16, 2014 14:45
> To: dev@hbase.apache.org
> Cc: lars hofhansl
> Subject: Re: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
>
> This is what I found in a RS log:
>
> 2014-01-16 01:22:18,256 ResponseProcessor for block blk_5619307008368309102_2603 WARN  [DFSClient] DFSOutputStream ResponseProcessor exception for block blk_5619307008368309102_2603
> java.io.IOException: Bad response 1 for block blk_5619307008368309102_2603 from datanode 10.38.106.234:50010
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2977)
>
> 2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 bad datanode[2] 10.38.106.234:50010
> 2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 in pipeline 10.10.25.199:50010, 10.40.249.135:50010, 10.38.106.234:50010: bad datanode 10.38.106.234:50010
> 2014-01-16 01:22:22,800 IPC Server handler 10 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
> (the same pipeline-error warning repeats at 01:22:22,806-808 for IPC Server handlers 2, 28, 13 and 27)
> 2014-01-16 01:22:22,811 IPC Server handler 22 on 60020 WARN  [HLog] Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
> 2014-01-16 01:22:22,911 IPC Server handler 8 on 60020 INFO  [HLog] LowReplication-Roller was enabled.
> 2014-01-16 01:22:22,930 regionserver60020.cacheFlusher INFO  [HRegion] Finished memstore flush of ~128.3m/134538640, currentsize=3.0m/3113200 for region usertable,,1389844429593.d4843a72f02a7396244930162fbecd06. in 68096ms, sequenceid=108753, compaction requested=false
> 2014-01-16 01:22:22,930 regionserver60020.logRoller INFO  [FSUtils] FileSystem doesn't support getDefaultReplication
> 2014-01-16 01:22:22,930 regionserver60020.logRoller INFO  [FSUtils] FileSystem doesn't support getDefaultBlockSize
> 2014-01-16 01:22:23,027 regionserver60020.logRoller INFO  [HLog] Roll /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626, entries=1012, filesize=140440002.  for /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853342930
> 2014-01-16 01:22:23,194 IPC Server handler 23 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68410,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@51ff528e), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274560,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> (five more near-identical (responseTooSlow) multi() warnings follow from handlers 13, 1, 20, 27 and 19, each with processingtimems between 68813 and 69002)
>
> There are 10 DNs and all of them are pretty much alive. Replication factor is 2 (dfs.replication in hdfs-site.xml).
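
(Side note: the HLog warning above expects no less than 3 replicas while dfs.replication is set to 2, so it may be worth checking what replication the WAL files were actually created with, e.g.:)

    # run from any node with the hadoop client configured; fsck prints "repl=N" per block
    hadoop fsck /hbase/.logs -files -blocks -locations | grep -i repl | head -20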
>
> On Wed, Jan 15, 2014 at 9:55 PM, 谢良 wrote:
>
> > It would be better if you could provide some thread dumps while the stalls happened.
> >
> > Thanks,
> > Liang
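
(If it helps, thread dumps during a stall can be taken with plain jstack against the region server JVM; the pid lookup below is just one way to find it:)

    # on a region server, while the ycsb output sits at 0 ops/sec
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    for i in 1 2 3; do jstack "$RS_PID" > /tmp/rs-stack-$i.txt; sleep 10; done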

> > ________________________________________
> > From: Vladimir Rodionov [vladrodionov@gmail.com]
> > Sent: January 16, 2014 13:49
> > To: dev@hbase.apache.org; lars hofhansl
> > Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
> >
> > It's not IO, CPU or Network - it's HBase. Stalls repeat periodically. Any particular message in a log file I should look for?
> >
> >
> > On Wed, Jan 15, 2014 at 9:17 PM, lars hofhansl wrote:
> >
> > > So where's the bottleneck? You say it's not IO, nor is it CPU, I presume.
> > > Network? Are the writers blocked because there are too many storefiles?
> > > (in which case you maxed out your storage IO)
> > > Are you hotspotting a region server?
> > >
> > > From the stacktrace it looks like ycsb is doing single puts, each incurring an RPC. You're testing AWS' network :)
> > >
> > > I write 10-20k (small) rows per second in bulk on a single box for testing all the time.
> > > With 3-way replication a 5 node cluster is pretty puny. Each box will get 60% of each write on average, just to state the obvious.
> > >
> > > As I said, if it's slow, I'd love to see where the bottleneck is, so that we can fix it, if it is something we can fix in HBase.
> > >
> > > -- Lars
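
(One client-side thing to check against the "single puts, each incurring an RPC" point: the 0.94 HTable API can buffer puts locally and ship them as batched multi() calls. A rough sketch, not wired into YCSB, assuming the "family" column family from the ycsb command line; imports from org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util omitted:)

    // illustrative fragment, HBase 0.94 client API
    void loadBatched(Configuration conf) throws IOException {
      HTable table = new HTable(conf, "usertable");
      table.setAutoFlush(false);                  // stop sending one RPC per Put
      table.setWriteBufferSize(2 * 1024 * 1024);  // ~2 MB client-side buffer (illustrative)
      for (int i = 0; i < 1000; i++) {
        Put p = new Put(Bytes.toBytes("user" + i));
        p.add(Bytes.toBytes("family"), Bytes.toBytes("field0"), Bytes.toBytes("value" + i));
        table.put(p);                             // buffered locally, not an immediate RPC
      }
      table.flushCommits();                       // shipped as one (or a few) multi() calls
      table.close();
    }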
> > >
> > > ________________________________
> > > From: Vladimir Rodionov
> > > To: "dev@hbase.apache.org"
> > > Sent: Wednesday, January 15, 2014 5:32 PM
> > > Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
> > >
> > > Yes, I am using ephemeral (local) storage. I found that iostat is idle most of the time at 3K load, with periodic bursts up to 10% iowait. 3-4K is probably the maximum this skinny cluster can sustain w/o additional configuration tweaking. I will try more powerful instances, of course, but the beauty of m1.xlarge is the 0.05 price on the spot market. A 5 node cluster (+1) is ~ $7 per day. Good for experiments, but definitely not for real testing.
> > >
> > > -Vladimir Rodionov
> > >
> > >
> > > On Wed, Jan 15, 2014 at 3:27 PM, Andrew Purtell wrote:
> > >
> > > > Also I assume your HDFS is provisioned on locally attached disk, aka instance store, and not EBS?
> > > >
> > > >
> > > > On Wed, Jan 15, 2014 at 3:26 PM, Andrew Purtell wrote:
> > > >
> > > > > m1.xlarge is a poorly provisioned instance type, with low PPS at the network layer. Can you try a type advertised to have "high" I/O performance?
> > > > >
> > > > >
> > > > > On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:
> > > > >
> > > > >> This is something which definitely needs to be solved/fixed/resolved.
> > > > >>
> > > > >> I am running the YCSB benchmark on AWS EC2 on a small HBase cluster:
> > > > >>
> > > > >> 5 (m1.xlarge) as RS
> > > > >> 1 (m1.xlarge) hbase-master, zookeeper
> > > > >>
> > > > >> Whirr 0.8.2 (with many hacks) is used to provision HBase.
> > > > >>
> > > > >> I am running 1 ycsb client (100% insert ops) throttled at 5K ops:
> > > > >>
> > > > >> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s -threads 10 -target 5000
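
(The load20m workload file isn't shown anywhere in the thread; for a pure-load run against CoreWorkload it would typically look something like the following, with recordcount guessed from the name - illustrative only:)

    # workloads/load20m (hypothetical contents)
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    recordcount=20000000
    operationcount=20000000
    fieldcount=10
    fieldlength=100
    readproportion=0
    updateproportion=0
    insertproportion=1
    requestdistribution=zipfian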
> > > > >>
> > > > >> OUTPUT:
> > > > >>
> > > > >> 1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT AverageLatency(us)=225.53]
> > > > >> 1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]
> > > > >> 1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]
> > > > >> 1150 sec: 5665210 operations; 0 current ops/sec;
> > > > >> 1160 sec: 5665210 operations; 0 current ops/sec;
> > > > >> 1170 sec: 5665210 operations; 0 current ops/sec;
> > > > >> 1180 sec: 5665210 operations; 0 current ops/sec;
> > > > >> 1190 sec: 5665210 operations; 0 current ops/sec;
> > > > >> 2014-01-15 15:19:34,139 Thread-2 WARN  [HConnectionManager$HConnectionImplementation] Failed all from region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89., hostname=ip-10-45-174-120.ec2.internal, port=60020
> > > > >> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> > > > >>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> > > > >>         at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
> > > > >>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
> > > > >>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
> > > > >>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
> > > > >>         at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
> > > > >>         at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
> > > > >>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> > > > >>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> > > > >>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> > > > >> Caused by: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> > > > >>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
> > > > >>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
> > > > >>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
> > > > >>         at com.sun.proxy.$Proxy5.multi(Unknown Source)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
> > > > >>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
> > > > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
> > > > >>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > >>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > >>         at java.lang.Thread.run(Thread.java:701)
> > > > >>
> > > > >>
> > > > >> SKIPPED A LOT
> > > > >>
> > > > >>
> > > > >> 1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]
> > > > >> 1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT AverageLatency(us)=1998.26]
> > > > >> 1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT AverageLatency(us)=395.11]
> > > > >> 1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT AverageLatency(us)=182.53]
> > > > >> 1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=201.76]
> > > > >> 1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT AverageLatency(us)=190.46]
> > > > >> 1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=212.31]
> > > > >> 1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=217.77]
> > > > >> 1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=195.83]
> > > > >> 1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=232.43]
> > > > >> 1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT AverageLatency(us)=290.52]
> > > > >> 1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=259.49]
> > > > >>
> > > > >>
> > > > >> As you can see, there is a ~60 sec total write stall on the cluster, which I suppose correlates 100% with compactions starting (minor).
> > > > >>
> > > > >> MAX_FILESIZE = 5GB
> > > > >> ## Regions of 'usertable' - 50
> > > > >>
> > > > >> I would appreciate any advice on how to get rid of these stalls. 5K per sec is quite a moderate load even for 5 lousy AWS servers. Or is it not?
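
(On the stall/compaction correlation: in 0.94 a region blocks new writes once a store has too many files or the memstore runs too far ahead of flushes, so these are the knobs usually checked first; values here are illustrative only, not a recommendation for this cluster:)

    <!-- hbase-site.xml (illustrative values) -->
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>20</value>  <!-- default 7: writes to a store block once it has this many files -->
    </property>
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>4</value>   <!-- default 2: updates block when the memstore reaches multiplier * flush size -->
    </property>
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>3</value>   <!-- minimum store files before a minor compaction is considered -->
    </property>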
> > > > >>
> > > > >> Best regards,
> > > > >> Vladimir Rodionov
> > > > >> Principal Platform Engineer
> > > > >> Carrier IQ, www.carrieriq.com
> > > > >> e-mail: vrodionov@carrieriq.com
> > > > >>
> > > > >> [standard Carrier IQ confidentiality notice]
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > > (via Tom White)
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > (via Tom White)