From: lars hofhansl
To: Vladimir Rodionov, "dev@hbase.apache.org"
Date: Wed, 15 Jan 2014 23:13:22 -0800 (PST)
Subject: Re: Reply: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

You should also set dfs.replication to 2 in hbase-site.xml. The replication factor can be overridden per file, and HBase does this in some cases (as in the case of the HLog).

I do not think this is the issue here, though.

So is DN 10.38.106.234:50010 bad or not? Looks like you have an HDFS problem, which is likely network related (though you say it's not). What does Hadoop's fsck say? This seems to indicate that the HDFS NameNode thinks there are only two healthy DataNodes.

If HBase cannot write its data to the selected file system (normally HDFS), it naturally can't do anything but wait.
It's actually quite cool that it recovered.

-- Lars

________________________________
From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Cc: lars hofhansl
Sent: Wednesday, January 15, 2014 10:45 PM
Subject: Re: Reply: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

This is what I found in a RS log:

2014-01-16 01:22:18,256 ResponseProcessor for block blk_5619307008368309102_2603 WARN  [DFSClient] DFSOutputStream ResponseProcessor exception  for block blk_5619307008368309102_2603
java.io.IOException: Bad response 1 for block blk_5619307008368309102_2603 from datanode 10.38.106.234:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2977)

2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 bad datanode[2] 10.38.106.234:50010
2014-01-16 01:22:18,258 DataStreamer for file /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626 WARN  [DFSClient] Error Recovery for block blk_5619307008368309102_2603 in pipeline 10.10.25.199:50010, 10.40.249.135:50010, 10.38.106.234:50010: bad datanode 10.38.106.234:50010
2014-01-16 01:22:22,800 IPC Server handler 10 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
2014-01-16 01:22:22,806 IPC Server handler 2 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
2014-01-16 01:22:22,808 IPC Server handler 28 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
2014-01-16 01:22:22,808 IPC Server handler 13 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
2014-01-16 01:22:22,808 IPC Server handler 27 on 60020 WARN  [HLog] HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas.  Requesting close of hlog.
2014-01-16 01:22:22,811 IPC Server handler 22 on 60020 WARN  [HLog] Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
2014-01-16 01:22:22,911 IPC Server handler 8 on 60020 INFO  [HLog] LowReplication-Roller was enabled.
2014-01-16 01:22:22,930 regionserver60020.cacheFlusher INFO  [HRegion] Finished memstore flush of ~128.3m/134538640, currentsize=3.0m/3113200 for region usertable,,1389844429593.d4843a72f02a7396244930162fbecd06. in 68096ms, sequenceid=108753, compaction requested=false
2014-01-16 01:22:22,930 regionserver60020.logRoller INFO  [FSUtils] FileSystem doesn't support getDefaultReplication
2014-01-16 01:22:22,930 regionserver60020.logRoller INFO  [FSUtils] FileSystem doesn't support getDefaultBlockSize
2014-01-16 01:22:23,027 regionserver60020.logRoller INFO  [HLog] Roll /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853200626, entries=1012, filesize=140440002.  for /hbase/.logs/ip-10-10-25-199.ec2.internal,60020,1389843986689/ip-10-10-25-199.ec2.internal%2C60020%2C1389843986689.1389853342930
2014-01-16 01:22:23,194 IPC Server handler 23 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68410,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@51ff528e), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274560,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2014-01-16 01:22:23,401 IPC Server handler 13 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4e136610), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274586,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2014-01-16 01:22:23,609 IPC Server handler 1 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":69002,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@51390a8), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274604,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}
2014-01-16 01:22:23,629 IPC Server handler 20 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68991,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@5f125a0f), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274635,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}
2014-01-16 01:22:23,656 IPC Server handler 27 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68835,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2dd6bf8c), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274818,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}
2014-01-16 01:22:23,657 IPC Server handler 19 on 60020 WARN  [HBaseServer] (responseTooSlow): {"processingtimems":68982,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6db997d6), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.38.163.32:51727","starttimems":1389853274673,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}

There are 10 DNs and all of them are pretty much alive. Replication factor is 2 (dfs.replication in hdfs-site.xml).


On Wed, Jan 15, 2014 at 9:55 PM, 谢良 (Liang Xie) wrote:

>It would be better if you could provide some thread dumps taken while the stalls happened.
>
>Thanks,
>Liang
>________________________________________
>From: Vladimir Rodionov [vladrodionov@gmail.com]
>Sent: January 16, 2014 13:49
>To: dev@hbase.apache.org; lars hofhansl
>Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
>
>
>It's not IO, CPU or network; it's HBase. Stalls repeat periodically.
>Any particular message in a log file I should look for?
>
>
>On Wed, Jan 15, 2014 at 9:17 PM, lars hofhansl wrote:
>
>> So where's the bottleneck? You say it's not IO, nor is it CPU, I presume.
>> Network? Are the writers blocked because there are too many storefiles?
>> (In which case you maxed out your storage IO.)
>> Are you hotspotting a region server?
>>
>> From the stacktrace it looks like ycsb is doing single puts, each
>> incurring an RPC. You're testing AWS' network :)
>>
>>
>> I write 10-20k (small) rows per second in bulk on a single box for testing
>> all the time.
>> With 3-way replication a 5-node cluster is pretty puny. Each box will get
>> 60% of each write on average, just to state the obvious.
>>
>> As I said, if it's slow, I'd love to see where the bottleneck is, so that
>> we can fix it, if it is something we can fix in HBase.
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Vladimir Rodionov
>> To: "dev@hbase.apache.org"
>> Sent: Wednesday, January 15, 2014 5:32 PM
>> Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate
>> steady load (AWS EC2)
>>
>>
>> Yes, I am using ephemeral (local) storage. I found that iostat shows the
>> disks idle most of the time at 3K load, with periodic bursts of up to 10%
>> iowait. 3-4K is probably the maximum this skinny cluster can sustain
>> without additional configuration tweaking. I will try more powerful
>> instances, of course, but the beauty of m1.xlarge is its $0.05/hour price
>> on the spot market. A 5-node cluster (+1) is ~$7 per day. Good for
>> experiments, but definitely not for real testing.
>>
>> -Vladimir Rodionov
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:27 PM, Andrew Purtell wrote:
>>
>> > Also I assume your HDFS is provisioned on locally attached disk, aka
>> > instance store, and not EBS?
>> >
>> >
>> > On Wed, Jan 15, 2014 at 3:26 PM, Andrew Purtell wrote:
>> >
>> > > m1.xlarge is a poorly provisioned instance type, with low PPS at the
>> > > network layer. Can you try a type advertised to have "high" I/O
>> > > performance?
>> > >
>> > >
>> > > On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <
>> > > vrodionov@carrieriq.com> wrote:
>> > >
>> > >> This is something that definitely needs to be solved/fixed/resolved.
>> > >>
>> > >> I am running the YCSB benchmark on AWS EC2 on a small HBase cluster:
>> > >>
>> > >> 5 (m1.xlarge) as RS
>> > >> 1 (m1.xlarge) hbase-master, ZooKeeper
>> > >>
>> > >> Whirr 0.8.2 (with many hacks) is used to provision HBase.
>> > >>
>> > >> I am running 1 ycsb client (100% insert ops) throttled at 5K ops:
>> > >>
>> > >> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s -threads 10 -target 5000
>> > >>
>> > >> OUTPUT:
>> > >>
>> > >> 1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT AverageLatency(us)=225.53]
>> > >> 1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]
>> > >> 1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]
>> > >> 1150 sec: 5665210 operations; 0 current ops/sec;
>> > >> 1160 sec: 5665210 operations; 0 current ops/sec;
>> > >> 1170 sec: 5665210 operations; 0 current ops/sec;
>> > >> 1180 sec: 5665210 operations; 0 current ops/sec;
>> > >> 1190 sec: 5665210 operations; 0 current ops/sec;
>> > >> 2014-01-15 15:19:34,139 Thread-2 WARN  [HConnectionManager$HConnectionImplementation] Failed all from region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89., hostname=ip-10-45-174-120.ec2.internal, port=60020
>> > >> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>> > >>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>> > >>         at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
>> > >>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
>> > >>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
>> > >>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
>> > >>         at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
>> > >>         at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
>> > >>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>> > >>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>> > >>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>> > >> Caused by: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>> > >>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
>> > >>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
>> > >>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
>> > >>         at com.sun.proxy.$Proxy5.multi(Unknown Source)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
>> > >>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
>> > >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
>> > >>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>> > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >>         at java.lang.Thread.run(Thread.java:701)
>> > >>
>> > >>
>> > >> SKIPPED A LOT
>> > >>
>> > >>
>> > >> 1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]
>> > >> 1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT AverageLatency(us)=1998.26]
>> > >> 1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT AverageLatency(us)=395.11]
>> > >> 1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT AverageLatency(us)=182.53]
>> > >> 1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=201.76]
>> > >> 1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT AverageLatency(us)=190.46]
>> > >> 1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=212.31]
>> > >> 1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=217.77]
>> > >> 1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=195.83]
>> > >> 1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=232.43]
>> > >> 1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT AverageLatency(us)=290.52]
>> > >> 1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=259.49]
>> > >>
>> > >>
>> > >> As you can see, there is a ~60 sec total write stall on the cluster,
>> > >> which I suppose is 100% correlated with (minor) compactions starting.
>> > >>
>> > >> MAX_FILESIZE = 5GB
>> > >> Number of regions in 'usertable': 50
>> > >>
>> > >> I would appreciate any advice on how to get rid of these stalls. 5K per
>> > >> sec is quite a moderate load even for 5 lousy AWS servers. Or is it not?
>> > >>
>> > >> Best regards,
>> > >> Vladimir Rodionov
>> > >> Principal Platform Engineer
>> > >> Carrier IQ, www.carrieriq.com
>> > >> e-mail: vrodionov@carrieriq.com
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Problems worthy of attack prove their worth by hitting back.
>> > > - Piet Hein (via Tom White)
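A note on the HLog warnings in Vladimir's RegionServer log. The WAL block pipeline started with three DataNodes (10.10.25.199, 10.40.249.135, 10.38.106.234). That suggests the WAL was written with replication 3 even though hdfs-site.xml says 2, which is consistent with Lars's advice to also set dfs.replication in hbase-site.xml. Below is a minimal Python model of the low-replication check, reconstructed only from the log messages:

* Each check that finds fewer live replicas than the tolerable minimum requests a log roll.
* After a fixed number of consecutive requests (five in this log), the roller disables itself with "Too many consecutive RollWriter requests".
* It is re-enabled once replication looks healthy again.

The class and function names are hypothetical, not HBase APIs. We also assume, without having verified it here, that the tolerable minimum defaults to the dfs.replication value in HBase's own configuration. That value is Hadoop's default of 3 when unset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LowReplicationRoller:
    """Hypothetical model of the HLog low-replication roll logic.

    Not an HBase API: the names and the default roll limit are assumptions
    inferred from the RegionServer log messages in this thread.
    """

    min_tolerable_replication: int  # assumed: dfs.replication as HBase sees it
    roll_limit: int = 5             # five roll requests precede the warning in the log
    enabled: bool = True
    consecutive_rolls: int = 0

    def check(self, live_replicas: int) -> str:
        """Evaluate one pipeline check and return the action taken."""
        if 0 < live_replicas < self.min_tolerable_replication:
            if not self.enabled:
                return "ignored"
            if self.consecutive_rolls < self.roll_limit:
                # "HDFS pipeline error detected ... Requesting close of hlog."
                self.consecutive_rolls += 1
                return "request-roll"
            # "Too many consecutive RollWriter requests ..."
            self.consecutive_rolls = 0
            self.enabled = False
            return "disable-roller"
        if live_replicas >= self.min_tolerable_replication and not self.enabled:
            # "LowReplication-Roller was enabled."
            self.enabled = True
            return "enable-roller"
        return "ok"


def simulate(min_tolerable: int, readings: List[int]) -> List[str]:
    """Feed a sequence of live-replica readings through a fresh roller."""
    roller = LowReplicationRoller(min_tolerable_replication=min_tolerable)
    return [roller.check(r) for r in readings]


if __name__ == "__main__":
    # WAL expects 3 replicas but only 2 DataNodes remain in the pipeline.
    print(simulate(3, [2] * 6))
    # Same pipeline with the tolerable minimum lowered to 2.
    print(simulate(2, [2] * 6))
```

In this model, simulate(3, [2] * 6) reproduces the pattern in the log: five roll requests, then the roller disabling itself. simulate(2, [2] * 6) never requests a roll, which is roughly the effect the hbase-site.xml change would be expected to have.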
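Vladimir's "~60 sec" estimate can be checked against the YCSB status lines in his original message. The cumulative operation count sits at 5665210 from the 1140 sec report through the 1190 sec report, so no inserts completed for about 50 seconds. The partial-throughput intervals on either side (1309.04 and 896.82 ops/sec against a 5000 ops/sec target) add several more seconds. Below is a small Python sketch for locating such zero-throughput windows in YCSB's periodic status output. The regular expression assumes the exact status-line format quoted in this thread and may need adjusting for other YCSB versions. find_stalls is a hypothetical helper, not part of YCSB.

```python
import re
from typing import List, Optional, Tuple

# Matches YCSB periodic status lines as quoted in this thread, e.g.
# "1150 sec: 5665210 operations; 0 current ops/sec;"
STATUS_RE = re.compile(r"(\d+) sec: (\d+) operations; ([\d.]+) current ops/sec")


def find_stalls(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (last_nonzero_report_sec, last_zero_report_sec) per zero-throughput run."""
    stalls: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end: Optional[int] = None
    prev_t: Optional[int] = None
    for line in lines:
        m = STATUS_RE.search(line)
        if m is None:
            continue  # warnings, stack traces, wrapped latency text
        t, rate = int(m.group(1)), float(m.group(3))
        if rate == 0.0:
            if start is None:
                # No operations completed since the previous report.
                start = prev_t if prev_t is not None else t
            end = t
        elif start is not None:
            stalls.append((start, end))
            start = None
        prev_t = t
    if start is not None:
        stalls.append((start, end))
    return stalls


if __name__ == "__main__":
    sample = [
        "1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]",
        "1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]",
        "1150 sec: 5665210 operations; 0 current ops/sec;",
        "1160 sec: 5665210 operations; 0 current ops/sec;",
        "1170 sec: 5665210 operations; 0 current ops/sec;",
        "1180 sec: 5665210 operations; 0 current ops/sec;",
        "1190 sec: 5665210 operations; 0 current ops/sec;",
        "1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]",
    ]
    for start, end in find_stalls(sample):
        # prints: no inserts completed between 1140 s and 1190 s (50 s)
        print(f"no inserts completed between {start} s and {end} s ({end - start} s)")
```

Only fully idle intervals count as the stall here. Degraded intervals such as 1140 and 1200 sec are left to the reader, since their throughput may include catch-up after the stall.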