From: lars hofhansl
To: "dev@hbase.apache.org"
Date: Wed, 15 Jan 2014 21:17:47 -0800 (PST)
Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

So where's the bottleneck? You say it's not IO, nor is it CPU, I presume.
Network? Are the writers blocked because there are too many storefiles? (In which case you maxed out your storage IO.)
Are you hotspotting a region server?

From the stack trace it looks like YCSB is doing single puts, each incurring an RPC. You're testing AWS' network :)

I write 10-20k (small) rows per second in bulk on a single box for testing all the time.
With 3-way replication a 5-node cluster is pretty puny.
Each box will get 60% of each write on average, just to state the obvious.

As I said, if it's slow, I'd love to see where the bottleneck is, so that we can fix it, if it is something we can fix in HBase.

-- Lars

________________________________
From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Sent: Wednesday, January 15, 2014 5:32 PM
Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)

Yes, I am using ephemeral (local) storage. I found that iostat is idle most of
the time at 3K load, with periodic bursts of up to 10% iowait. 3-4K is
probably the maximum this skinny cluster can sustain w/o additional
configuration tweaking. I will try more powerful instances, of course, but
the beauty of m1.xlarge is the 0.05 price on the spot market. A 5-node cluster
(+1) is ~$7 per day. Good for experiments, but definitely not for real
testing.

-Vladimir Rodionov

On Wed, Jan 15, 2014 at 3:27 PM, Andrew Purtell wrote:

> Also I assume your HDFS is provisioned on locally attached disk, aka
> instance store, and not EBS?
>
>
> On Wed, Jan 15, 2014 at 3:26 PM, Andrew Purtell
> wrote:
>
> > m1.xlarge is a poorly provisioned instance type, with low PPS at the
> > network layer.
> > Can you try a type advertised to have "high" I/O
> > performance?
> >
> >
> > On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:
> >
> >> This is something which definitely needs to be solved/fixed/resolved.
> >>
> >> I am running the YCSB benchmark on AWS EC2 on a small HBase cluster:
> >>
> >> 5 (m1.xlarge) as RS
> >> 1 (m1.xlarge) hbase-master, ZooKeeper
> >>
> >> Whirr 0.8.2 (with many hacks) is used to provision HBase.
> >>
> >> I am running 1 YCSB client (100% insert ops) throttled at 5K ops:
> >>
> >> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s -threads 10 -target 5000
> >>
> >> OUTPUT:
> >>
> >>  1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT AverageLatency(us)=225.53]
> >>  1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]
> >>  1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]
> >>  1150 sec: 5665210 operations; 0 current ops/sec;
> >>  1160 sec: 5665210 operations; 0 current ops/sec;
> >>  1170 sec: 5665210 operations; 0 current ops/sec;
> >>  1180 sec: 5665210 operations; 0 current ops/sec;
> >>  1190 sec: 5665210 operations; 0 current ops/sec;
> >> 2014-01-15 15:19:34,139 Thread-2 WARN [HConnectionManager$HConnectionImplementation] Failed all from region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89., hostname=ip-10-45-174-120.ec2.internal, port=60020
> >> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read.
> >> ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> >>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> >>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
> >>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
> >>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
> >>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
> >>     at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
> >>     at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
> >>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> >>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> >>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> >> Caused by: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read.
> >> ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> >>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
> >>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
> >>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
> >>     at com.sun.proxy.$Proxy5.multi(Unknown Source)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
> >>     at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
> >>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>     at java.lang.Thread.run(Thread.java:701)
> >>
> >>
> >> SKIPPED A LOT
> >>
> >>
> >>  1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]
> >>  1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT AverageLatency(us)=1998.26]
> >>  1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT AverageLatency(us)=395.11]
> >>  1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT AverageLatency(us)=182.53]
> >>  1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=201.76]
> >>  1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT AverageLatency(us)=190.46]
> >>  1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=212.31]
> >>  1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=217.77]
> >>  1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=195.83]
> >>  1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=232.43]
> >>  1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT AverageLatency(us)=290.52]
> >>  1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=259.49]
> >>
> >>
> >> As you can see, there is a ~60 sec total write stall on the cluster,
> >> which I suppose correlates 100% with (minor) compactions starting.
> >>
> >> MAX_FILESIZE = 5GB
> >> ## Regions of 'usertable' - 50
> >>
> >> I would appreciate any advice on how to get rid of these stalls. 5K per
> >> sec is quite a moderate load even for 5 lousy AWS servers.
> >> Or is it not?
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodionov@carrieriq.com
> >>
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
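
For anyone wanting to check the ~60 sec stall figure against the YCSB output quoted above, here is a minimal Python sketch. It is not from the original thread. It assumes status lines shaped like "<t> sec: <n> operations; <r> current ops/sec" and a 10-second reporting interval, as in the quoted run; the names STATUS, zero_throughput_marks, SAMPLE and REPORT_INTERVAL_SEC are made up for illustration.

```python
import re

# Assumed line shape, taken from the quoted YCSB output:
#   "1150 sec: 5665210 operations; 0 current ops/sec;"
STATUS = re.compile(r"(\d+) sec: \d+ operations; (\d+(?:\.\d+)?) current ops/sec")


def zero_throughput_marks(log_text):
    """Return the timestamps (in seconds) of status lines reporting 0 ops/sec."""
    return [int(sec) for sec, rate in STATUS.findall(log_text) if float(rate) == 0.0]


# Status lines copied from the thread (latency fields omitted).
SAMPLE = """\
 1130 sec: 5652117 operations; 4969.35 current ops/sec;
 1140 sec: 5665210 operations; 1309.04 current ops/sec;
 1150 sec: 5665210 operations; 0 current ops/sec;
 1160 sec: 5665210 operations; 0 current ops/sec;
 1170 sec: 5665210 operations; 0 current ops/sec;
 1180 sec: 5665210 operations; 0 current ops/sec;
 1190 sec: 5665210 operations; 0 current ops/sec;
 1200 sec: 5674180 operations; 896.82 current ops/sec;
"""

# Assumption: consecutive status lines are 10 seconds apart, as in the quoted run.
REPORT_INTERVAL_SEC = 10

marks = zero_throughput_marks(SAMPLE)
print(marks)                             # [1150, 1160, 1170, 1180, 1190]
print(len(marks) * REPORT_INTERVAL_SEC)  # 50
```

Five fully idle 10-second reports give 50 seconds at zero throughput. The degraded reports on either side (1309.04 ops/sec at 1140 sec, 896.82 ops/sec at 1200 sec) are consistent with the roughly 60 sec estimate in the original post.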