MIME-Version: 1.0
Date: Wed, 15 Jan 2014 17:32:37 -0800
Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Content-Type: multipart/alternative; boundary=f46d04428f02852c0304f00c6610

--f46d04428f02852c0304f00c6610
Content-Type: text/plain; charset=ISO-8859-1

Yes, I am using ephemeral (local) storage. iostat shows the disks idle most
of the time under a 3K ops/sec load, with periodic bursts of up to 10% iowait.
3-4K ops/sec is probably the maximum this skinny cluster can sustain without
additional configuration tweaking. I will try more powerful instances, of
course, but the beauty of m1.xlarge is its $0.05 price on the spot market:
a 5-node cluster (+1) costs about $7 per day. Good for experiments, but
definitely not for real testing.

-Vladimir Rodionov

On Wed, Jan 15, 2014 at 3:27 PM, Andrew Purtell wrote:

> Also I assume your HDFS is provisioned on locally attached disk, aka
> instance store, and not EBS?
>
>
> On Wed, Jan 15, 2014 at 3:26 PM, Andrew Purtell wrote:
>
> > m1.xlarge is a poorly provisioned instance type, with low PPS at the
> > network layer. Can you try a type advertised to have "high" I/O
> > performance?
> >
> >
> > On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <
> > vrodionov@carrieriq.com> wrote:
> >
> >> This is something which definitely needs to be solved/fixed/resolved
> >>
> >> I am running the YCSB benchmark on AWS EC2 on a small HBase cluster
> >>
> >> 5 (m1.xlarge) as RS
> >> 1 (m1.xlarge) hbase-master, ZooKeeper
> >>
> >> Whirr 0.8.2 (with many hacks) is used to provision HBase.
> >>
> >> I am running 1 ycsb client (100% insert ops) throttled at 5K ops:
> >>
> >> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s -threads 10 -target 5000
> >>
> >> OUTPUT:
> >>
> >> 1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT AverageLatency(us)=225.53]
> >> 1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT AverageLatency(us)=203.31]
> >> 1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT AverageLatency(us)=17.13]
> >> 1150 sec: 5665210 operations; 0 current ops/sec;
> >> 1160 sec: 5665210 operations; 0 current ops/sec;
> >> 1170 sec: 5665210 operations; 0 current ops/sec;
> >> 1180 sec: 5665210 operations; 0 current ops/sec;
> >> 1190 sec: 5665210 operations; 0 current ops/sec;
> >> 2014-01-15 15:19:34,139 Thread-2 WARN [HConnectionManager$HConnectionImplementation] Failed all from region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89., hostname=ip-10-45-174-120.ec2.internal, port=60020
> >> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read.
> >> ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> >>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> >>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
> >>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
> >>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
> >>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
> >>     at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
> >>     at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
> >>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> >>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> >>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> >> Caused by: java.net.SocketTimeoutException: Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read.
> >> ch : java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
> >>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
> >>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
> >>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
> >>     at com.sun.proxy.$Proxy5.multi(Unknown Source)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
> >>     at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
> >>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>     at java.lang.Thread.run(Thread.java:701)
> >>
> >>
> >> SKIPPED A LOT
> >>
> >>
> >> 1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT AverageLatency(us)=7506.37]
> >> 1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT AverageLatency(us)=1998.26]
> >> 1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT AverageLatency(us)=395.11]
> >> 1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT AverageLatency(us)=182.53]
> >> 1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT
> >> AverageLatency(us)=201.76]
> >> 1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT AverageLatency(us)=190.46]
> >> 1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=212.31]
> >> 1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=217.77]
> >> 1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT AverageLatency(us)=195.83]
> >> 1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT AverageLatency(us)=232.43]
> >> 1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT AverageLatency(us)=290.52]
> >> 1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT AverageLatency(us)=259.49]
> >>
> >>
> >> As you can see, there is a total write stall of ~60 sec on the cluster,
> >> which I suppose correlates 100% with the start of (minor) compactions.
> >>
> >> MAX_FILESIZE = 5GB
> >> ## Regions of 'usertable' - 50
> >>
> >> I would appreciate any advice on how to get rid of these stalls. 5K ops per
> >> sec is quite a moderate load even for 5 lousy AWS servers. Or is it not?
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodionov@carrieriq.com
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> > - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

--f46d04428f02852c0304f00c6610--
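
The stall window in the quoted YCSB output can also be located mechanically rather than by eye. What follows is only a minimal sketch, assuming status lines of the form "<sec> sec: <count> operations; <rate> current ops/sec" as in the output quoted in this thread. The function name find_stalls and the threshold parameter are hypothetical and are not part of YCSB.

```python
import re

# Matches YCSB status lines such as:
#   "1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT ..."
# The pattern is an assumption based on the output quoted in this thread.
STATUS_RE = re.compile(r"(\d+) sec: (\d+) operations; ([\d.]+) current ops/sec")


def find_stalls(text, threshold=1.0):
    """Return (first_sec, last_sec) pairs for each run of consecutive
    status lines whose reported throughput is below `threshold` ops/sec."""
    stalls = []
    start = last = None
    for match in STATUS_RE.finditer(text):
        sec = int(match.group(1))
        rate = float(match.group(3))
        if rate < threshold:
            if start is None:
                start = sec  # first stalled report in this run
            last = sec
        elif start is not None:
            stalls.append((start, last))  # throughput recovered, close the run
            start = None
    if start is not None:
        stalls.append((start, last))  # log ended while still stalled
    return stalls


if __name__ == "__main__":
    import sys

    # Usage: python find_stalls.py < ycsb_output.log
    for first, last in find_stalls(sys.stdin.read()):
        print(f"stall reported from {first} sec to {last} sec")
```

Because the pattern is searched anywhere in the text, it tolerates quote prefixes such as "> >> " in front of each status line. The timestamps are those of the 10-second status reports, so the actual stall may begin or end up to one reporting interval earlier or later.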