hbase-dev mailing list archives

From: Nick Dimiduk <ndimi...@gmail.com>
Subject: Re: HBase 0.94.15: writes stalls periodically even under moderate steady load (AWS EC2)
Date: Wed, 15 Jan 2014 22:14:19 GMT
Hi Vladimir,

5k ops/sec over 5 hosts should be entirely reasonable, though EC2 can be a
fickle friend for HBase. Can you run some other task on the host that also
writes data, and monitor disk activity with iostat or iotop? Basically, find
some indication of whether it's the virtual host's IO that's shutting down
or whether it's something in HBase itself. If disk IO looks good, I'd move
up the stack and monitor network activity -- are the RegionServers receiving
the write requests?
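A quick probe along those lines might look like this (a sketch only; the
file path and sizes are arbitrary, adjust for your instance store):

```shell
# Sketch: exercise the local disk with dd while a stall is happening and
# see whether raw write throughput collapses too. Run on one RS host.
TMPFILE=$(mktemp /tmp/ioprobe.XXXXXX)
# conv=fsync forces the data to disk, so the reported rate reflects real IO
dd if=/dev/zero of="$TMPFILE" bs=1M count=256 conv=fsync 2>&1 | tail -n 1
rm -f "$TMPFILE"
# In another terminal, watch per-device utilization while it runs:
#   iostat -dxm 5
```

If dd's throughput craters at the same moments the YCSB counters flatline,
that points at the virtual host's IO rather than at HBase.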

-n


On Wed, Jan 15, 2014 at 12:33 PM, Vladimir Rodionov <vrodionov@carrieriq.com
> wrote:

> This is something that definitely needs to be solved.
>
> I am running a YCSB benchmark on AWS EC2 against a small HBase cluster:
>
> 5 × m1.xlarge as RegionServers
> 1 × m1.xlarge running the HBase master and ZooKeeper
>
> Whirr 0.8.2 (with many hacks) is used to provision HBase.
>
> I am running 1 YCSB client (100% insert ops) throttled at 5K ops/sec:
>
> ./bin/ycsb load hbase -P workloads/load20m -p columnfamily=family -s
> -threads 10 -target 5000
>
> OUTPUT:
>
> 1120 sec: 5602339 operations; 4999.7 current ops/sec; [INSERT
> AverageLatency(us)=225.53]
>  1130 sec: 5652117 operations; 4969.35 current ops/sec; [INSERT
> AverageLatency(us)=203.31]
>  1140 sec: 5665210 operations; 1309.04 current ops/sec; [INSERT
> AverageLatency(us)=17.13]
>  1150 sec: 5665210 operations; 0 current ops/sec;
>  1160 sec: 5665210 operations; 0 current ops/sec;
>  1170 sec: 5665210 operations; 0 current ops/sec;
>  1180 sec: 5665210 operations; 0 current ops/sec;
>  1190 sec: 5665210 operations; 0 current ops/sec;
> 2014-01-15 15:19:34,139 Thread-2 WARN
>  [HConnectionManager$HConnectionImplementation] Failed all from
> region=usertable,user6039,1389811852201.40518862106856d23b883e5d543d0b89.,
> hostname=ip-10-45-174-120.ec2.internal, port=60020
> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
> Call to ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on
> socket timeout exception: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1708)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1560)
>         at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:994)
>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:850)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:826)
>         at com.yahoo.ycsb.db.HBaseClient.update(HBaseClient.java:328)
>         at com.yahoo.ycsb.db.HBaseClient.insert(HBaseClient.java:357)
>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>         at
> com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> Caused by: java.net.SocketTimeoutException: Call to
> ip-10-45-174-120.ec2.internal/10.45.174.120:60020 failed on socket
> timeout exception: java.net.SocketTimeoutException: 60000 millis timeout
> while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.180.211.173:42466 remote=ip-10-45-174-120.ec2.internal/10.45.174.120:60020]
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
>         at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
>         at com.sun.proxy.$Proxy5.multi(Unknown Source)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
>         at
> org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:701)
>
>
> SKIPPED A LOT
>
>
>  1200 sec: 5674180 operations; 896.82 current ops/sec; [INSERT
> AverageLatency(us)=7506.37]
>  1210 sec: 6022326 operations; 34811.12 current ops/sec; [INSERT
> AverageLatency(us)=1998.26]
>  1220 sec: 6102627 operations; 8018.07 current ops/sec; [INSERT
> AverageLatency(us)=395.11]
>  1230 sec: 6152632 operations; 5000 current ops/sec; [INSERT
> AverageLatency(us)=182.53]
>  1240 sec: 6202641 operations; 4999.9 current ops/sec; [INSERT
> AverageLatency(us)=201.76]
>  1250 sec: 6252642 operations; 4999.6 current ops/sec; [INSERT
> AverageLatency(us)=190.46]
>  1260 sec: 6302653 operations; 5000.1 current ops/sec; [INSERT
> AverageLatency(us)=212.31]
>  1270 sec: 6352660 operations; 5000.2 current ops/sec; [INSERT
> AverageLatency(us)=217.77]
>  1280 sec: 6402731 operations; 5000.1 current ops/sec; [INSERT
> AverageLatency(us)=195.83]
>  1290 sec: 6452740 operations; 4999.9 current ops/sec; [INSERT
> AverageLatency(us)=232.43]
>  1300 sec: 6502743 operations; 4999.8 current ops/sec; [INSERT
> AverageLatency(us)=290.52]
>  1310 sec: 6552755 operations; 5000.2 current ops/sec; [INSERT
> AverageLatency(us)=259.49]
>
>
> As you can see, there is a ~60 sec total write stall on the cluster, which
> I suppose correlates 100% with the start of (minor) compactions.
>
> MAX_FILESIZE = 5 GB
> Number of regions of 'usertable': 50
>
> I would appreciate any advice on how to get rid of these stalls. 5K ops per
> sec is quite a moderate load even for 5 lousy AWS servers. Or is it not?
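If the stalls really do coincide with compactions, one pattern worth ruling
out is store-file blocking: in 0.94, writes to a region block once a store
accumulates hbase.hstore.blockingStoreFiles files, which would produce
exactly this flatline-then-burst shape. A hedged hbase-site.xml sketch
(illustrative values, not tuned recommendations):

```xml
<!-- Illustrative values only: raise the points at which writes block
     while flushes and compactions catch up (hbase-site.xml) -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>  <!-- default 7 in 0.94 -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>   <!-- default 2: more memstore headroom -->
</property>
```

Raising these trades stall avoidance for more store files per region (and
thus more IO owed later), so it only papers over a disk that is genuinely
too slow.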
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
>
>
