Date: Fri, 1 Jul 2016 15:54:11 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader

[ https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359168#comment-15359168 ]

stack commented on HBASE-14479:
-------------------------------

I tried this again with a totally random read workload (all from cache). The Readers are here, at a safe point:

{code}
"RpcServer.reader=0,bindAddress=ve0528.halxg.cloudera.com,port=16020" #34 daemon prio=5 os_prio=0 tid=0x00007fb669c7f1e0 nid=0x1c7e8 waiting on condition [0x00007fae4d244000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00007faf661d4c00> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:688)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:669)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

...i.e. parked on the new semaphore. Throughput is way down: ~150k ops/s vs ~380k ops/s.
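For reference, here is a minimal sketch of the Leader/Followers hand-off the dump implies, with hypothetical names and structure (not the actual HBASE-14479 patch): a one-permit semaphore admits a single leader at a time to select() on the shared selector, and the leader promotes a follower before processing what it selected.

{code}
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.concurrent.Semaphore;

// Leader/Followers sketch: at most one Reader (the leader) blocks in
// select() on the shared selector; the rest wait on the semaphore. Once
// the leader has claimed a ready key it releases the permit, promoting a
// follower to leader, then processes the request itself as a worker.
class LeaderFollowersReader implements Runnable {
  private final Selector selector;      // shared by all Readers
  private final Semaphore leaderPermit; // one permit: at most one leader

  LeaderFollowersReader(Selector selector, Semaphore leaderPermit) {
    this.selector = selector;
    this.leaderPermit = leaderPermit;
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        leaderPermit.acquire(); // the parked frame in the dump above
        SelectionKey key = null;
        try {
          if (selector.select() > 0) {
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            key = it.next();
            it.remove();
          }
        } catch (IOException e) {
          continue; // a real server would log and keep going
        } finally {
          leaderPermit.release(); // promote a follower to leader
        }
        if (key != null && key.isValid() && key.isReadable()) {
          readAndEnqueue(key); // do the read outside the leader role
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void readAndEnqueue(SelectionKey key) {
    // decode the RPC request and hand it to the call queue (elided)
  }
}
{code}

Note the catch in such a design: before the old leader releases the permit, the ready key's read interest has to be unregistered (and later re-registered) so the next leader does not select the same connection again. That re-registration traffic appears to be what the patched profile below is paying for in EPollArrayWrapper::updateRegistrations/epollCtl.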
Looking with honest-profiler, the call stacks are quite different: current branch-1 spends most of its time responding.

Current branch-1:
{code}
Tree Profile:
(t 100.0,s 5.2) org.apache.hadoop.hbase.ipc.RpcServer$Responder::run
(t 94.8,s 0.0) org.apache.hadoop.hbase.ipc.RpcServer$Responder::doRunLoop
(t 81.0,s 0.6) org.apache.hadoop.hbase.ipc.RpcServer$Responder::doAsyncWrite
(t 79.9,s 1.1) org.apache.hadoop.hbase.ipc.RpcServer$Responder::processAllResponses
(t 76.4,s 0.6) org.apache.hadoop.hbase.ipc.RpcServer$Responder::processResponse
(t 75.9,s 0.0) org.apache.hadoop.hbase.ipc.RpcServer::channelWrite
(t 73.6,s 0.0) org.apache.hadoop.hbase.ipc.BufferChain::write
(t 72.4,s 2.3) sun.nio.ch.SocketChannelImpl::write
(t 67.8,s 0.6) sun.nio.ch.IOUtil::write
(t 62.1,s 0.0) sun.nio.ch.SocketDispatcher::writev
(t 62.1,s 62.1) sun.nio.ch.FileDispatcherImpl::writev0
(t 2.3,s 0.6) sun.nio.ch.Util::getTemporaryDirectBuffer
(t 1.7,s 0.0) java.lang.ThreadLocal::get
(t 1.7,s 0.0) java.lang.ThreadLocal$ThreadLocalMap::access$000
(t 1.7,s 1.7) java.lang.ThreadLocal$ThreadLocalMap::getEntry
(t 0.6,s 0.0) sun.nio.ch.IOVecWrapper::get
(t 0.6,s 0.0) java.lang.ThreadLocal::get
(t 0.6,s 0.0) java.lang.ThreadLocal$ThreadLocalMap::access$000
(t 0.6,s 0.6) java.lang.ThreadLocal$ThreadLocalMap::getEntry
(t 0.6,s 0.6) sun.nio.ch.Util::offerLastTemporaryDirectBuffer
(t 0.6,s 0.0) java.nio.DirectByteBuffer::put
(t 0.6,s 0.6) java.nio.Buffer::limit
(t 0.6,s 0.6) java.nio.Buffer::position
(t 0.6,s 0.0) sun.nio.ch.IOVecWrapper::putLen
(t 0.6,s 0.6) sun.nio.ch.NativeObject::putLong
(t 1.1,s 0.0) java.nio.channels.spi.AbstractInterruptibleChannel::begin
(t 1.1,s 0.0) java.nio.channels.spi.AbstractInterruptibleChannel::blockedOn
(t 1.1,s 0.0) java.lang.System$2::blockedOn
(t 1.1,s 1.1) java.lang.Thread::blockedOn
(t 1.1,s 1.1) sun.nio.ch.SocketChannelImpl::writerCleanup
(t 1.1,s 1.1) java.nio.Buffer::hasRemaining
...
{code}
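The FileDispatcherImpl::writev0 leaf, where 62% of the samples land, is the gathering write of the response's buffer chain. As a rough sketch of that path (assumed shape, not the actual Responder code):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// The hot branch-1 path above: BufferChain::write hands the response's
// buffers to one gathering write, which bottoms out in the writev(2)
// syscall (sun.nio.ch.FileDispatcherImpl::writev0 in the profile).
final class GatheringWriteSketch {
  /** Writes as much of the chain as the socket accepts; returns bytes written. */
  static long write(SocketChannel channel, ByteBuffer[] chain) throws IOException {
    return channel.write(chain); // a single writev(2) for the whole chain
  }
}
{code}

In other words, branch-1's Responder is spending its time on actual socket I/O rather than on coordination overhead.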
With patch:
{code}
Tree Profile:
(t 100.0,s 2.2) java.lang.Thread::run
(t 97.8,s 0.0) java.util.concurrent.ThreadPoolExecutor$Worker::run
(t 97.8,s 0.0) java.util.concurrent.ThreadPoolExecutor::runWorker
(t 97.8,s 0.1) org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader::run
(t 97.7,s 0.2) org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader::doRunLoop
(t 63.9,s 0.9) org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader::leading
(t 59.1,s 0.0) sun.nio.ch.SelectorImpl::select
(t 59.1,s 0.0) sun.nio.ch.SelectorImpl::select
(t 59.1,s 0.0) sun.nio.ch.SelectorImpl::lockAndDoSelect
(t 59.1,s 0.1) sun.nio.ch.EPollSelectorImpl::doSelect
(t 49.2,s 0.0) sun.nio.ch.EPollArrayWrapper::poll
(t 43.2,s 0.9) sun.nio.ch.EPollArrayWrapper::updateRegistrations
(t 42.0,s 42.0) sun.nio.ch.EPollArrayWrapper::epollCtl
(t 0.4,s 0.2) java.util.BitSet::get
(t 0.1,s 0.1) java.util.BitSet::wordIndex
(t 6.0,s 6.0) sun.nio.ch.EPollArrayWrapper::epollWait
(t 9.1,s 1.2) sun.nio.ch.EPollSelectorImpl::updateSelectedKeys
(t 5.3,s 0.0) java.util.HashMap::get
(t 5.3,s 3.9) java.util.HashMap::getNode
(t 1.3,s 1.3) java.lang.Integer::equals
(t 1.0,s 0.0) sun.nio.ch.SocketChannelImpl::translateAndSetReadyOps
(t 1.0,s 1.0) sun.nio.ch.SocketChannelImpl::translateReadyOps
(t 0.9,s 0.0) java.util.HashSet::add
(t 0.9,s 0.0) java.util.HashMap::put
(t 0.9,s 0.5) java.util.HashMap::putVal
(t 0.4,s 0.4) java.util.HashMap::newNode
(t 0.6,s 0.0) java.util.HashSet::contains
(t 0.6,s 0.0) java.util.HashMap::containsKey
(t 0.5,s 0.5) java.util.HashMap::getNode
(t 0.1,s 0.1) java.util.HashMap::hash
(t 0.1,s 0.1) sun.nio.ch.EPollArrayWrapper::getDescriptor
(t 0.6,s 0.6) sun.nio.ch.IOUtil::drain
(t 0.1,s 0.0) java.nio.channels.spi.AbstractSelector::end
(t 0.1,s 0.1) java.nio.channels.spi.AbstractInterruptibleChannel::blockedOn
(t 1.5,s 0.0) sun.nio.ch.SelectionKeyImpl::interestOps
(t 1.5,s 0.6) sun.nio.ch.SelectionKeyImpl::nioInterestOps
(t 0.9,s 0.0) sun.nio.ch.SocketChannelImpl::translateAndSetInterestOps
(t 0.9,s 0.0) sun.nio.ch.EPollSelectorImpl::putEventOps
(t 0.9,s 0.9) sun.nio.ch.EPollArrayWrapper::setInterest
(t 1.2,s 0.0) java.util.concurrent.ConcurrentLinkedQueue::add
(t 1.2,s 0.9) java.util.concurrent.ConcurrentLinkedQueue::offer
(t 0.2,s 0.2) java.util.concurrent.ConcurrentLinkedQueue$Node::casNext
(t 0.1,s 0.1) java.util.concurrent.ConcurrentLinkedQueue$Node::<init>
(t 0.7,s 0.1) java.util.HashMap$KeyIterator::next
...
{code}

> Apply the Leader/Followers pattern to RpcServer's Reader
> --------------------------------------------------------
>
>                 Key: HBASE-14479
>                 URL: https://issues.apache.org/jira/browse/HBASE-14479
>             Project: HBase
>          Issue Type: Improvement
>          Components: IPC/RPC, Performance
>            Reporter: Hiroshi Ikeda
>            Assignee: Hiroshi Ikeda
>            Priority: Minor
>         Attachments: HBASE-14479-V2 (1).patch, HBASE-14479-V2.patch, HBASE-14479-V2.patch, HBASE-14479.patch, flamegraph-19152.svg, flamegraph-32667.svg, gc.png, gets.png, io.png, median.png
>
> {{RpcServer}} uses multiple selectors to read data for load distribution, but the distribution is done by simple round-robin. It is uncertain, especially over a long run, whether the load is divided equally and resources are used without waste.
> Moreover, multiple selectors may cause excessive context switches that favor low latency (even though the Readers merely add requests to queues), which can reduce the throughput of the whole server.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
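For context, the round-robin hand-off described above amounts to something like this (a minimal sketch with hypothetical names, not the actual RpcServer code):

{code}
// Round-robin distribution of new connections across Reader selectors:
// a strict rotation that ignores how busy each Reader currently is.
final class ReaderPool {
  private final Reader[] readers;
  private int currentReader = 0; // only touched by the single Listener thread

  ReaderPool(Reader[] readers) {
    this.readers = readers;
  }

  /** Picks the next Reader in strict rotation. */
  Reader getReader() {
    currentReader = (currentReader + 1) % readers.length;
    return readers[currentReader];
  }
}

interface Reader { /* owns a Selector; registers and reads its connections */ }
{code}

Because each accepted connection then stays with the Reader it was handed to, a skewed mix of busy and idle connections can leave some Readers overloaded while others sit idle; that is the imbalance the description worries about.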