Return-Path:
X-Original-To: apmail-giraph-dev-archive@www.apache.org
Delivered-To: apmail-giraph-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4EB5BD0DD for ; Wed, 17 Oct 2012 04:40:05 +0000 (UTC)
Received: (qmail 10809 invoked by uid 500); 17 Oct 2012 04:40:05 -0000
Delivered-To: apmail-giraph-dev-archive@giraph.apache.org
Received: (qmail 10759 invoked by uid 500); 17 Oct 2012 04:40:05 -0000
Mailing-List: contact dev-help@giraph.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@giraph.apache.org
Delivered-To: mailing list dev@giraph.apache.org
Received: (qmail 10728 invoked by uid 500); 17 Oct 2012 04:40:04 -0000
Delivered-To: apmail-incubator-giraph-dev@incubator.apache.org
Received: (qmail 10714 invoked by uid 99); 17 Oct 2012 04:40:04 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 17 Oct 2012 04:40:03 +0000
Date: Wed, 17 Oct 2012 04:40:03 +0000 (UTC)
From: "Hudson (JIRA)"
To: giraph-dev@incubator.apache.org
Message-ID: <1423454109.55669.1350448803989.JavaMail.jiratomcat@arcas>
In-Reply-To: <714683206.49223.1350351303113.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (GIRAPH-374) Multithreading in input split loading and compute
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/GIRAPH-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477591#comment-13477591 ]

Hudson commented on GIRAPH-374:
-------------------------------

Integrated in Giraph-trunk-Commit #244 (See [https://builds.apache.org/job/Giraph-trunk-Commit/244/])
GIRAPH-374: Multithreading in input split loading and compute (aching).
(Revision 1399090)

Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399090
Files :
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hbase/HBaseVertexInputFormat.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedService.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendPartitionCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/handler/AddressRequestIdGenerator.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/AggregatorWrapper.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphState.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/InputSplitsCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/MutableVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleMutableVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/LoggerUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/Time.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/BspCase.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/TestBspBasic.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/TestPageRank.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java


> Multithreading in input split loading and compute
> -------------------------------------------------
>
> Key: GIRAPH-374
> URL: https://issues.apache.org/jira/browse/GIRAPH-374
> Project: Giraph
> Issue Type: Improvement
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-374.2.patch
>
>
> Cleaned up the WorkerClient hierarchy
> - WorkerClientRequestProcessor is a request cache for every thread (input split loading / compute)
> - With RPC gone, got rid of ugly WorkerClientServer and NettyWorkerClientServer
> SendPartitionCache
> Made GraphState immutable for multi-threading
> Added multithreading for loading the input splits
> Added multithreading for compute
> Added thread-level debugging as an option
> Added additional testing on the number of vertices, edges
> Optimization on HashWorkerPartitioner to use CopyOnWriteArrayList instead of synchronized list (this is a bottleneck)
> Added multithreaded TestPageRank test case
> I ran the PageRankBenchmark on 20 workers with 10M vertices, 1B edges. All supersteps take about the same time, so I just compared superstep 0 from every test. Compute performance gains are quite nice (even a little faster than before with one thread). Actual gains will depend heavily on the number of cores you have and the possible parallelism of the application.
> {code}
> Trunk
> # threads   compute time (secs)   total time (secs)
> 1           89                    97.543
> Multithreading
> 1           86.70094              92.477
> 2           50.41521              57.850
> 4           38.07716              50.246
> 8           38.63188              45.940
> 16          22.999943             48.607
> 24          23.649189             45.112
> 32          21.412325             44.201
> {code}
> We also saw similar gains on the input split loading on an internal app. Future work can further improve the scalability of multithreading.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
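
For readers who want a concrete picture of the "one request cache per thread" idea described in the issue, here is a minimal sketch in plain Java. It is not the Giraph code: the class MultithreadedComputeSketch, the nested PartitionCallable, and the method computeAll are hypothetical names, and a partition is modeled as a simple int[] of vertex values. The assumption being illustrated is only that each compute thread pulls partitions from a shared queue and keeps its own private state, so the hot loop needs no locking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Giraph's ComputeCallable.
class MultithreadedComputeSketch {

  /** One instance per thread; drains partitions from a shared queue. */
  private static final class PartitionCallable implements Callable<Long> {
    private final Queue<int[]> partitions;

    PartitionCallable(Queue<int[]> partitions) {
      this.partitions = partitions;
    }

    @Override
    public Long call() {
      // Thread-private accumulator, standing in for a per-thread request cache.
      long localSum = 0;
      int[] partition;
      while ((partition = partitions.poll()) != null) {
        for (int vertexValue : partition) {
          localSum += vertexValue;
        }
      }
      return localSum;
    }
  }

  /** Processes all partitions with numThreads threads and merges the results. */
  public static long computeAll(List<int[]> partitions, int numThreads) {
    Queue<int[]> queue = new ConcurrentLinkedQueue<>(partitions);
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (int i = 0; i < numThreads; i++) {
        futures.add(pool.submit(new PartitionCallable(queue)));
      }
      long total = 0;
      for (Future<Long> future : futures) {
        total += future.get(); // merge per-thread results once, at the end
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> partitions = new ArrayList<>();
    for (int p = 0; p < 8; p++) {
      int[] partition = new int[1000];
      Arrays.fill(partition, 1);
      partitions.add(partition);
    }
    System.out.println(computeAll(partitions, 4)); // prints 8000
  }
}
```

The only shared structure in this sketch is the queue of partitions. Everything else is thread-local until the final merge, which is one plausible reading of why a per-thread cache helps compute scale with the thread count.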