giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request: GIRAPH-374 Multithreading in input split loading and compute
Date Wed, 17 Oct 2012 03:46:16 GMT


> On Oct. 17, 2012, 1:49 a.m., Maja Kabiljo wrote:
> > Wow, Avery, this is a huge patch! :-)
> > 
> > Looks mostly good to me. I just didn't completely go through the parts which are
supposed to be copied from a place to another, like from NettyWorkerClient to NettyWorkerClientRequestProcessor,
BspServiceWorker to InputSplitsCallable, etc.
> > 
> > I'm wondering does this work correctly with out-of-core messages and graph. For
graph at least it seems all partitions are read prior to computation. Could you have a blocking
list of partition ids and then read them on the fly?
> > 
> > Have you verified that all the counters work correctly now, I believe we have no
tests for that?
> > 
> > You have the change from GIRAPH-357 in there, maybe that's the reason of small speed-up...
> > 
> > Anyway, great work, and the results seem good!

Thanks for the quick review Maja.

>>I'm wondering does this work correctly with out-of-core messages and graph. For graph
at least it seems all partitions are read prior to computation. Could you have a blocking
list of partition ids and then read them on the fly?

Thanks for noticing this!  It would have caused an issue for the out-of-core graph to load
the partitions early. 

The counters appear to be working fine as they are aggregated by the master.


> On Oct. 17, 2012, 1:49 a.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java,
line 686
> > <https://reviews.apache.org/r/7613/diff/1/?file=177070#file177070line686>
> >
> >     Why is this returning boolean now? (set to true and not used)

Agreed!  Whoops.  Changed it.


> On Oct. 17, 2012, 1:49 a.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java,
line 194
> > <https://reviews.apache.org/r/7613/diff/1/?file=177063#file177063line194>
> >
> >     Should this part be thread-safe?

This method (resolveAddress) should be thread-safe.  But to be certain, I made it private
static.


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7613/#review12498
-----------------------------------------------------------


On Oct. 17, 2012, 3:45 a.m., Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7613/
> -----------------------------------------------------------
> 
> (Updated Oct. 17, 2012, 3:45 a.m.)
> 
> 
> Review request for giraph and Maja Kabiljo.
> 
> 
> Description
> -------
> 
> Cleaned up the WorkerClient hierarchy
> - WorkerClientRequestProcessor is a request cache for every thread (input split loading
/ compute)
> - With RPC gone, got rid of ugly WorkerClientServer and NettyWorkerClientServer
> SendPartitionCache
> Made GraphState immutable for multi-threading
> Added multithreading for loading the input splits
> Added multithreading for compute
> Added thread-level debugging as an option
> Added additional testing on the number of vertices, edges
> Optimization on HashWorkerPartitioner to use CopyOnWriteArrayList instead of sychronized
list (this is a bottleneck)
> Added multithreaded TestPageRank test case
> 
> I ran the PageRankBenchmark on 20 workers with 10M vertices, 1B edges. All supersteps
are about the same time, so I just compared superstep 0 from every test. Compute performance
gains are quite nice (even a little faster than before with one thread). Actual gains will
depend heavily on the number of cores you have and possible parallelism of the application.
> 
> Trunk
> # threads  compute time (secs)   total time (secs)
> 1          89                    97.543
> 
> Multithreading
> 1          86.70094              92.477
> 2          50.41521              57.850
> 4          38.07716              50.246
> 8          38.63188              45.940
> 16         22.999943             48.607
> 24         23.649189             45.112
> 32         21.412325             44.201
> 
> We also saw similar gains on the input split loading on an internal app. Future work
can be to further improve the scalability of multithreading.
> 
> 
> This addresses bug GIRAPH-374.
>     https://issues.apache.org/jira/browse/GIRAPH-374
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hbase/HBaseVertexInputFormat.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedService.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendPartitionCache.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClient.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerServer.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientServer.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/handler/AddressRequestIdGenerator.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/AggregatorWrapper.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphState.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/InputSplitsCallable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/MutableVertex.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleMutableVertex.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/LoggerUtils.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/Time.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/BspCase.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestBspBasic.java
1399043 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestPageRank.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java
1399043 
> 
> Diff: https://reviews.apache.org/r/7613/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> pseudo-distributed unittests
> Running on internal FB apps as well.
> 
> 
> Thanks,
> 
> Avery Ching
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message