incubator-giraph-dev mailing list archives

From "Avery Ching (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-12) Investigate communication improvements
Date Tue, 13 Sep 2011 07:36:08 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103422#comment-13103422 ]

Avery Ching commented on GIRAPH-12:
-----------------------------------

Hyunsik, just to update: I grabbed your patch and it passed the unit tests on my machine. Then
I ran it on a cluster at Yahoo!.

I didn't have time to write a messaging benchmark, so I ran PageRankBenchmark with
100 workers, 1M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient:     Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient:     Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient:     Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient:     Superstep 1 (milliseconds)=8467

11/09/13 07:14:51 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient:     Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient:     Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient:     Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient:     Superstep 1 (milliseconds)=7955


Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient:     Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient:     Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient:     Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient:     Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient:     Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient:     Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient:     Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient:     Superstep 1 (milliseconds)=11551

In my limited testing, the numbers aren't too different. I also see that the connections are
maintained throughout the application run, as you mentioned. So the only tradeoff is possibly
reduced parallelism in message sending (a user-chosen number of threads vs. all threads). I like
the approach and think it's an improvement (controllable threads). Perhaps my only comment
concerns the following code block:

for (PeerConnection pc : peerConnections.values()) {
    futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

It would probably be good to randomize the order of the PeerConnection objects, to avoid hotspots
on the receiving side?
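
For illustration, something like the following sketch, which shuffles a copy of the connections
before submitting the flush tasks (the "shuffled" variable is my own name, not from the patch):

// Needs java.util.ArrayList, java.util.Collections, java.util.List.
// Copy the connections into a list and shuffle it so that every worker
// flushes its peers in a different order, spreading receive-side load.
List<PeerConnection> shuffled =
    new ArrayList<PeerConnection>(peerConnections.values());
Collections.shuffle(shuffled);
for (PeerConnection pc : shuffled) {
    futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

Since each worker would get a different permutation, the flushes should spread out across
receivers instead of every worker hitting the same peer first.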


> Investigate communication improvements
> --------------------------------------
>
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other worker.
> Hadoop RPC is used for communication. For instance, if there are 400 workers, each worker
> will create 400 threads. This ends up using a lot of memory, even with the option
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty, or rolling our own, to
> improve this situation. By moving away from Hadoop RPC, we would also make compatibility
> across different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
