incubator-giraph-dev mailing list archives

From "Hyunsik Choi (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-12) Investigate communication improvements
Date Wed, 28 Sep 2011 05:17:46 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116166#comment-13116166 ]

Hyunsik Choi commented on GIRAPH-12:
------------------------------------

Avery,
Thank you for your comments. I decided to use Runtime; it seems sufficient for investigating
this issue.

I ran a benchmark again to measure memory consumption with RandomMessageBenchmark, as
follows:

{noformat}
hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.RandomMessageBenchmark
-e 2 -s 3 -w 20 -b 4 -n 150 -V ${V} -v -f ${f}
{noformat}
Here, the 'f' option indicates the number of threads in the thread pool. I also changed the
thread executor to a fixed-size thread pool.
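For reference, the change amounts to something like the following. This is only a minimal,
self-contained sketch with hypothetical class and task names, not the actual patch:

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: flush work is submitted to a fixed-size pool instead of
// spawning one dedicated thread per peer. 'numFlushThreads' plays the role
// of the -f option used in the benchmark command above.
public class FixedPoolFlushSketch {
  public static void main(String[] args) throws InterruptedException {
    int numFlushThreads = 5;
    ExecutorService flushExecutor = Executors.newFixedThreadPool(numFlushThreads);

    int numPeers = 20;  // e.g. one flush task per destination worker
    for (int peer = 0; peer < numPeers; peer++) {
      final int dest = peer;
      flushExecutor.submit(new Runnable() {
        @Override
        public void run() {
          // In Giraph this would send the queued messages for 'dest';
          // here we only simulate the work.
          System.out.println("flushing messages to peer " + dest);
        }
      });
    }

    flushExecutor.shutdown();
    flushExecutor.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{noformat}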

I ran each experiment twice and took the average. The results are available at the link
below:
http://goo.gl/arP62

These experiments were conducted on two cluster nodes, each with 24 cores and 64GB of
memory, connected over 1Gbps Ethernet. I measured the memory footprint from Runtime in
GraphMapper, as Avery recommended.
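Concretely, the measurement is just the used heap as reported by the JVM at sampling points.
A minimal sketch (not the exact code added to GraphMapper):

{noformat}
// Sketch of the measurement: the used heap as seen by the JVM at one moment.
public class HeapUsageSketch {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    long usedBytes = rt.totalMemory() - rt.freeMemory();
    System.out.println("used heap: " + (usedBytes / (1024 * 1024)) + " MB");
  }
}
{noformat}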

In sum, the thread pool approach is better than the original approach in terms of processing
time. I guess this is because the thread pool reduces context-switching cost and narrows
the synchronized region.

Unfortunately, however, the thread pool approach does not reduce memory consumption, which
is the main focus of this issue. Rather, it needs slightly more memory, as shown in Figures
3 and 4. Still, the experiments with f = 5 and f = 20 are worth noting: in those runs, the
number of threads has little effect on memory consumption.

We are still facing the memory problem, so we may need to approach it from another angle.
I think it is mainly caused by the current message flushing strategy.

In the current implementation, outgoing messages are transmitted to other peers in only two
cases (a simplified sketch follows the list):
1) When the number of outgoing messages queued for a specific peer exceeds a threshold (i.e.,
maxSize), the queued messages for that peer are transmitted to it.
2) When a superstep finishes, all remaining messages are flushed to the other peers.
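The following is a simplified, self-contained sketch of this behavior, illustrative only;
the actual Giraph code is different:

{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the current behavior, not the actual Giraph code:
// messages are buffered per destination peer and sent only when the per-peer
// count reaches maxSize (case 1) or at the end of the superstep (case 2).
public class ThresholdFlushSketch {
  private final Map<String, List<String>> outgoing = new HashMap<String, List<String>>();
  private final int maxSize;

  public ThresholdFlushSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  public void addMessage(String peer, String message) {
    List<String> queue = outgoing.get(peer);
    if (queue == null) {
      queue = new ArrayList<String>();
      outgoing.put(peer, queue);
    }
    queue.add(message);
    // Case 1: flush this peer's queue only when its *count* reaches the threshold;
    // the actual memory the messages occupy is never considered.
    if (queue.size() >= maxSize) {
      send(peer, queue);
      queue.clear();
    }
  }

  // Case 2: at the end of a superstep, everything left over is flushed.
  public void flushAll() {
    for (Map.Entry<String, List<String>> e : outgoing.entrySet()) {
      if (!e.getValue().isEmpty()) {
        send(e.getKey(), e.getValue());
        e.getValue().clear();
      }
    }
  }

  private void send(String peer, List<String> messages) {
    System.out.println("sending " + messages.size() + " messages to " + peer);
  }

  public static void main(String[] args) {
    ThresholdFlushSketch comm = new ThresholdFlushSketch(3);
    for (int i = 0; i < 7; i++) {
      comm.addMessage("worker-1", "msg-" + i);  // flushes on every 3rd message
    }
    comm.flushAll();  // the remainder is flushed at the end of the "superstep"
  }
}
{noformat}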

Flushing (case 2) is only triggered at the end of a superstep, so during processing, message
flushing depends solely on case 1. This may not be effective because case 1 only considers
the number of messages queued for each specific peer; it never takes the actual memory
occupation into account. If the destinations of outgoing messages are uniformly distributed,
an out-of-memory error may occur before case 1 is ever triggered.

To overcome this problem, we may need a more eager message-flushing strategy or some way to
spill overflow messages to disk.
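For illustration only, an eager trigger could additionally watch overall heap pressure via
Runtime and flush when the used heap crosses some fraction of the maximum heap. The threshold
below is an assumed, tunable value; this is a sketch of the idea rather than a proposed patch:

{noformat}
// Illustrative sketch of a more eager, memory-aware flush trigger (not a patch):
// in addition to the per-peer count, flush when the used heap crosses a fraction
// of the maximum heap. The 0.8 threshold is an assumed, tunable value.
public class MemoryAwareFlushSketch {
  private static final double FLUSH_THRESHOLD = 0.8;

  public static boolean shouldFlushEagerly() {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    return used > FLUSH_THRESHOLD * rt.maxMemory();
  }

  public static void main(String[] args) {
    System.out.println("eager flush needed: " + shouldFlushEagerly());
  }
}
{noformat}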

Let me know what you think.
                
> Investigate communication improvements
> --------------------------------------
>
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other worker.
> Hadoop RPC is used for communication. For instance, if there are 400 workers, each worker
> will create 400 threads. This ends up using a lot of memory, even with the option
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty, or rolling our own, to
> improve this situation. By moving away from Hadoop RPC, we would also make compatibility
> with different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
