giraph-dev mailing list archives

From "Craig Muchinsky (JIRA)" <>
Subject [jira] [Created] (GIRAPH-800) Resolving mutations on a large graph causes timeouts
Date Thu, 21 Nov 2013 17:44:35 GMT
Craig Muchinsky created GIRAPH-800:

             Summary: Resolving mutations on a large graph causes timeouts
                 Key: GIRAPH-800
             Project: Giraph
          Issue Type: Bug
          Components: graph
    Affects Versions: 1.1.0
         Environment: hadoop1
            Reporter: Craig Muchinsky

When processing a graph with a large number of mutations and/or a large number of messages
per superstep, the pre-superstep logic can appear to hang, and the job eventually times out,
either because of MapReduce task inactivity or because it hits the maximum superstep wait.

While it's possible to work around this by adding a strategic call to context.progress() in
NettyServerWorker.resolveMutations() and raising the giraph.maxMasterSuperstepWaitMsecs
setting, this part of the code appears to need optimization.
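
To illustrate the progress-reporting workaround, here is a minimal sketch of calling
progress() periodically inside a long-running loop. It is not the Giraph code: the
Progressable interface below is a local stand-in for the Hadoop context, and the class name
ProgressSketch, the method resolveAll, and the reportEvery parameter are hypothetical names
chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ProgressSketch {
    /** Local stand-in for the Hadoop context; only progress() is modelled here. */
    interface Progressable {
        void progress();
    }

    /**
     * Walks over the vertex ids and reports progress every reportEvery items,
     * so a long pass is not mistaken for an inactive task.
     * Returns the number of ids visited.
     */
    static int resolveAll(Iterable<Long> vertexIds, Progressable context, int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        int processed = 0;
        for (Long id : vertexIds) {
            // Placeholder: the real per-vertex mutation resolution would happen here.
            processed++;
            if (processed % reportEvery == 0) {
                context.progress();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            ids.add(i);
        }
        AtomicInteger calls = new AtomicInteger();
        int processed = resolveAll(ids, calls::incrementAndGet, 3);
        System.out.println("processed=" + processed + " progressCalls=" + calls.get());
    }
}
```

Reporting every N items rather than on every item keeps the overhead of the progress call
small. This only prevents the inactivity timeout; it does not make the loop itself faster.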

As an example, in a graph with 2B vertices and 2.5B edges, the transition between supersteps
with 1B messages in flight can take 15-30 minutes on a cluster with 228 workers (2 threads,
8GB RAM per worker).

While the vertex resolve processing can be time consuming, I believe it's the check for missing
vertices (second loop within NettyServerWorker.resolveMutations()) that is the real performance
bottleneck. I haven't identified a fix for this logic yet, but I did identify a possible
workaround. I believe that when dealing with a static and complete graph, the resolveMutations()
call can be skipped altogether. A quick test of this theory yielded a 3x performance improvement
in my sandbox.
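
The idea behind the workaround can be sketched as follows. This is a simplified model and not
the actual resolveMutations() implementation: the class MissingVertexSketch, the method
findMissing, and the staticGraph flag are hypothetical, and vertex ids are modelled as plain
Long values. The sketch assumes the missing-vertex check tests every message destination
against the set of existing vertices, and that a static, complete graph can never produce a
missing destination.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingVertexSketch {
    /**
     * Returns the message destinations that have no matching vertex.
     * When staticGraph is true (hypothetical flag), the scan is skipped entirely,
     * on the assumption that every destination already exists.
     */
    static Set<Long> findMissing(Set<Long> existingVertices,
                                 Iterable<Long> messageDestinations,
                                 boolean staticGraph) {
        if (staticGraph) {
            return Collections.emptySet();
        }
        Set<Long> missing = new HashSet<>();
        for (Long destination : messageDestinations) {
            // One lookup per destination: this per-message cost is what the skip avoids.
            if (!existingVertices.contains(destination)) {
                missing.add(destination);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> existing = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        List<Long> destinations = Arrays.asList(2L, 4L, 4L, 5L);
        System.out.println("dynamic graph: " + findMissing(existing, destinations, false));
        System.out.println("static graph:  " + findMissing(existing, destinations, true));
    }
}
```

Skipping the scan is only safe under the stated assumption. If any message targets a vertex
that does not exist, that message would go unhandled instead of triggering vertex creation.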
