giraph-dev mailing list archives

From "Roman Shaposhnik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-800) Resolving mutations on a large graph causes timeouts
Date Fri, 06 Jun 2014 22:10:03 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik updated GIRAPH-800:
------------------------------------

    Fix Version/s:     (was: 1.1.0)

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
>                 Key: GIRAPH-800
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>         Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large number of messages
> per superstep, the pre-superstep logic can appear to hang, and the job eventually times
> out, either because of MapReduce task inactivity or because it hits the max superstep wait.
> While it's possible to tune around this by adding a strategic call to context.progress()
> in NettyServerWorker.resolveMutations() and bumping up the giraph.maxMasterSuperstepWaitMsecs
> setting, it would seem this part of the code might need some optimization.
> As an example, in a graph with 2B vertices and 2.5B edges, the transition between supersteps
> with 1B messages in flight can take 15-30 minutes on a cluster with 228 workers (2 threads,
> 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the check for
> missing vertices (second loop within NettyServerWorker.resolveMutations()) that is the real
> performance bottleneck. I haven't identified a fix for this logic yet, but I did identify
> a possible workaround. I believe that when dealing with a static and complete graph, the
> resolveMutations() call can be skipped altogether. A quick test of this theory yielded a
> 3x performance improvement in my sandbox.
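
The two workarounds described above (a periodic progress call inside the long loop, and skipping the check for a static graph) can be illustrated with a small standalone sketch. This is not Giraph code and not the attached patch: the class ResolveMutationsSketch, the ProgressReporter interface, the staticGraph flag, and the reportEvery parameter are all hypothetical names invented for illustration, and the loop only loosely models the missing-vertex check as the description characterizes it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; none of these types are Giraph classes. */
public class ResolveMutationsSketch {

    /** Hypothetical stand-in for a task-context progress callback. */
    public interface ProgressReporter {
        void progress();
    }

    /**
     * Collects message destinations that have no matching vertex.
     * Calls reporter.progress() every reportEvery destinations so that a long
     * pass is not mistaken for an inactive task. When staticGraph is true the
     * pass is skipped entirely, on the assumption that no vertex can be missing.
     */
    public static Set<Long> findMissingVertices(
            Set<Long> existingVertexIds,
            List<Long> messageDestinations,
            boolean staticGraph,
            int reportEvery,
            ProgressReporter reporter) {
        Set<Long> missing = new HashSet<>();
        if (staticGraph) {
            // Assumed shortcut: a static, complete graph needs no resolution.
            return missing;
        }
        int seen = 0;
        for (Long destination : messageDestinations) {
            if (!existingVertexIds.contains(destination)) {
                missing.add(destination);
            }
            if (++seen % reportEvery == 0) {
                reporter.progress(); // heartbeat to avoid an inactivity timeout
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> vertices = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        List<Long> destinations = Arrays.asList(1L, 4L, 2L, 5L, 3L);
        int[] heartbeats = {0};

        Set<Long> missing = findMissingVertices(
                vertices, destinations, false, 2, () -> heartbeats[0]++);
        System.out.println("missing count: " + missing.size());
        System.out.println("heartbeats: " + heartbeats[0]);

        Set<Long> skipped = findMissingVertices(
                vertices, destinations, true, 2, () -> heartbeats[0]++);
        System.out.println("missing count when skipped: " + skipped.size());
    }
}
```

In a real job the heartbeat would presumably be the context.progress() call mentioned in the description, and the decision to skip would have to come from a setting that the user enables only when the graph is known to be static and complete. The heartbeat only prevents the timeout; it does not make the pass itself any faster.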



--
This message was sent by Atlassian JIRA
(v6.2#6252)
