giraph-dev mailing list archives

From "Alessandro Presta (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (GIRAPH-426) Timeout during edge exchange for big datasets
Date Fri, 16 Nov 2012 23:37:12 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Presta resolved GIRAPH-426.
--------------------------------------

    Resolution: Not A Problem

Turns out it's not a progress reporting issue: we already call progress() after each resolved
vertex, and that should be enough in standard use cases.
The problem is my dataset had so-called "superconnectors", vertices with 10M+ edges, and the
addEdge() call in EdgeListVertex has to iterate over all edges.
The solution is to either switch to HashMapVertex or add support for multigraphs (so that
checking for duplicate edges is not required).
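
A minimal sketch of why a list-backed addEdge() hurts on superconnectors. Assumption: ListEdges, MapEdges and EdgeInsertCost are hypothetical stand-ins written for illustration. They are not Giraph's EdgeListVertex or HashMapVertex classes. The list version scans every existing edge to reject duplicates, so inserting d distinct edges costs d(d - 1)/2 comparisons. For d = 10M that is about 5 x 10^13. Because progress() is reported once per resolved vertex, one such vertex can run for a very long time without reporting any progress. The hash-backed version does the duplicate check with an expected O(1) lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Giraph's actual vertex/edge API.
public class EdgeInsertCost {

    /** List-backed edges: addEdge scans all existing edges to reject duplicates. */
    static final class ListEdges {
        private final List<Long> targets = new ArrayList<>();
        long comparisons = 0;

        boolean addEdge(long target) {
            for (long existing : targets) {   // O(degree) scan per insertion
                comparisons++;
                if (existing == target) {
                    return false;             // duplicate edge rejected
                }
            }
            targets.add(target);
            return true;
        }
    }

    /** Hash-backed edges: the duplicate check is an expected O(1) lookup. */
    static final class MapEdges {
        private final Map<Long, Float> targets = new HashMap<>();

        boolean addEdge(long target, float value) {
            // putIfAbsent returns null only when the target was not present yet
            return targets.putIfAbsent(target, value) == null;
        }

        int size() {
            return targets.size();
        }
    }

    public static void main(String[] args) {
        int degree = 10_000;  // far below the 10M+ edges of a real superconnector
        ListEdges list = new ListEdges();
        MapEdges map = new MapEdges();
        for (long t = 0; t < degree; t++) {
            list.addEdge(t);
            map.addEdge(t, 1.0f);
        }
        // d distinct inserts cost 0 + 1 + ... + (d - 1) = d(d - 1)/2 comparisons
        System.out.println("list comparisons: " + list.comparisons);  // 49995000
        System.out.println("map size: " + map.size());                // 10000
    }
}
```

Multigraph support removes the duplicate check altogether, so addEdge() on a list reduces to a plain append. That is the other option mentioned above.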
                
> Timeout during edge exchange for big datasets
> ---------------------------------------------
>
>                 Key: GIRAPH-426
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-426
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>
> I'm seeing timeouts after all edge input splits are read from a dataset with 42B edges using 400 workers.
> We probably lack some progress() calls in the code that processes mutations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
