giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-243) EdgeListVertex performance is sub-optimal
Date Sat, 07 Jul 2012 15:56:35 GMT
Alessandro Presta created GIRAPH-243:
----------------------------------------

             Summary: EdgeListVertex performance is sub-optimal
                 Key: GIRAPH-243
                 URL: https://issues.apache.org/jira/browse/GIRAPH-243
             Project: Giraph
          Issue Type: Improvement
            Reporter: Alessandro Presta
            Assignee: Alessandro Presta


Iterating over outgoing edges (both vertex id and value) in an EdgeListVertex is O(n log n):
you have to iterate over getOutEdgesIterator and call getEdgeValue (which is logarithmic)
at each iteration.

If we store the edge list as a list of Edge<I, E> (instead of the current two lists),
iteration becomes linear time (which I think is what most people expect from an adjacency-list
representation).

Furthermore, I think we can avoid sorting the list by ID: addEdge and removeEdge are currently
linear-time because we are using an ArrayList, so the binary search doesn't help much. If
we don't sort, at least addEdge becomes amortized constant-time.

This improvement would require an API change: getOutEdgesIterator should iterate over Edge<I,
E> as opposed to just I.
What I propose is to have both iterators (the current one could be called getNeighborsIterator),
with a default implementation of the neighbors iterator in terms of the edges iterator, and
optimizations left to implementors (e.g. IntIntNullIntVertex).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message