giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Avery Ching (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-243) EdgeListVertex performance is sub-optimal
Date Tue, 10 Jul 2012 08:06:36 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410123#comment-13410123
] 

Avery Ching commented on GIRAPH-243:
------------------------------------

I agree that we can use a single array, but probably best to not use Edge since it also has
conf, which is not needed.  We could create an EdgeConfigurable that extends Edge (without
conf).  You can actually solve the n lg n problem without an API change if you keep the index
that you last looked up and try that first.

I'm not sure that you gain much from having two arrays of objects vs one array of an object
with two objects in it.  Might want to try some microbenchmarks first on that.

But the n lg n -> n optimization should be useful either way.
                
> EdgeListVertex performance is sub-optimal
> -----------------------------------------
>
>                 Key: GIRAPH-243
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-243
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>
> Iterating over outgoing edges (both vertex id and value) in an EdgeListVertex is O(n
log n): you have to iterate over getOutEdgesIterator and call getEdgeValue (which is logarithmic)
at each iteration.
> If we store the edge list as a list of Edge<I, E> (instead of the current two lists),
iteration becomes linear time (which I think is what most people expect from an adjacency-list
representation).
> Furthermore, I think we can avoid sorting the list by ID: addEdge and removeEdge are
currently linear-time because we are using an ArrayList, so the binary search doesn't help
much. If we don't sort, at least addEdge becomes amortized constant-time.
> This improvement would require an API change: getOutEdgesIterator should iterate over
Edge<I, E> as opposed to just I.
> What I propose is to have both iterators (the current one could be called getNeighborsIterator),
with a default implementation of the neighbors iterator in terms of the edges iterator, and
optimizations left to implementors (e.g. IntIntNullIntVertex).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message