incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <>
Subject Re: [jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps
Date Wed, 02 Nov 2011 21:54:29 GMT
Hi Jake,

yep, you basically understand it correctly, it is indeed something
like a SSSP, but I'm starting from many vertices at the same time (so
i'd call it a grammar-based ASSP).

To make it more clear, the current code looks like:

    public void compute(Iterator<Message> messages)
    throws IOException {

        if (getSuperstep() == 0 && isSource()) {
            processMessage(new Message(this.query, new ResultSet()));
        } else {
            while (messages.hasNext()) {


as you can see I have used a hack and saved the query as VertexValue
(so I at least saved parsing it multiple times from String).

With what I have in mind, I could remove the first if-condition completely.
Also, as not all the vertices are starting vertices, I could obtain
the complete list of starting vertices by running a filter based on
the first element of my stack/query. Suppose my query asks for all the
actors who have acted in a movie which took place in USA. (actor -
acted_in -> movie - takes_place -> USA), I could set as a starting
vertices only those vertices that have an outgoing edge with label
"acted_in". That would save quite a lot. As we didn't have an API to
do this yet, I still haven't figured out how to do this precisely, but
i guess this could be done in the VertexReader.

On Wed, Nov 2, 2011 at 4:51 PM, Jake Mannix <> wrote:
> On Wed, Nov 2, 2011 at 5:07 AM, Claudio Martella <
>> wrote:
>> Not each vertex is a starting vertex for the path traversals, so it
>> could be nice if at the first superstep the starting vertices could
>> already have the messages in their inbox. This way the message
>> processing (and path traversal processing with it) could be fully
>> transparent (vertices just have to iterate throw messages without
>> caring about parsing, start vertices etc etc).
>> Currently, at first superstep, each vertex has to get che query from
>> the conf, parse it into a stack and discard it afterwards, as soon as
>> it discovers it is not a starting vertex for a path traversal.
>> Outside, I know who's a starting vertex and could just set the inbox
>> for those vertices accordingly.
> Let me understand what the situation looks like, by basically repeating
> this back to you: in your case, the set of queries which are to be run
> over your graph need to be read/parsed at startup and "given" to the
> starting node who is going to be responsible for initiating their
> propagation
> across the graph, right?
> For one thing, storing this batch set of queries in the Configuration
> doesn't seem very scalable - having that live on-disk and be read
> by the VertexReader does seem to make sense, although I wonder how
> easy it would be to coerce your serialized on-disk format to be nice
> and re-use the unchanging graph for multiple queries?  Maybe a
> MultipleInputFormats kind of thing would be needed?
> So how is this different from SimpleShortestPathsVertex, in the way
> that one vertex is "special"?  The compute() method of that vertex
> uses the knowledge of whether it is the source or not to set its "distance"
> from the origin, and only when that distance is > the current min does
> it send messages out.
> I could see how algorithms like this could possibly be sped up if we
> didn't need to iterate over *all* of the graph on each superstep, as the
> first step only needs to send messages to the vertices directly
> connected to the origin, then only it's 2nd degree, etc.  If we were
> only to process the vertices which had messages to send, for example,
> and iterate over those in a sparse fashion, this algorithm could be
> sped up considerably.
> I guess you're saying that similarly, not only if the messages were
> directly associated with their starting nodes, but if during the superstep
> we could only iterate over the nodes with messages, we could be much
> faster.
> It seems there might be a bunch of algorithms which would work this
> way, and I wonder if it would be a good idea to modify the system to
> have GraphMapper check to see if the vertex class "isSparse()" in
> some sense, which would mean that in the superstep walk:
>            for (Map.Entry<I, VertexRange<I, V, E, M>> entry :
>                serviceWorker.getVertexRangeMap().entrySet()) {
> // ...
>                for (BasicVertex<I, V, E, M> vertex :
>                        entry.getValue().getVertexMap().values()) {
> // ...
>                    if (!vertex.isHalted()) {
>                        Iterator<M> vertexMsgIt =
>                            vertex.getMsgList().iterator();
>                         context.progress();
>                        vertex.compute(vertexMsgIt);
>                    }
> we instead would iterate over a sparse data structure which
> only contains the vertices to be processed (this data structure
> would of course change from superstep to superstep as the
> messages propagate across the graph).  This could be a pretty
> significant speedup a lot of the time in practice, as it would
> mean that iterations per superstep would scale in complexity
> as O( |currently visible part of the graph| ) as opposed to
> O( | entire graph| ), even if most of the iteration is just a bunch
> of in-memory boolean and size checks.
> Not sure if this is exactly what you're talking about, however.
>  -jake

    Claudio Martella

View raw message