incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakob Homan <jgho...@gmail.com>
Subject Re: on the semantics of the combiner
Date Fri, 13 Jan 2012 21:24:34 GMT
I think this is good, but really think a warning is enough, rather
than an exception.  There's no reason to pre-emptively

On Fri, Jan 13, 2012 at 1:03 PM, Sebastian Schelter <ssc@apache.org> wrote:
> +1 on Iterable <= messages.size() also from me.
>
>
> On 13.01.2012 19:51, Avery Ching wrote:
>> +1
>>
>> I'm fine with this.  If we agree to return an Iterable, then we should
>> make sure to either throw if the size of the Iterable > messages.size()
>> to at the very least LOG.warn("This combiner is likely to be implemented
>> wrong").  I prefer an exception, since we have no use case for expanding
>> the set of messages.
>>
>> Also, I'd like to have something in the javadoc saying something like
>> "While the number of messages returned can be equal to the same number
>> of messages that was inputted, the purpose of the combiner is to reduced
>> the number of messages from the input."
>>
>> Avery
>>
>> On 1/13/12 9:34 AM, Claudio Martella wrote:
>>> Ok,
>>>
>>> I guess we can vote then about this, what do you think?
>>> Shall we take 72h?
>>>
>>> I'm +1 for returning an iterable that can be empty.
>>> I'm +1 for the returned iterable to be<= messages.size()
>>>
>>>
>>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ssc@apache.org>
>>> wrote:
>>>> I think we should make the combiner return a list/iterable that can
>>>> potentially be empty. However we should assume that the number of
>>>> elements returned is smaller than or equal to the number of input
>>>> elements (whats the use of a combiner if this is not given?). I also
>>>> concur that the code should not depend on the combiner being applied
>>>> (similar to the way combiners work in hadoop).
>>>>
>>>> --sebastian
>>>>
>>>> 2012/1/10 Jakob Homan<jghoman@gmail.com>:
>>>>> A composite object would essentially be a wrapper around a list and
>>>>> introduce the need for all vertices to be ready to extract that list
>>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>>> messages.  If we allow zero or one return elements, the combiner would
>>>>> have to create a composite object with a list of those four messages,
>>>>> whereas if we return a list, it just skips that step and returns the
>>>>> four messages.  Additionally, the receiving vertex would have to
>>>>> handle the possibility of a composite object every time even though
>>>>> the combiner may or may not have been run during the superstep, or
>>>>> even included in that job (since combiners are optional to the job
>>>>> itself).  It would be better if one could write a Giraph application
>>>>> that was completely agnostic of whether or not a combiner was
>>>>> included.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>> <claudio.martella@gmail.com>  wrote:
>>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>>> stand :) Once you give them any API they have the power to do anything
>>>>>> wrong, as they already can with Giraph (or anything else for what
it
>>>>>> matters), by designing an algorithm wrongly (which would be what
it
>>>>>> would turn out to be a wrong combiner). It's definitely true that
a
>>>>>> composite object would make the grouping (List<Group>) but
I thought
>>>>>> we were talking about simplifying life to users :). I think it would
>>>>>> be more flexible (for the present and for the future) and also more
>>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>>> for free).
>>>>>>
>>>>>> Very cool discussion.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jghoman@gmail.com>
>>>>>> wrote:
>>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>>> so they can't send messages to other vertices.
>>>>>>> Yeah, the more I've thought about this, the more problematic
it would
>>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>>> vertex
>>>>>>> before processing as well).  When would they be forwarded to
their
>>>>>>> new
>>>>>>> destinations at that point?  It would be possible to get into
a
>>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>>> ever
>>>>>>> actually be done.
>>>>>>>
>>>>>>> That being said, our inability to think of a good application
doesn't
>>>>>>> mean there won't be one in the future, and it's probably better
to be
>>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>>> benefit of forcing 0 or 1 message from a combiner seems less
than the
>>>>>>> flexibility of allowing another list of messages (which may or
may
>>>>>>> not
>>>>>>> be the same number of elements as the original, less than, or
even
>>>>>>> more than).
>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>> Agreed.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<aching@apache.org>
>>>>>>> wrote:
>>>>>>>> The general idea of combiners is to reduce the number of
messages
>>>>>>>> sent.
>>>>>>>>   Combiners are purely an optimization and the application
should
>>>>>>>> work
>>>>>>>> correctly without it (since it's never guaranteed to actually
be
>>>>>>>> called).
>>>>>>>>   Combiners can only modify the messages sent to a single
vertex,
>>>>>>>> so they
>>>>>>>> can't send messages to other vertices.  Any other work (i.e.
sending
>>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>>
>>>>>>>> While I think that grouping behavior could actually be
>>>>>>>> implemented within a
>>>>>>>> message object (still reducing the number of messages to
1 or 0)
>>>>>>>> I suppose
>>>>>>>> that in some simple cases (i.e. grouping), it might be easier
by
>>>>>>>> doing it in
>>>>>>>> the combiner as you both have mentioned?  The only thing
I
>>>>>>>> suppose I'm
>>>>>>>> concerned about is letting users do something that is not
optimal.
>>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>>> combiner to do.
>>>>>>>>   Also, since grouping behavior can be implemented in the
message
>>>>>>>> object, it
>>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>>> Ok, now i see where you're going. I guess that the thing
here is
>>>>>>>>> that
>>>>>>>>> the combiner would "act" like (on its behalf) D, and
to do so
>>>>>>>>> concretely it would probably need some local data related
to D
>>>>>>>>> (edges
>>>>>>>>> values? vertexvalue?).
>>>>>>>>> I also think that k>    n is also possible in principle
and we
>>>>>>>>> could let
>>>>>>>>> the user decide whether to use this power or not, once/if
we agree
>>>>>>>>> that letting the user send k messages in the combiner
is useful
>>>>>>>>> (and
>>>>>>>>> the grouping behavior shown by the label propagation
example
>>>>>>>>> should do
>>>>>>>>> so).
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>>> Homan<jghoman@gmail.com>    wrote:
>>>>>>>>>> Those two messages would have gone to D, been expanded
to, say, 4,
>>>>>>>>>> which would have then then been sent to, say, M.
 This would
>>>>>>>>>> save the
>>>>>>>>>> sending of the two to D and send the 4 directly to
M.  I'm not
>>>>>>>>>> saying
>>>>>>>>>> it's a great example, but it is legal.  This is
of course assuming
>>>>>>>>>> that combiners can generate messages bound for vertices
other
>>>>>>>>>> than the
>>>>>>>>>> original destination, which I don't know if that
has even been
>>>>>>>>>> discussed.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>>> <claudio.martella@gmail.com>    wrote:
>>>>>>>>>>> i'm not sure i understand what you'd save here.
if the two
>>>>>>>>>>> messages
>>>>>>>>>>> were going to be expanded to k messages on the
destination
>>>>>>>>>>> worker D,
>>>>>>>>>>> but you expand them on W, you end up sending
k messages
>>>>>>>>>>> instead of 2.
>>>>>>>>>>> right?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>>> Homan<jghoman@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the
number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than
n,
>>>>>>>>>>>> Right.  Grouping would be the most common
case.  It would be
>>>>>>>>>>>> possible
>>>>>>>>>>>> to be great than k, as well.  For instance,
consider two
>>>>>>>>>>>> messages,
>>>>>>>>>>>> both generated on the same worker (W) by
two two different
>>>>>>>>>>>> vertices,
>>>>>>>>>>>> both bound for another vertex, Z.  A combiner
on W could get
>>>>>>>>>>>> both of
>>>>>>>>>>>> these messages, do some work on them, as
it would have
>>>>>>>>>>>> knowledge of
>>>>>>>>>>>> both, and generate some arbitrary number
of messages bound
>>>>>>>>>>>> for other
>>>>>>>>>>>> vertices (thus saving the shuffle/transfer
of the original
>>>>>>>>>>>> messages).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio
Martella
>>>>>>>>>>>> <claudio.martella@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the
number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than
n, the size of the
>>>>>>>>>>>>> messages
>>>>>>>>>>>>> parameter. as a first example, you can
imagine your vertex
>>>>>>>>>>>>> receiving
>>>>>>>>>>>>> semantically-different classes/types
of messages, and you
>>>>>>>>>>>>> can imagine
>>>>>>>>>>>>> willing to be summarizing them in different
messages, i.e.
>>>>>>>>>>>>> if your
>>>>>>>>>>>>> messages come along with labels or just
simply by the source
>>>>>>>>>>>>> vertex,
>>>>>>>>>>>>> if required by the algorithm, think of
label propagation to
>>>>>>>>>>>>> have just
>>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery
Ching<aching@apache.org>
>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>> I agree that C&A doesn't require
it, however, I can't think
>>>>>>>>>>>>>> of why I
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>> want to use a combiner to expand
the number of messages.
>>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>> In my opinion that means
reducing to a single message or
>>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>> C&A doesn't require this,
however.  Hadoop's combiner
>>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>>> instance, doesn't require a single
 or no value to be
>>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>> the same interface as a reducer,
zero or more values.  Would
>>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>>> the semantics of Giraph's combiner
to return a list of
>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>> (possibly empty) make it more
useful?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM,
Claudio Martella
>>>>>>>>>>>>>>> <claudio.martella@gmail.com>
     wrote:
>>>>>>>>>>>>>>>> Yes, what is you say is completely
reasonable, you
>>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28
PM, Avery
>>>>>>>>>>>>>>>> Ching<aching@apache.org>
>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>> Combiners should be commutative
and associative.  In my
>>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>> reducing to a single
message or none at all.  Can you
>>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>> more than 1 message should
be returned from a combiner?
>>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> returning null isn't
preferable in general, but I think
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>>> (returning no messages),
is nice to have and isn't a
>>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio
Martella wrote:
>>>>>>>>>>>>>>>>>> To clarify, I was
not discussing the possibility for
>>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> null. I see why it
would be useful, given that combine
>>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>>> there's no other
way to let combiner ask not to send
>>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>>> although i agree
with Jakob, I also believe returning
>>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> avoided but only
used, roughly, as an init value for a
>>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>>> Perhaps, we could,
but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>>> return Iterable<M>,
basicallly letting it define what
>>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> ({0, 1, k } messages).
It would be a powerful extension
>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>>> but maybe it's too
much.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As far as the size
of the messages parameter, I agree
>>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>>> messages gives nothing
to combine and it would be somehow
>>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>>> was more a matter
of synching it with the other methods
>>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>>> Probably, having
a more clear javadoc will do the job
>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012
at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>>> Homan<jghoman@gmail.com>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>> I'm not a big
fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> calling code
(null checks, or not, since people
>>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>>> them).  Avery
is correct that combiners are application
>>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>>> it conceivable
that one would want to write a combiner
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>> something for
an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>> doesn't return
the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>>> combining a single
message would result in that message.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9,
2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>>> Ching<aching@apache.org>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> The javadoc
for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>>    * Combines
message values for a particular vertex
>>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>>    * @param
vertexIndex Index of the vertex getting
>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>    * @param
msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>>    * @return
Message that is combined from {@link
>>>>>>>>>>>>>>>>>>>> MsgList}
or
>>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>    *  
      message it to be sent
>>>>>>>>>>>>>>>>>>>>    * @throws
IOException
>>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we
are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>>> various use
cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>> compute()
algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> combiner,
in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It seems
like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>>> combiner
when
>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> messages.
 I can't see a case where that would be
>>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>>   Perhaps
we
>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>> change the
javadoc to insure that msgList must
>>>>>>>>>>>>>>>>>>>> contain at
least
>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>> to have combine()
being called.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/9/12
5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> yes,
that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>>> Fixing
my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>>> whether
we
>>>>>>>>>>>>>>>>>>>>> want
to
>>>>>>>>>>>>>>>>>>>>> define
explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>>> Personally,
I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>> messages,
as it's the case of
>>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>>> and compute(),
and act accordingly.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the
particular example, I believe the
>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner
is
>>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>>> It's
true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>>> also
true
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> null
return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon,
Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>>> Schelter<ssc@apache.org>
>>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>>> I
think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>>> is
at least
>>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>> element
in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>> invokes
the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>>> for
the
>>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>> has
been sent.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However,
we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>>> details
but
>>>>>>>>>>>>>>>>>>>>>> explicitly
specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On
09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>>>
Hello list,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>>>
and hit an
>>>>>>>>>>>>>>>>>>>>>>>
interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>>>
currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>>>
following
>>>>>>>>>>>>>>>>>>>>>>>
reason:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>>>
if there
>>>>>>>>>>>>>>>>>>>>>>>
are no
>>>>>>>>>>>>>>>>>>>>>>>
messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>>>
0) and for
>>>>>>>>>>>>>>>>>>>>>>>
this
>>>>>>>>>>>>>>>>>>>>>>>
reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>>>
combiner for
>>>>>>>>>>>>>>>>>>>>>>>
each
>>>>>>>>>>>>>>>>>>>>>>>
vertex
>>>>>>>>>>>>>>>>>>>>>>>
an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
     public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>>>
vertexId) {
>>>>>>>>>>>>>>>>>>>>>>>
       Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>>>
inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>>
       if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>>
               M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>>
               try {
>>>>>>>>>>>>>>>>>>>>>>>
                       combinedMsg =
>>>>>>>>>>>>>>>>>>>>>>>
combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>>>
messages);
>>>>>>>>>>>>>>>>>>>>>>>
               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>>
                       throw new
>>>>>>>>>>>>>>>>>>>>>>>
RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>>>>
combine",
>>>>>>>>>>>>>>>>>>>>>>>
e);
>>>>>>>>>>>>>>>>>>>>>>>
               }
>>>>>>>>>>>>>>>>>>>>>>>
               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>>
                       List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>>>
ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>>
                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>>
                       messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>>
               } else {
>>>>>>>>>>>>>>>>>>>>>>>
                       messages = new
>>>>>>>>>>>>>>>>>>>>>>>
ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>>
               }
>>>>>>>>>>>>>>>>>>>>>>>
       }
>>>>>>>>>>>>>>>>>>>>>>>
       return messages;
>>>>>>>>>>>>>>>>>>>>>>>
     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>>>
basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>>>
Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>>>
code that
>>>>>>>>>>>>>>>>>>>>>>>
returns
>>>>>>>>>>>>>>>>>>>>>>>
a
>>>>>>>>>>>>>>>>>>>>>>>
sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>>>
calls the
>>>>>>>>>>>>>>>>>>>>>>>
combiner
>>>>>>>>>>>>>>>>>>>>>>>
with
>>>>>>>>>>>>>>>>>>>>>>>
0 messages?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>     Claudio Martella
>>>>>>     claudio.martella@gmail.com
>>>
>>>
>>
>

Mime
View raw message