incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sebastian Schelter <...@apache.org>
Subject Re: on the semantics of the combiner
Date Fri, 13 Jan 2012 21:03:02 GMT
+1 on Iterable <= messages.size() also from me.


On 13.01.2012 19:51, Avery Ching wrote:
> +1
> 
> I'm fine with this.  If we agree to return an Iterable, then we should
> make sure to either throw if the size of the Iterable > messages.size()
> to at the very least LOG.warn("This combiner is likely to be implemented
> wrong").  I prefer an exception, since we have no use case for expanding
> the set of messages.
> 
> Also, I'd like to have something in the javadoc saying something like
> "While the number of messages returned can be equal to the same number
> of messages that was inputted, the purpose of the combiner is to reduced
> the number of messages from the input."
> 
> Avery
> 
> On 1/13/12 9:34 AM, Claudio Martella wrote:
>> Ok,
>>
>> I guess we can vote then about this, what do you think?
>> Shall we take 72h?
>>
>> I'm +1 for returning an iterable that can be empty.
>> I'm +1 for the returned iterable to be<= messages.size()
>>
>>
>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ssc@apache.org> 
>> wrote:
>>> I think we should make the combiner return a list/iterable that can
>>> potentially be empty. However we should assume that the number of
>>> elements returned is smaller than or equal to the number of input
>>> elements (whats the use of a combiner if this is not given?). I also
>>> concur that the code should not depend on the combiner being applied
>>> (similar to the way combiners work in hadoop).
>>>
>>> --sebastian
>>>
>>> 2012/1/10 Jakob Homan<jghoman@gmail.com>:
>>>> A composite object would essentially be a wrapper around a list and
>>>> introduce the need for all vertices to be ready to extract that list
>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>> messages.  If we allow zero or one return elements, the combiner would
>>>> have to create a composite object with a list of those four messages,
>>>> whereas if we return a list, it just skips that step and returns the
>>>> four messages.  Additionally, the receiving vertex would have to
>>>> handle the possibility of a composite object every time even though
>>>> the combiner may or may not have been run during the superstep, or
>>>> even included in that job (since combiners are optional to the job
>>>> itself).  It would be better if one could write a Giraph application
>>>> that was completely agnostic of whether or not a combiner was
>>>> included.
>>>>
>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>> <claudio.martella@gmail.com>  wrote:
>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>> stand :) Once you give them any API they have the power to do anything
>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>> we were talking about simplifying life to users :). I think it would
>>>>> be more flexible (for the present and for the future) and also more
>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>> for free).
>>>>>
>>>>> Very cool discussion.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jghoman@gmail.com>

>>>>> wrote:
>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>> so they can't send messages to other vertices.
>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>> vertex
>>>>>> before processing as well).  When would they be forwarded to their
>>>>>> new
>>>>>> destinations at that point?  It would be possible to get into a
>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>> ever
>>>>>> actually be done.
>>>>>>
>>>>>> That being said, our inability to think of a good application doesn't
>>>>>> mean there won't be one in the future, and it's probably better to
be
>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than
the
>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>> not
>>>>>> be the same number of elements as the original, less than, or even
>>>>>> more than).
>>>>>>
>>>>>>> Good discussion (it's making me really think about this)!
>>>>>> Agreed.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<aching@apache.org>

>>>>>> wrote:
>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>> sent.
>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>> work
>>>>>>> correctly without it (since it's never guaranteed to actually
be
>>>>>>> called).
>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>> so they
>>>>>>> can't send messages to other vertices.  Any other work (i.e.
sending
>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>
>>>>>>> While I think that grouping behavior could actually be
>>>>>>> implemented within a
>>>>>>> message object (still reducing the number of messages to 1 or
0)
>>>>>>> I suppose
>>>>>>> that in some simple cases (i.e. grouping), it might be easier
by
>>>>>>> doing it in
>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>> suppose I'm
>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>> combiner to do.
>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>> object, it
>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>
>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>> Ok, now i see where you're going. I guess that the thing
here is
>>>>>>>> that
>>>>>>>> the combiner would "act" like (on its behalf) D, and to do
so
>>>>>>>> concretely it would probably need some local data related
to D
>>>>>>>> (edges
>>>>>>>> values? vertexvalue?).
>>>>>>>> I also think that k>    n is also possible in principle
and we
>>>>>>>> could let
>>>>>>>> the user decide whether to use this power or not, once/if
we agree
>>>>>>>> that letting the user send k messages in the combiner is
useful
>>>>>>>> (and
>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>> should do
>>>>>>>> so).
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>> Homan<jghoman@gmail.com>    wrote:
>>>>>>>>> Those two messages would have gone to D, been expanded
to, say, 4,
>>>>>>>>> which would have then then been sent to, say, M.  This
would
>>>>>>>>> save the
>>>>>>>>> sending of the two to D and send the 4 directly to M.
 I'm not
>>>>>>>>> saying
>>>>>>>>> it's a great example, but it is legal.  This is of course
assuming
>>>>>>>>> that combiners can generate messages bound for vertices
other
>>>>>>>>> than the
>>>>>>>>> original destination, which I don't know if that has
even been
>>>>>>>>> discussed.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>> <claudio.martella@gmail.com>    wrote:
>>>>>>>>>> i'm not sure i understand what you'd save here. if
the two
>>>>>>>>>> messages
>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>> worker D,
>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>> instead of 2.
>>>>>>>>>> right?
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>> Homan<jghoman@gmail.com>    wrote:
>>>>>>>>>>>> it doesn't have to be expand, k, the number
of elements
>>>>>>>>>>>> returned by
>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>> Right.  Grouping would be the most common case.
 It would be
>>>>>>>>>>> possible
>>>>>>>>>>> to be great than k, as well.  For instance, consider
two
>>>>>>>>>>> messages,
>>>>>>>>>>> both generated on the same worker (W) by two
two different
>>>>>>>>>>> vertices,
>>>>>>>>>>> both bound for another vertex, Z.  A combiner
on W could get
>>>>>>>>>>> both of
>>>>>>>>>>> these messages, do some work on them, as it would
have
>>>>>>>>>>> knowledge of
>>>>>>>>>>> both, and generate some arbitrary number of messages
bound
>>>>>>>>>>> for other
>>>>>>>>>>> vertices (thus saving the shuffle/transfer of
the original
>>>>>>>>>>> messages).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>> <claudio.martella@gmail.com>    wrote:
>>>>>>>>>>>> it doesn't have to be expand, k, the number
of elements
>>>>>>>>>>>> returned by
>>>>>>>>>>>> the combiner, can still be smaller than n,
the size of the
>>>>>>>>>>>> messages
>>>>>>>>>>>> parameter. as a first example, you can imagine
your vertex
>>>>>>>>>>>> receiving
>>>>>>>>>>>> semantically-different classes/types of messages,
and you
>>>>>>>>>>>> can imagine
>>>>>>>>>>>> willing to be summarizing them in different
messages, i.e.
>>>>>>>>>>>> if your
>>>>>>>>>>>> messages come along with labels or just simply
by the source
>>>>>>>>>>>> vertex,
>>>>>>>>>>>> if required by the algorithm, think of label
propagation to
>>>>>>>>>>>> have just
>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<aching@apache.org>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>> I agree that C&A doesn't require
it, however, I can't think
>>>>>>>>>>>>> of why I
>>>>>>>>>>>>> would
>>>>>>>>>>>>> want to use a combiner to expand the
number of messages. 
>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>> In my opinion that means reducing
to a single message or
>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>> C&A doesn't require this, however.
 Hadoop's combiner
>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>> instance, doesn't require a single
 or no value to be
>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>> the same interface as a reducer,
zero or more values.  Would
>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>> the semantics of Giraph's combiner
to return a list of
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio
Martella
>>>>>>>>>>>>>> <claudio.martella@gmail.com>
     wrote:
>>>>>>>>>>>>>>> Yes, what is you say is completely
reasonable, you
>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28
PM, Avery
>>>>>>>>>>>>>>> Ching<aching@apache.org>
>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>> Combiners should be commutative
and associative.  In my
>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>> reducing to a single message
or none at all.  Can you
>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>> more than 1 message should
be returned from a combiner? 
>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> returning null isn't preferable
in general, but I think
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>> (returning no messages),
is nice to have and isn't a
>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio
Martella wrote:
>>>>>>>>>>>>>>>>> To clarify, I was not
discussing the possibility for
>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>> null. I see why it would
be useful, given that combine
>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>> there's no other way
to let combiner ask not to send
>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>> although i agree with
Jakob, I also believe returning
>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> avoided but only used,
roughly, as an init value for a
>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>> Perhaps, we could, but
i'm just thinking out loud here,
>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>> return Iterable<M>,
basicallly letting it define what
>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> ({0, 1, k } messages).
It would be a powerful extension
>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As far as the size of
the messages parameter, I agree
>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>> messages gives nothing
to combine and it would be somehow
>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>> was more a matter of
synching it with the other methods
>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>> Probably, having a more
clear javadoc will do the job
>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at
8:42 PM, Jakob
>>>>>>>>>>>>>>>>> Homan<jghoman@gmail.com>
>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>> I'm not a big fan
of returning null as it adds extra
>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>> calling code (null
checks, or not, since people
>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>> them).  Avery is
correct that combiners are application
>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>> it conceivable that
one would want to write a combiner
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>> something for an
input of no parameters, ie combining
>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>> doesn't return the
empty list?  I imagine for most
>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>> combining a single
message would result in that message.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012
at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>> Ching<aching@apache.org>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>> The javadoc for
VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>    * Combines
message values for a particular vertex
>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>    * @param vertexIndex
Index of the vertex getting
>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>    * @param msgList
List of the messages to be combined
>>>>>>>>>>>>>>>>>>>    * @return
Message that is combined from {@link
>>>>>>>>>>>>>>>>>>> MsgList} or
>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>    *        
message it to be sent
>>>>>>>>>>>>>>>>>>>    * @throws
IOException
>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think we are
somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>> various use cases.
 A combiner should be particular to a
>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>> compute() algorithm.
 I think it should be legal to
>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> combiner, in
that case, no message should be sent to
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems like
it would be an overhead to call a
>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>> messages.  I
can't see a case where that would be
>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>> change the javadoc
to insure that msgList must
>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>> to have combine()
being called.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37
AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> yes, that
was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>> Fixing my
test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>> define explicitly
the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>> Personally,
I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> messages,
as it's the case of
>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>> and compute(),
and act accordingly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the particular
example, I believe the
>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner
is
>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>> It's true
that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> null return
semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jan
9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>> Schelter<ssc@apache.org>
>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>> I think
we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>> is at
least
>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> element
in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>> invokes
the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>> has been
sent.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However,
we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>> details
but
>>>>>>>>>>>>>>>>>>>>> explicitly
specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 09.01.2012
13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>> Hello
list,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> for
GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>> and
hit an
>>>>>>>>>>>>>>>>>>>>>> interesting
problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>> currently,
my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute()
returns a value even
>>>>>>>>>>>>>>>>>>>>>> if
there
>>>>>>>>>>>>>>>>>>>>>> are
no
>>>>>>>>>>>>>>>>>>>>>> messages
in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>> 0)
and for
>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> reason
the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At
each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>> combiner
for
>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>> an
Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 
    public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>> vertexId)
{
>>>>>>>>>>>>>>>>>>>>>> 
      Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>> 
      if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>> 
              M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>> 
              try {
>>>>>>>>>>>>>>>>>>>>>> 
                      combinedMsg =
>>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>> 
              }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>> 
                      throw new
>>>>>>>>>>>>>>>>>>>>>> RuntimeException("could
not
>>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>> 
              }
>>>>>>>>>>>>>>>>>>>>>> 
              if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>> 
                      List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>> 
                      tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>> 
                      messages = tmp;
>>>>>>>>>>>>>>>>>>>>>> 
              } else {
>>>>>>>>>>>>>>>>>>>>>> 
                      messages = new
>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>> 
              }
>>>>>>>>>>>>>>>>>>>>>> 
      }
>>>>>>>>>>>>>>>>>>>>>> 
      return messages;
>>>>>>>>>>>>>>>>>>>>>> 
    }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> the
Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages()
right before the compute().
>>>>>>>>>>>>>>>>>>>>>> Now,
the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>> code
that
>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> sum
of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>> calls
the
>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> 0
messages?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>>     Claudio Martella
>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>     Claudio Martella
>>>>>     claudio.martella@gmail.com
>>
>>
> 


Mime
View raw message