incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakob Homan <jgho...@gmail.com>
Subject Re: on the semantics of the combiner
Date Tue, 10 Jan 2012 18:04:15 GMT
Those two messages would have gone to D, been expanded to, say, 4,
which would have then then been sent to, say, M.  This would save the
sending of the two to D and send the 4 directly to M.  I'm not saying
it's a great example, but it is legal.  This is of course assuming
that combiners can generate messages bound for vertices other than the
original destination, which I don't know if that has even been
discussed.

On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
<claudio.martella@gmail.com> wrote:
> i'm not sure i understand what you'd save here. if the two messages
> were going to be expanded to k messages on the destination worker D,
> but you expand them on W, you end up sending k messages instead of 2.
> right?
>
> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan <jghoman@gmail.com> wrote:
>>> it doesn't have to be expand, k, the number of elements returned by
>>> the combiner, can still be smaller than n,
>> Right.  Grouping would be the most common case.  It would be possible
>> to be great than k, as well.  For instance, consider two messages,
>> both generated on the same worker (W) by two two different vertices,
>> both bound for another vertex, Z.  A combiner on W could get both of
>> these messages, do some work on them, as it would have knowledge of
>> both, and generate some arbitrary number of messages bound for other
>> vertices (thus saving the shuffle/transfer of the original messages).
>>
>>
>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>> <claudio.martella@gmail.com> wrote:
>>> it doesn't have to be expand, k, the number of elements returned by
>>> the combiner, can still be smaller than n, the size of the messages
>>> parameter. as a first example, you can imagine your vertex receiving
>>> semantically-different classes/types of messages, and you can imagine
>>> willing to be summarizing them in different messages, i.e. if your
>>> messages come along with labels or just simply by the source vertex,
>>> if required by the algorithm, think of label propagation to have just
>>> an example, or some sort of labeled-pagerank.
>>>
>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <aching@apache.org> wrote:
>>>> I agree that C&A doesn't require it, however, I can't think of why I
would
>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>
>>>> Avery
>>>>
>>>>
>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>
>>>>>> In my opinion that means reducing to a single message or none at
all.
>>>>>
>>>>> C&A doesn't require this, however.  Hadoop's combiner interface,
for
>>>>> instance, doesn't require a single  or no value to be returned; it has
>>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>> (possibly empty) make it more useful?
>>>>>
>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>> <claudio.martella@gmail.com>  wrote:
>>>>>>
>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<aching@apache.org>
 wrote:
>>>>>>>
>>>>>>> Combiners should be commutative and associative.  In my opinion
that
>>>>>>> means
>>>>>>> reducing to a single message or none at all.  Can you think
of a case
>>>>>>> when
>>>>>>> more than 1 message should be returned from a combiner?  I know
that
>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>> functionality
>>>>>>> (returning no messages), is nice to have and isn't a huge amount
of work
>>>>>>> on
>>>>>>> our side.
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>
>>>>>>>> To clarify, I was not discussing the possibility for combine
to return
>>>>>>>> null. I see why it would be useful, given that combine returns
M,
>>>>>>>> there's no other way to let combiner ask not to send any
message,
>>>>>>>> although i agree with Jakob, I also believe returning null
should be
>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>> reference/pointer.
>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
combine()
>>>>>>>> return Iterable<M>, basicallly letting it define what
to combine to
>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to
the model,
>>>>>>>> but maybe it's too much.
>>>>>>>>
>>>>>>>> As far as the size of the messages parameter, I agree with
you that 0
>>>>>>>> messages gives nothing to combine and it would be somehow
awkward, it
>>>>>>>> was more a matter of synching it with the other methods getting
the
>>>>>>>> messages parameter.
>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jghoman@gmail.com>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> I'm not a big fan of returning null as it adds extra
complexity to the
>>>>>>>>> calling code (null checks, or not, since people usually
will forget
>>>>>>>>> them).  Avery is correct that combiners are application
specific.  Is
>>>>>>>>> it conceivable that one would want to write a combiner
that returned
>>>>>>>>> something for an input of no parameters, ie combining
the empty list
>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<aching@apache.org>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>
>>>>>>>>>>  /**
>>>>>>>>>>   * Combines message values for a particular vertex
index.
>>>>>>>>>>   *
>>>>>>>>>>   * @param vertexIndex Index of the vertex getting
these messages
>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>   * @return Message that is combined from {@link
MsgList} or null if
>>>>>>>>>> no
>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>   * @throws IOException
>>>>>>>>>>   */
>>>>>>>>>>
>>>>>>>>>> I think we are somewhat vague on what a combiner
can return to
>>>>>>>>>> support
>>>>>>>>>> various use cases.  A combiner should be particular
to a particular
>>>>>>>>>> compute() algorithm.  I think it should be legal
to return null from
>>>>>>>>>> a
>>>>>>>>>> combiner, in that case, no message should be sent
to that vertex.
>>>>>>>>>>
>>>>>>>>>> It seems like it would be an overhead to call a combiner
when there
>>>>>>>>>> are
>>>>>>>>>> 0
>>>>>>>>>> messages.  I can't see a case where that would be
useful.  Perhaps we
>>>>>>>>>> should
>>>>>>>>>> change the javadoc to insure that msgList must contain
at least one
>>>>>>>>>> message
>>>>>>>>>> to have combine() being called.
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>
>>>>>>>>>>> yes, that was my point, I agree completely with
you.
>>>>>>>>>>> Fixing my test was not the issue, my question
was whether we want to
>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>> Personally, I believe the combiner should be
ready to receive 0
>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>> putMessages()
>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>
>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner
is
>>>>>>>>>>> bugged.
>>>>>>>>>>> It's true that the sum of no values is 0, but
it's also true that
>>>>>>>>>>> the
>>>>>>>>>>> null return semantics of combine() is more suitable
for this exact
>>>>>>>>>>> situation.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ssc@apache.org>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I think we currently implicitly assume that
there is at least one
>>>>>>>>>>>> element in the Iterable passed to the combiner.
The messaging code
>>>>>>>>>>>> only
>>>>>>>>>>>> invokes the combiner only if at least one
message for the target
>>>>>>>>>>>> vertex
>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>
>>>>>>>>>>>> However, we should not rely on implicit implementation
details but
>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>
>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>
>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>
>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming
messages and hit an
>>>>>>>>>>>>> interesting problem with the combiner
semantics.
>>>>>>>>>>>>> currently, my code fails testBspCombiner
for the following reason:
>>>>>>>>>>>>>
>>>>>>>>>>>>> SimpleSumCombiner::compute() returns
a value even if there are no
>>>>>>>>>>>>> messages in the iterator (in this case
it returns 0) and for this
>>>>>>>>>>>>> reason the vertices get activated at
each superstep.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At each superstep, under-the-hood, I
pass the combiner for each
>>>>>>>>>>>>> vertex
>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     public Iterable<M>    
 getMessages(I vertexId) {
>>>>>>>>>>>>>       Iterable<M>      messages
=
>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>                       combinedMsg
= combiner.combine(vertexId,
>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>               }  catch (IOException
e) {
>>>>>>>>>>>>>                       throw
new RuntimeException("could not
>>>>>>>>>>>>> combine",
>>>>>>>>>>>>> e);
>>>>>>>>>>>>>               }
>>>>>>>>>>>>>               if (combinedMsg
!= null) {
>>>>>>>>>>>>>                       List<M>
     tmp = new ArrayList<M>(1);
>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>                       messages
= tmp;
>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>                       messages
= new ArrayList<M>(0);
>>>>>>>>>>>>>               }
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>> the Iterable returned by this methods
is passed to
>>>>>>>>>>>>> basicVertex.putMessages() right before
the compute().
>>>>>>>>>>>>> Now, the question is: who's wrong? The
combiner code that returns
>>>>>>>>>>>>> a
>>>>>>>>>>>>> sum of 0 over no values, or the framework
that calls the combiner
>>>>>>>>>>>>> with
>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    Claudio Martella
>>>>>>    claudio.martella@gmail.com
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com

Mime
View raw message