incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: on the semantics of the combiner
Date Tue, 10 Jan 2012 17:49:29 GMT
i'm not sure i understand what you'd save here. if the two messages
were going to be expanded to k messages on the destination worker D,
but you expand them on W, you end up sending k messages instead of 2.
right?

On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan <jghoman@gmail.com> wrote:
>> it doesn't have to be expand, k, the number of elements returned by
>> the combiner, can still be smaller than n,
> Right.  Grouping would be the most common case.  It would be possible
> to be great than k, as well.  For instance, consider two messages,
> both generated on the same worker (W) by two two different vertices,
> both bound for another vertex, Z.  A combiner on W could get both of
> these messages, do some work on them, as it would have knowledge of
> both, and generate some arbitrary number of messages bound for other
> vertices (thus saving the shuffle/transfer of the original messages).
>
>
> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
> <claudio.martella@gmail.com> wrote:
>> it doesn't have to be expand, k, the number of elements returned by
>> the combiner, can still be smaller than n, the size of the messages
>> parameter. as a first example, you can imagine your vertex receiving
>> semantically-different classes/types of messages, and you can imagine
>> willing to be summarizing them in different messages, i.e. if your
>> messages come along with labels or just simply by the source vertex,
>> if required by the algorithm, think of label propagation to have just
>> an example, or some sort of labeled-pagerank.
>>
>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <aching@apache.org> wrote:
>>> I agree that C&A doesn't require it, however, I can't think of why I would
>>> want to use a combiner to expand the number of messages.  Can you?
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>
>>>>> In my opinion that means reducing to a single message or none at all.
>>>>
>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>> instance, doesn't require a single  or no value to be returned; it has
>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>> the semantics of Giraph's combiner to return a list of messages
>>>> (possibly empty) make it more useful?
>>>>
>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>> <claudio.martella@gmail.com>  wrote:
>>>>>
>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>
>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<aching@apache.org>
 wrote:
>>>>>>
>>>>>> Combiners should be commutative and associative.  In my opinion
that
>>>>>> means
>>>>>> reducing to a single message or none at all.  Can you think of a
case
>>>>>> when
>>>>>> more than 1 message should be returned from a combiner?  I know
that
>>>>>> returning null isn't preferable in general, but I think that
>>>>>> functionality
>>>>>> (returning no messages), is nice to have and isn't a huge amount
of work
>>>>>> on
>>>>>> our side.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>
>>>>>>> To clarify, I was not discussing the possibility for combine
to return
>>>>>>> null. I see why it would be useful, given that combine returns
M,
>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>> although i agree with Jakob, I also believe returning null should
be
>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>> reference/pointer.
>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>>> return Iterable<M>, basicallly letting it define what to
combine to
>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
model,
>>>>>>> but maybe it's too much.
>>>>>>>
>>>>>>> As far as the size of the messages parameter, I agree with you
that 0
>>>>>>> messages gives nothing to combine and it would be somehow awkward,
it
>>>>>>> was more a matter of synching it with the other methods getting
the
>>>>>>> messages parameter.
>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jghoman@gmail.com>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
to the
>>>>>>>> calling code (null checks, or not, since people usually will
forget
>>>>>>>> them).  Avery is correct that combiners are application
specific.  Is
>>>>>>>> it conceivable that one would want to write a combiner that
returned
>>>>>>>> something for an input of no parameters, ie combining the
empty list
>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>> combining a single message would result in that message.
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<aching@apache.org>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>
>>>>>>>>>  /**
>>>>>>>>>   * Combines message values for a particular vertex
index.
>>>>>>>>>   *
>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
messages
>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>   * @return Message that is combined from {@link MsgList}
or null if
>>>>>>>>> no
>>>>>>>>>   *         message it to be sent
>>>>>>>>>   * @throws IOException
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>> I think we are somewhat vague on what a combiner can
return to
>>>>>>>>> support
>>>>>>>>> various use cases.  A combiner should be particular
to a particular
>>>>>>>>> compute() algorithm.  I think it should be legal to
return null from
>>>>>>>>> a
>>>>>>>>> combiner, in that case, no message should be sent to
that vertex.
>>>>>>>>>
>>>>>>>>> It seems like it would be an overhead to call a combiner
when there
>>>>>>>>> are
>>>>>>>>> 0
>>>>>>>>> messages.  I can't see a case where that would be useful.
 Perhaps we
>>>>>>>>> should
>>>>>>>>> change the javadoc to insure that msgList must contain
at least one
>>>>>>>>> message
>>>>>>>>> to have combine() being called.
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>
>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>> Fixing my test was not the issue, my question was
whether we want to
>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>> Personally, I believe the combiner should be ready
to receive 0
>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>> putMessages()
>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>
>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner
is
>>>>>>>>>> bugged.
>>>>>>>>>> It's true that the sum of no values is 0, but it's
also true that
>>>>>>>>>> the
>>>>>>>>>> null return semantics of combine() is more suitable
for this exact
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ssc@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think we currently implicitly assume that there
is at least one
>>>>>>>>>>> element in the Iterable passed to the combiner.
The messaging code
>>>>>>>>>>> only
>>>>>>>>>>> invokes the combiner only if at least one message
for the target
>>>>>>>>>>> vertex
>>>>>>>>>>> has been sent.
>>>>>>>>>>>
>>>>>>>>>>> However, we should not rely on implicit implementation
details but
>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>
>>>>>>>>>>> --sebastian
>>>>>>>>>>>
>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>
>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
and hit an
>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>> currently, my code fails testBspCombiner
for the following reason:
>>>>>>>>>>>>
>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value
even if there are no
>>>>>>>>>>>> messages in the iterator (in this case it
returns 0) and for this
>>>>>>>>>>>> reason the vertices get activated at each
superstep.
>>>>>>>>>>>>
>>>>>>>>>>>> At each superstep, under-the-hood, I pass
the combiner for each
>>>>>>>>>>>> vertex
>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>
>>>>>>>>>>>>     public Iterable<M>      getMessages(I
vertexId) {
>>>>>>>>>>>>       Iterable<M>      messages
=
>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>               try {
>>>>>>>>>>>>                       combinedMsg
= combiner.combine(vertexId,
>>>>>>>>>>>> messages);
>>>>>>>>>>>>               }  catch (IOException
e) {
>>>>>>>>>>>>                       throw new
RuntimeException("could not
>>>>>>>>>>>> combine",
>>>>>>>>>>>> e);
>>>>>>>>>>>>               }
>>>>>>>>>>>>               if (combinedMsg != null)
{
>>>>>>>>>>>>                       List<M>
     tmp = new ArrayList<M>(1);
>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>                       messages
= tmp;
>>>>>>>>>>>>               } else {
>>>>>>>>>>>>                       messages
= new ArrayList<M>(0);
>>>>>>>>>>>>               }
>>>>>>>>>>>>       }
>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>> the Iterable returned by this methods is
passed to
>>>>>>>>>>>> basicVertex.putMessages() right before the
compute().
>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
code that returns
>>>>>>>>>>>> a
>>>>>>>>>>>> sum of 0 over no values, or the framework
that calls the combiner
>>>>>>>>>>>> with
>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>    Claudio Martella
>>>>>    claudio.martella@gmail.com
>>>
>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com

Mime
View raw message