incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avery Ching <ach...@apache.org>
Subject Re: on the semantics of the combiner
Date Tue, 10 Jan 2012 02:05:13 GMT
I agree that C&A doesn't require it, however, I can't think of why I 
would want to use a combiner to expand the number of messages.  Can you?

Avery

On 1/9/12 3:57 PM, Jakob Homan wrote:
>> In my opinion that means reducing to a single message or none at all.
> C&A doesn't require this, however.  Hadoop's combiner interface, for
> instance, doesn't require a single  or no value to be returned; it has
> the same interface as a reducer, zero or more values.  Would adapting
> the semantics of Giraph's combiner to return a list of messages
> (possibly empty) make it more useful?
>
> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
> <claudio.martella@gmail.com>  wrote:
>> Yes, what is you say is completely reasonable, you convinced me :)
>>
>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<aching@apache.org>  wrote:
>>> Combiners should be commutative and associative.  In my opinion that means
>>> reducing to a single message or none at all.  Can you think of a case when
>>> more than 1 message should be returned from a combiner?  I know that
>>> returning null isn't preferable in general, but I think that functionality
>>> (returning no messages), is nice to have and isn't a huge amount of work on
>>> our side.
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>> To clarify, I was not discussing the possibility for combine to return
>>>> null. I see why it would be useful, given that combine returns M,
>>>> there's no other way to let combiner ask not to send any message,
>>>> although i agree with Jakob, I also believe returning null should be
>>>> avoided but only used, roughly, as an init value for a
>>>> reference/pointer.
>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>> but maybe it's too much.
>>>>
>>>> As far as the size of the messages parameter, I agree with you that 0
>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>> was more a matter of synching it with the other methods getting the
>>>> messages parameter.
>>>> Probably, having a more clear javadoc will do the job here.
>>>>
>>>> What do you think?
>>>>
>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jghoman@gmail.com>    wrote:
>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>> calling code (null checks, or not, since people usually will forget
>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>> it conceivable that one would want to write a combiner that returned
>>>>> something for an input of no parameters, ie combining the empty list
>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>> combining a single message would result in that message.
>>>>>
>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<aching@apache.org>
   wrote:
>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>
>>>>>>   /**
>>>>>>    * Combines message values for a particular vertex index.
>>>>>>    *
>>>>>>    * @param vertexIndex Index of the vertex getting these messages
>>>>>>    * @param msgList List of the messages to be combined
>>>>>>    * @return Message that is combined from {@link MsgList} or null
if no
>>>>>>    *         message it to be sent
>>>>>>    * @throws IOException
>>>>>>    */
>>>>>>
>>>>>> I think we are somewhat vague on what a combiner can return to support
>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>> compute() algorithm.  I think it should be legal to return null from
a
>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>
>>>>>> It seems like it would be an overhead to call a combiner when there
are
>>>>>> 0
>>>>>> messages.  I can't see a case where that would be useful.  Perhaps
we
>>>>>> should
>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>> message
>>>>>> to have combine() being called.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>> Hi Sebastian,
>>>>>>>
>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>> Fixing my test was not the issue, my question was whether we
want to
>>>>>>> define explicitly the semantics of this scenario.
>>>>>>> Personally, I believe the combiner should be ready to receive
0
>>>>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>>>>> and compute(), and act accordingly.
>>>>>>>
>>>>>>> In the particular example, I believe the SimpleSumCombiner is
bugged.
>>>>>>> It's true that the sum of no values is 0, but it's also true
that the
>>>>>>> null return semantics of combine() is more suitable for this
exact
>>>>>>> situation.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ssc@apache.org>
>>>>>>>   wrote:
>>>>>>>> I think we currently implicitly assume that there is at least
one
>>>>>>>> element in the Iterable passed to the combiner. The messaging
code
>>>>>>>> only
>>>>>>>> invokes the combiner only if at least one message for the
target
>>>>>>>> vertex
>>>>>>>> has been sent.
>>>>>>>>
>>>>>>>> However, we should not rely on implicit implementation details
but
>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>
>>>>>>>> --sebastian
>>>>>>>>
>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>> Hello list,
>>>>>>>>>
>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and
hit an
>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>> currently, my code fails testBspCombiner for the following
reason:
>>>>>>>>>
>>>>>>>>> SimpleSumCombiner::compute() returns a value even if
there are no
>>>>>>>>> messages in the iterator (in this case it returns 0)
and for this
>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>
>>>>>>>>> At each superstep, under-the-hood, I pass the combiner
for each
>>>>>>>>> vertex
>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>
>>>>>>>>>      public Iterable<M>      getMessages(I vertexId)
{
>>>>>>>>>        Iterable<M>      messages = inMessages.getMessages(vertexId);
>>>>>>>>>        if (combiner != null) {
>>>>>>>>>                M combinedMsg;
>>>>>>>>>                try {
>>>>>>>>>                        combinedMsg = combiner.combine(vertexId,
>>>>>>>>> messages);
>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>                        throw new RuntimeException("could
not combine",
>>>>>>>>> e);
>>>>>>>>>                }
>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>                        List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>                        messages = tmp;
>>>>>>>>>                } else {
>>>>>>>>>                        messages = new ArrayList<M>(0);
>>>>>>>>>                }
>>>>>>>>>        }
>>>>>>>>>        return messages;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>> Now, the question is: who's wrong? The combiner code
that returns a
>>>>>>>>> sum of 0 over no values, or the framework that calls
the combiner
>>>>>>>>> with
>>>>>>>>> 0 messages?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>
>>
>>
>> --
>>     Claudio Martella
>>     claudio.martella@gmail.com


Mime
View raw message