incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: on the semantics of the combiner
Date Tue, 10 Jan 2012 20:00:51 GMT
I believe the argument of not letting users shoot their foot doesn't
stand :) Once you give them any API they have the power to do anything
wrong, as they already can with Giraph (or anything else for what it
matters), by designing an algorithm wrongly (which would be what it
would turn out to be a wrong combiner). It's definitely true that a
composite object would make the grouping (List<Group>) but I thought
we were talking about simplifying life to users :). I think it would
be more flexible (for the present and for the future) and also more
elegant,  but not necessarily a must (although it'd come practically
for free).

Very cool discussion.

On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jghoman@gmail.com> wrote:
>> Combiners can only modify the messages sent to a single vertex, so they can't send
messages to other vertices.
> Yeah, the more I've thought about this, the more problematic it would
> be.  These new messages may be generated upon arrival at the
> destination vertex (since combiners can be run on the receiving vertex
> before processing as well).  When would they be forwarded to their new
> destinations at that point?  It would be possible to get into a
> feedback loop of messages jumping around before a superstep could ever
> actually be done.
>
> That being said, our inability to think of a good application doesn't
> mean there won't be one in the future, and it's probably better to be
> more flexible than try to impose what appears optimal now.  The
> benefit of forcing 0 or 1 message from a combiner seems less than the
> flexibility of allowing another list of messages (which may or may not
> be the same number of elements as the original, less than, or even
> more than).
>
>>Good discussion (it's making me really think about this)!
> Agreed.
>
>
> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <aching@apache.org> wrote:
>> The general idea of combiners is to reduce the number of messages sent.
>>  Combiners are purely an optimization and the application should work
>> correctly without it (since it's never guaranteed to actually be called).
>>  Combiners can only modify the messages sent to a single vertex, so they
>> can't send messages to other vertices.  Any other work (i.e. sending
>> messages) should be done by the vertex in the compute() method.
>>
>> While I think that grouping behavior could actually be implemented within a
>> message object (still reducing the number of messages to 1 or 0) I suppose
>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>> the combiner as you both have mentioned?  The only thing I suppose I'm
>> concerned about is letting users do something that is not optimal.
>>  Generally, expanding messages is not what you want your combiner to do.
>>  Also, since grouping behavior can be implemented in the message object, it
>> forces users to avoid shooting themselves in the foot.
>>
>> Good discussion (it's making me really think about this)!
>>
>> Avery
>>
>>
>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>
>>> Ok, now i see where you're going. I guess that the thing here is that
>>> the combiner would "act" like (on its behalf) D, and to do so
>>> concretely it would probably need some local data related to D (edges
>>> values? vertexvalue?).
>>> I also think that k>  n is also possible in principle and we could let
>>> the user decide whether to use this power or not, once/if we agree
>>> that letting the user send k messages in the combiner is useful (and
>>> the grouping behavior shown by the label propagation example should do
>>> so).
>>>
>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jghoman@gmail.com>  wrote:
>>>>
>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>> which would have then then been sent to, say, M.  This would save the
>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>> it's a great example, but it is legal.  This is of course assuming
>>>> that combiners can generate messages bound for vertices other than the
>>>> original destination, which I don't know if that has even been
>>>> discussed.
>>>>
>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>> <claudio.martella@gmail.com>  wrote:
>>>>>
>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>> were going to be expanded to k messages on the destination worker D,
>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>> right?
>>>>>
>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jghoman@gmail.com>
 wrote:
>>>>>>>
>>>>>>> it doesn't have to be expand, k, the number of elements returned
by
>>>>>>> the combiner, can still be smaller than n,
>>>>>>
>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>> both bound for another vertex, Z.  A combiner on W could get both
of
>>>>>> these messages, do some work on them, as it would have knowledge
of
>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>> <claudio.martella@gmail.com>  wrote:
>>>>>>>
>>>>>>> it doesn't have to be expand, k, the number of elements returned
by
>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>> semantically-different classes/types of messages, and you can
imagine
>>>>>>> willing to be summarizing them in different messages, i.e. if
your
>>>>>>> messages come along with labels or just simply by the source
vertex,
>>>>>>> if required by the algorithm, think of label propagation to have
just
>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<aching@apache.org>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> I agree that C&A doesn't require it, however, I can't
think of why I
>>>>>>>> would
>>>>>>>> want to use a combiner to expand the number of messages.
 Can you?
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>
>>>>>>>>>> In my opinion that means reducing to a single message
or none at
>>>>>>>>>> all.
>>>>>>>>>
>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
interface, for
>>>>>>>>> instance, doesn't require a single  or no value to be
returned; it
>>>>>>>>> has
>>>>>>>>> the same interface as a reducer, zero or more values.
 Would
>>>>>>>>> adapting
>>>>>>>>> the semantics of Giraph's combiner to return a list of
messages
>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>> <claudio.martella@gmail.com>    wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, what is you say is completely reasonable, you
convinced me :)
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<aching@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> Combiners should be commutative and associative.
 In my opinion
>>>>>>>>>>> that
>>>>>>>>>>> means
>>>>>>>>>>> reducing to a single message or none at all.
 Can you think of a
>>>>>>>>>>> case
>>>>>>>>>>> when
>>>>>>>>>>> more than 1 message should be returned from a
combiner?  I know
>>>>>>>>>>> that
>>>>>>>>>>> returning null isn't preferable in general, but
I think that
>>>>>>>>>>> functionality
>>>>>>>>>>> (returning no messages), is nice to have and
isn't a huge amount
>>>>>>>>>>> of work
>>>>>>>>>>> on
>>>>>>>>>>> our side.
>>>>>>>>>>>
>>>>>>>>>>> Avery
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> To clarify, I was not discussing the possibility
for combine to
>>>>>>>>>>>> return
>>>>>>>>>>>> null. I see why it would be useful, given
that combine returns M,
>>>>>>>>>>>> there's no other way to let combiner ask
not to send any message,
>>>>>>>>>>>> although i agree with Jakob, I also believe
returning null should
>>>>>>>>>>>> be
>>>>>>>>>>>> avoided but only used, roughly, as an init
value for a
>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>> Perhaps, we could, but i'm just thinking
out loud here, let
>>>>>>>>>>>> combine()
>>>>>>>>>>>> return Iterable<M>, basicallly letting
it define what to combine
>>>>>>>>>>>> to
>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful
extension to the
>>>>>>>>>>>> model,
>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as the size of the messages parameter,
I agree with you
>>>>>>>>>>>> that 0
>>>>>>>>>>>> messages gives nothing to combine and it
would be somehow
>>>>>>>>>>>> awkward, it
>>>>>>>>>>>> was more a matter of synching it with the
other methods getting
>>>>>>>>>>>> the
>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>> Probably, having a more clear javadoc will
do the job here.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jghoman@gmail.com>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not a big fan of returning null as
it adds extra complexity
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> calling code (null checks, or not, since
people usually will
>>>>>>>>>>>>> forget
>>>>>>>>>>>>> them).  Avery is correct that combiners
are application
>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>> it conceivable that one would want to
write a combiner that
>>>>>>>>>>>>> returned
>>>>>>>>>>>>> something for an input of no parameters,
ie combining the empty
>>>>>>>>>>>>> list
>>>>>>>>>>>>> doesn't return the empty list?  I imagine
for most combiners,
>>>>>>>>>>>>> combining a single message would result
in that message.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
Ching<aching@apache.org>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine()
is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>   * Combines message values for
a particular vertex index.
>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>   * @param vertexIndex Index of
the vertex getting these
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>   * @param msgList List of the messages
to be combined
>>>>>>>>>>>>>>   * @return Message that is combined
from {@link MsgList} or
>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>   *         message it to be
sent
>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we are somewhat vague on
what a combiner can return to
>>>>>>>>>>>>>> support
>>>>>>>>>>>>>> various use cases.  A combiner should
be particular to a
>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>> compute() algorithm.  I think it
should be legal to return null
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> combiner, in that case, no message
should be sent to that
>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems like it would be an overhead
to call a combiner when
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>> messages.  I can't see a case where
that would be useful.
>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>> change the javadoc to insure that
msgList must contain at least
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>> message
>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella
wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yes, that was my point, I agree
completely with you.
>>>>>>>>>>>>>>> Fixing my test was not the issue,
my question was whether we
>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>> define explicitly the semantics
of this scenario.
>>>>>>>>>>>>>>> Personally, I believe the combiner
should be ready to receive
>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>> messages, as it's the case of
BasicVertex::initialize(),
>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the particular example, I
believe the SimpleSumCombiner is
>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>> It's true that the sum of no
values is 0, but it's also true
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> null return semantics of combine()
is more suitable for this
>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM,
Sebastian
>>>>>>>>>>>>>>> Schelter<ssc@apache.org>
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think we currently implicitly
assume that there is at least
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> element in the Iterable passed
to the combiner. The messaging
>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> invokes the combiner only
if at least one message for the
>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, we should not rely
on implicit implementation
>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>> explicitly specify the semantics
of combiners.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio
Martella wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching
the incoming messages and hit an
>>>>>>>>>>>>>>>>> interesting problem with
the combiner semantics.
>>>>>>>>>>>>>>>>> currently, my code fails
testBspCombiner for the following
>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute()
returns a value even if there
>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>> messages in the iterator
(in this case it returns 0) and for
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> reason the vertices get
activated at each superstep.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At each superstep, under-the-hood,
I pass the combiner for
>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>> an Iterable, which can
be empty:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     public Iterable<M>
       getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>       Iterable<M>
       messages =
>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>       if (combiner
!= null) {
>>>>>>>>>>>>>>>>>              
M combinedMsg;
>>>>>>>>>>>>>>>>>              
try {
>>>>>>>>>>>>>>>>>              
        combinedMsg =
>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>              
}  catch (IOException e) {
>>>>>>>>>>>>>>>>>              
        throw new RuntimeException("could not
>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>              
}
>>>>>>>>>>>>>>>>>              
if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>              
        List<M>        tmp = new
>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>              
        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>              
        messages = tmp;
>>>>>>>>>>>>>>>>>              
} else {
>>>>>>>>>>>>>>>>>              
        messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>              
}
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> the Iterable returned
by this methods is passed to
>>>>>>>>>>>>>>>>> basicVertex.putMessages()
right before the compute().
>>>>>>>>>>>>>>>>> Now, the question is:
who's wrong? The combiner code that
>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> sum of 0 over no values,
or the framework that calls the
>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>    Claudio Martella
>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>    Claudio Martella
>>>>>>>    claudio.martella@gmail.com
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>    Claudio Martella
>>>>>    claudio.martella@gmail.com
>>>
>>>
>>>
>>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Mime
View raw message