incubator-hama-dev mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Port to YARN: GIRAPH and HAMA
Date Wed, 14 Sep 2011 21:46:05 GMT
Vinod, thanks for your comments.  I've replied inline.

Avery

On 9/14/11 11:09 AM, Vinod Kumar Vavilapalli wrote:
> Avery,
>
> Some replies inline to the issues you outlined.
>
>> 1)  Giraph runs completely as a MapReduce job on Hadoop today.  This needs
>> to be maintained to support our current users, who will not likely move to
>> MRv2 for at least a year.
> I think what you need is to support Giraph's graph API for your users, but
> no, not the underlying implementation. (Or are you leaking MapReduce APIs to
> your users?) Sure, you are restricted to the underlying implementation (Hadoop
> MRv1, or MRv2 whenever it gets used) at any point in time, but what we are
> discussing is _that_ future when the underlying implementation itself also
> moves to MRv2.
I think the takeaway should be that our clients (at Yahoo! and
elsewhere) are currently using Giraph on MRv1.  While the Giraph API does
not expose the underlying infrastructure APIs (i.e. MRv1 and MRv2), we
still need to support the MRv1 implementation while we begin and
complete the port to MRv2.  I imagine that we will need to support
both MRv1 and MRv2 for a fairly long period, as the transition to
MRv2 for a company (e.g. Yahoo!) could take a very long time (anywhere
from 8 months to multiple years).  For example, some of our internal
clusters at Yahoo! are still running 0.20.1 today.
>> 2)  The internals of Giraph are implemented differently than Hama.
> Sure, but only at present. My original question is: given a BSP
> implementation on a YARN cluster, can GiraphV2 (BSP based) simply be
> implemented over that or not? If today GiraphV1 uses (its own) BSP
> implementation over MapReduce APIs on a Hadoop MRv1 cluster, I can clearly see
> how GiraphV2 could use (HAMA's) BSP implemented over YARN APIs.
>
In theory this is true.  However, as mentioned previously, we still have
users on MRv1 and will need to support it for a long time (at least
a year, probably more).  Also, I'm fairly certain that during the next
year we will have non-BSP graph processing models in
place as well.  For these reasons, it may not make sense to try to put
Giraph on top of HAMA even when we are both on MRv2.  It's hard to say
now, as it is early.  Let's revisit this at a later time.

>> 3)  If we have various graph processing computing models (BSP based,
>> streams or asynchronous, or a combination), then being on Hama brings little
>> value for Giraph.
> That future isn't there yet. In any case, I'd bet when you get there, a lot of
> what you have now also wouldn't be an out-of-the-box fit.
>
> From my perspective (a third-person POV), this is what I can conclude.
> Giraph's velocity on Hadoop MapReduce may be the real impediment to thinking
> about a possible sharing of the BSP-based implementation with HAMAV2. Sure,
> Giraph has other ideas regarding the computation model itself, but that is a
> future that isn't here yet.
>
> I just hope the same velocity isn't an impediment to thinking about the
> next-gen version on top of YARN :) The way I see it, porting Giraph to YARN
> is also a revolution in itself; most, if not all, of the implementation will
> change, yet with API-level compatibility. I am still eagerly looking
> forward to the port of Giraph to YARN. Maybe more digging into Giraph
> internals will help my cause too.
Giraph does appear to be moving quickly at the moment, but we
have a clear intention to run on top of MRv2.  Please see
https://issues.apache.org/jira/browse/GIRAPH-13.  The MRv2
changes are much better suited to Giraph, and we look forward to the day
when nearly all Hadoop instances are running MRv2.
> If nothing else, this discussion at least helped share some ideas
> between the two communities.
>
> Thanks all for putting down your thoughts.
> +Vinod
>
>
> On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut<
> thomas.jungblut@googlemail.com>  wrote:
>
>>> We are also thinking about other underlying computing models (i.e.
>>> streaming (asynchronous) graph processing - see
>>
>> That is a really cool idea. But I don't think we are going to focus solely
>> on graph computing. We want to enable an interface which can be used for it
>> (straightforward, as described in the Pregel paper), but I think you are
>> really the graph experts, so we don't want to compete with each other :D
>> Our asynchronous processing (in my opinion) will just enable the sending of
>> messages within the computation phase. So the BarrierSync is just a little
>> transition to make sure every task is ready and every message has been sent.
>> Your Vertex locking is a graph-only feature, so this won't affect us
>> anyway.
>>
>>
>>> Giraph runs completely as a MapReduce job on Hadoop today.
>> All right.
>>
>> I think our result is the following:
>> We (Apache Hama) are focussing on the YARN implementation of the BSP
>> paradigm.
>> If you want to run Giraph on a real BSP engine later, feel free to put your
>> stuff on top of that.
>> As far as I have seen, YARN is 100% backward compatible, so
>> your current solution will run on YARN as well.
>>
>> Best Regards,
>>
>> Thomas
>>

