hama-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Leonidas Fegaras <fega...@cse.uta.edu>
Subject Re: Implementing Hadoop map-reduce on Hama
Date Thu, 11 Oct 2012 16:37:24 GMT
Hi Apurv,
Allow me to disagree. For complex workflows and repetitive map-reduce  
jobs, such as PageRank,
a Hama implementation of map-reduce would be far superior. I haven't  
looked at your code yet but
I hope it does not use HDFS or memory for I/O for each map-reduce job.  
It must use streaming to pass
results across map-reduce jobs. If you are currently doing this, there  
is no point for me to step in.
But I really think that this would be very useful in practice, not  
just for demonstration.
Best regards,
Leonidas Fegaras

On Oct 11, 2012, at 11:04 AM, Apurv Verma wrote:

> Hey Leonidas,
> Glad that you are thinking about it. IMO it would be good to have  
> such a
> functionality for demonstration purposes. But only for demonstration
> purposes. Its best to let hadoop do what's its best at and hama do  
> what its
> best at. ;) Suraj and I had tried hands on it in the past. Please  
> see [0]
> and [1]. Also I was working on such a module on my github account.  
> But I
> couldn't find much time. Basically its mostly the way you have  
> expressed in
> your last mail. Do you have a github account, I have already done  
> some work
> on github, I could share work it with you and divide the work if you  
> want?
> [0]
> http://code.google.com/p/anahad/source/browse/trunk/src/main/java/org/anahata/bsp/WordCount.java
>     My POC of the Wordcount example on Hama. Super dirty and not  
> generic
> but works.
> [1] https://github.com/ssmenon/hama
> Suraj's in memory implementation.
> Let me know what you think
> --
> Regards,
> Apurv Verma
> On Thu, Oct 11, 2012 at 8:45 PM, Leonidas Fegaras  
> <fegaras@cse.uta.edu>wrote:
>> I have seen some emails in this mailing list asking questions, such  
>> as:
>> I have an X algorithm running on Hadoop map-reduce. Is it suitable  
>> for
>> Hama?
>> I think it would be great if we had a good implementation of the
>> Hadoop map-reduce classes on Hama. Other distributed main-memory
>> systems have already done so. See:
>> M3R (http://vldb.org/pvldb/vol5/ 
>> **p1736_avrahamshinnar_vldb2012.**pdf<http://vldb.org/pvldb/vol5/p1736_avrahamshinnar_vldb2012.pdf

>> >)
>> and Spark.
>> It is actually easier than you think. I have done something similar
>> for my query system, MRQL. What we need is to reimplement
>> org.apache.hadoop.mapreduce.**Job to execute one superstep for each
>> map-reduce job. Then a Hadoop map-reduce program that may contain
>> complex workflows and/or loops of map-reduce jobs would need minor
>> changes to run on Hama as a single BSPJob. Obviously, to implement
>> map-reduce in Hama, the mapper output can be shuffled to reducers
>> based on key by sending messages using hashing:
>> peer.getPeerName(key.**hashValue() % peer.getNumPeers())
>> Then the reducer superstep groups the data by the key in memory and
>> applies the reducer method. To handle input/intermediate data, we can
>> use a mapping from path_name to (count,vector) at each node. The
>> path_name is the path name of some input or intermediate HDFS file,
>> vector contains the data partition from this file assigned to the  
>> node, and
>> count is the max number of times we can scan this vector (after count
>> times, the vector is garbage-collected). The special case where
>> count=1 can be implemented using a stream (a Java inner class that
>> implements a stream Iterator). Given that the map-reduce Job output
>> is rarely accessed more than once, the translation of most map-reduce
>> jobs to Hama will not require any data to be stored in memory other
>> than those used by the map-reduce jobs. One exception is the graph
>> data that need to persist in memory across all jobs (then  
>> count=maxint).
>> Based on my experience with MRQL, the implementation of these ideas
>> may need up to 1K lines of Java code. Let me know if you are  
>> interested.
>> Leonidas Fegaras

View raw message