hadoop-mapreduce-user mailing list archives

From: Jason <urg...@gmail.com>
Subject: Re: Map-Reduce Applicability With All-In Memory Data
Date: Thu, 09 Dec 2010 18:45:42 GMT
Take a look at NLineInputFormat. You might want to use it in combination with DistributedCache.
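
Here is a minimal sketch of how the two could fit together, against the old org.apache.hadoop.mapred API. Everything specific in it is an assumption on my part: that the entity list sits in HDFS one entity per line, that the (entity, value) table is a tab-separated file pushed out through DistributedCache, and that the per-entity tree computation is a placeholder. TreeAggJob and EntityMapper are names made up for illustration, not anything from Hadoop itself.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class TreeAggJob {

  public static class EntityMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    private final Map<String, Double> values = new HashMap<String, Double>();

    // Load the cached lookup table once per task, not once per record.
    public void configure(JobConf job) {
      try {
        Path[] cached = DistributedCache.getLocalCacheFiles(job);
        BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.split("\t");          // entity \t value
          values.put(parts[0], Double.valueOf(parts[1]));
        }
        in.close();
      } catch (IOException e) {
        throw new RuntimeException("failed to load cache file", e);
      }
    }

    public void map(LongWritable offset, Text entity,
                    OutputCollector<Text, DoubleWritable> out, Reporter reporter)
        throws IOException {
      String name = entity.toString().trim();
      // Placeholder: walk the parent chain and do the real per-entity math here.
      Double v = values.get(name);
      out.collect(new Text(name), new DoubleWritable(v == null ? 0.0 : v));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TreeAggJob.class);
    conf.setJobName("tree-aggregation");

    conf.setInputFormat(NLineInputFormat.class);
    // Each map task receives this many lines (entities) of the input file.
    conf.setInt("mapred.line.input.format.linespermap", 1000);

    // Ship the lookup table to every node in the cluster.
    DistributedCache.addCacheFile(new URI("/cache/entity-values.tsv"), conf);

    conf.setMapperClass(EntityMapper.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(DoubleWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

The point of the combination: NLineInputFormat hands each map task a fixed number of input lines rather than a whole HDFS block, so you control how the entities are spread across the cluster, and DistributedCache gives every node a local copy of the lookup table, so each task can do the tree walk entirely in memory.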


Sent from my iPhone

On Dec 9, 2010, at 5:02 AM, Narinder Kumar <nkumar@inphina.com> wrote:

> Hi All,
> 
> We have a problem in hand which we would like to solve using distributed and parallel processing.
> 
> Problem context: We have a Map of (Entity, Value) pairs. An entity can have a parent, which in turn has its own parent, and so on up to the head of the tree. I have to traverse this tree and do some calculation at every step using the values from the Map. The final output will again be a map containing the aggregated results of the computation, (Entity, Computed Value). The tree can be quite deep, and we have a huge number of entries in the Map to process before arriving at the final result. Processing them sequentially takes quite a long time, so we were thinking of using Map-Reduce to split the computation across multiple nodes in a Hadoop cluster and then aggregate the results to get the final output.
> 
> Having had a quick read of the documentation and the samples, I see that the Mapper and the Reducer work with implementations of InputFormat and OutputFormat respectively, and most of the implementations appear to be either file-based or DB-based. Is there some input/output format which reads/writes directly from/into memory, or do I need to provide my own custom InputFormat/OutputFormat and RecordReader/RecordWriter implementations for this purpose?
> 
> Based upon your experience, do you think Map-Reduce is an appropriate platform for this kind of scenario, or should we consider it mainly for huge file-based data?
> 
> Best Regards
> Narinder
