hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: Optimized way
Date Fri, 05 Dec 2008 03:30:09 GMT
Hi Aayush,
 Do you want one map to run one command? You can give an input file
consisting of lines of <file> <outputfile>. Use NLineInputFormat, which
splits N lines of input into one split, i.e. it gives N lines to one map
for processing. By default, N is one. Then your map can just run the shell
command on its input line. Would this meet your need?
More details at
http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
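For illustration, below is a minimal sketch of such a map-only job against the
0.19-era mapred API. The class names (ShellRunner, ShellRunnerMapper), the job
name, and the exit-code output key are hypothetical additions for this example,
not anything from the thread; ./run is the command Aayush mentioned.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class ShellRunner {

  // Each map() call receives one line of the form "<file> <outputfile>";
  // the key is the byte offset of the line, the value is the line itself.
  public static class ShellRunnerMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, NullWritable> out, Reporter reporter)
        throws IOException {
      String[] parts = line.toString().trim().split("\\s+");
      if (parts.length < 2) {
        return;                                   // skip malformed lines
      }
      // Run the external command on this input/output file pair.
      ProcessBuilder pb = new ProcessBuilder("./run", parts[0], parts[1]);
      pb.redirectErrorStream(true);
      Process p = pb.start();
      try {
        int rc = p.waitFor();
        // Record the exit code so the job output shows what succeeded.
        out.collect(new Text(parts[0] + " exit=" + rc), NullWritable.get());
      } catch (InterruptedException e) {
        throw new IOException(e.toString());
      }
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(ShellRunner.class);
    conf.setJobName("shell-runner");

    // One input line per map task; raise this to batch several commands per map.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(ShellRunnerMapper.class);
    conf.setNumReduceTasks(0);                    // map-only job
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));  // file of "<file> <outputfile>" lines
    FileOutputFormat.setOutputPath(conf, new Path(args[1])); // job output (exit codes)

    JobClient.runJob(conf);
  }
}

With a sketch like this, each map task spawns one ./run process, so the
JobTracker spreads the commands across the map slots of all 5 nodes; the number
of maps that run concurrently on each node is governed by
mapred.tasktracker.map.tasks.maximum, which can be raised to use the extra cores.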
Thanks,
Amareshwari
Aayush Garg wrote:
> Hi,
>
> I have a 5-node cluster for Hadoop use. All the nodes are multi-core.
> I am running a shell command in the map function of my program, and this
> shell command takes one file as input. Many such files are copied into
> HDFS.
>
> So, in summary, the map function will run a command like ./run <file1>
> <outputfile1>
>
> Could you please suggest an optimized way to do this, e.g. whether I can
> make use of the multi-core nodes and run many such maps in parallel?
>
> Thanks,
> Aayush
>
>   

