hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: execute mapreduce job on multiple hdfs files
Date Tue, 23 Mar 2010 12:55:54 GMT
Hi Oleg,
you can call FileInputFormat.addInputPath(JobConf, Path) multiple times in your program to
add arbitrary paths. If instead you use FileInputFormat.setInputPaths, it replaces any
previously set input paths rather than appending to them.

If you are asking about output, the path you give is an output directory; all the output
files (part-00000, part-00001, ...) will be generated in that directory.
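To make this concrete, here is a minimal driver sketch using the old (org.apache.hadoop.mapred) API discussed above. The paths (/data/file1.txt, /data, /output) and the class name MultiFileJob are hypothetical placeholders, not anything from the original thread; it assumes the Hadoop jars are on the classpath and the mapper/reducer classes are configured elsewhere.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MultiFileJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiFileJob.class);
        conf.setJobName("multi-file-job");  // hypothetical job name

        // addInputPath appends, so each call adds one more input file.
        FileInputFormat.addInputPath(conf, new Path("/data/file1.txt"));
        FileInputFormat.addInputPath(conf, new Path("/data/file2.txt"));
        FileInputFormat.addInputPath(conf, new Path("/data/file3.txt"));

        // Alternatively, pass the directory itself as a single input path;
        // every file inside it then becomes input to the job:
        // FileInputFormat.addInputPath(conf, new Path("/data"));

        // One output directory; the part-00000, part-00001, ... files
        // are all written here.
        FileOutputFormat.setOutputPath(conf, new Path("/output"));

        JobClient.runJob(conf);
    }
}
```

Passing the directory is usually the simplest answer to the original question, since you do not have to enumerate the files yourself.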


----- Original Message -----
From: Oleg Ruchovets <oruchovets@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2010/3/23 (Tue) 6:18:34 AM
Subject: execute mapreduce job on multiple hdfs files

Hi,
All the examples that I found execute a mapreduce job on a single file, but in my
situation I have more than one.

Suppose I have a folder on HDFS which contains some files:


How can I execute a hadoop mapreduce job on file1.txt, file2.txt and file3.txt?

Is it possible to pass the folder to the hadoop job as a parameter, so that all the files
in it will be processed by the mapreduce job?

Thanks In Advance

