hadoop-common-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Re: repeat a job for different files
Date Thu, 18 Nov 2010 06:11:39 GMT
If you need to process the files separately, use one MR job per file: a job
can take a single file as its input. You'll need to iterate over all the
files in the input directory and start a job instance for each one. You can
do this in Java code, in a script, or however fits your case.
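As a minimal sketch of the script route: loop over the input files and
launch one job per file, giving each its own output directory. The jar
name `wordcount.jar` and driver class `WordCount` below are placeholders;
the `echo` stands in for the real `hadoop jar` invocation, shown in the
comment.

```shell
#!/bin/sh
# One MR job per input file, each with its own output directory.
# Demo input data so the loop has something to iterate over:
mkdir -p input
printf 'hello world\n' > input/file1.txt
printf 'foo bar foo\n' > input/file2.txt

for f in input/*.txt; do
  name=$(basename "$f" .txt)
  # Real invocation (placeholder jar/class names):
  #   hadoop jar wordcount.jar WordCount "$f" "output/$name"
  echo "would run: hadoop jar wordcount.jar WordCount $f output/$name"
done
```

Each iteration submits an independent job, so `output/file1/`,
`output/file2/`, etc. each hold the counts for one input file only.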

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Nov 17, 2010 at 10:36 PM, maha <maha@umail.ucsb.edu> wrote:

> Hi,
>
>   When I set my inputFileFormat to take an input directory containing
> three files, the job processes all three and produces a single output
> with the combined result.
>
> Instead, I want the job to run separately for each input file, producing
> a separate output for each.
>
>    Eg.
>
>         wordCount(input)   where input/ holds file1.txt, file2.txt, ..., fileN.txt
>
>         This happens:   output/outputFile.txt contains all words from
> all the files, along with their counts.
>
>         I want:   output/outputFile1.txt for file1.txt's words, ...,
> output/outputFileN.txt for fileN.txt's words.
>
>
>       Thanks,
>           Maha
>
>
