hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: repeat a job for different files
Date Thu, 18 Nov 2010 20:20:02 GMT
On Fri, Nov 19, 2010 at 12:19 AM, maha <maha@umail.ucsb.edu> wrote:
>  If I have only one MR program and pass one file at a time to it, the problem is in the FileOutputFormat, because it will say the output directory already exists!

Not if you remove it after each job: treat the output directory as a staging
directory and move its contents out at the end of each job.
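A minimal sketch of the "remove it post-job" half, assuming the Hadoop 2.x Java FileSystem API. The class StagingCleanup, the method resetStaging and the default path "staging-output" are hypothetical names for illustration. Once the job's results have been moved or consumed, deleting the staging directory lets the next job reuse the same output path without FileOutputFormat rejecting it:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingCleanup {

    /**
     * Hypothetical helper: recursively deletes the staging output directory if it
     * exists, so the next job can write to the same path without FileOutputFormat
     * failing with "output directory already exists".
     * Returns true if a directory was deleted.
     */
    public static boolean resetStaging(FileSystem fs, Path staging) throws IOException {
        if (!fs.exists(staging)) {
            return false; // nothing to clean up
        }
        return fs.delete(staging, true); // true = delete recursively
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical default path; pass the output path your job actually uses.
        Path staging = new Path(args.length > 0 ? args[0] : "staging-output");
        System.out.println("deleted: " + resetStaging(fs, staging));
    }
}
```

Call resetStaging only after the previous job's output has been copied or moved somewhere else, otherwise those results are lost.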

>  So I had to repeat the same MR code three times, differing only in the output directory, which is unreasonable: what if I have 100 files? :(

Loop over the files with a script; that should be easy!
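The same loop can also live in the Java driver instead of a shell script. A sketch assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; PerFileDriver, outputDirFor and the argument layout are hypothetical. Each input file gets its own job and its own output directory, so no job ever finds its output path already taken:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PerFileDriver {

    /** Hypothetical naming scheme: one output dir per input, e.g. out/file1 for in/file1. */
    public static Path outputDirFor(Path outputBase, Path input) {
        return new Path(outputBase, input.getName());
    }

    public static void main(String[] args) throws Exception {
        // Usage (hypothetical): PerFileDriver <outputBase> <input1> [<input2> ...]
        Configuration conf = new Configuration();
        Path outputBase = new Path(args[0]);
        for (int i = 1; i < args.length; i++) {
            Path input = new Path(args[i]);
            Job job = Job.getInstance(conf, "per-file job: " + input.getName());
            job.setJarByClass(PerFileDriver.class);
            // Set your own mapper/reducer and output key/value classes here; if left
            // unset, Hadoop falls back to its identity Mapper and Reducer.
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, outputDirFor(outputBase, input));
            if (!job.waitForCompletion(true)) {
                System.err.println("Job failed for " + input);
                System.exit(1);
            }
        }
    }
}
```

For example, running it with the arguments "out in/file1 in/file2 in/file3" would write to out/file1, out/file2 and out/file3. Input files with the same name in different directories would collide under this naming scheme.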

>  I guess another solution would be to know HOW to add an output file into an existing directory instead of creating a new one (using the FileOutputFormat)?

Why don't you just fs.rename() the output directory to a set location
at the end of every job?
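A sketch of that move, again assuming the Hadoop 2.x FileSystem API; MoveOutput, moveOutput and its parameters are hypothetical. FileSystem.rename() reports most failures by returning false rather than throwing, so the result is checked. The existence check matters because on HDFS, renaming onto an existing directory moves the source into it instead of replacing it:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MoveOutput {

    /**
     * Hypothetical helper: moves a finished job's output directory to
     * finalParent/name and returns the new location.
     */
    public static Path moveOutput(FileSystem fs, Path jobOutput, Path finalParent, String name)
            throws IOException {
        fs.mkdirs(finalParent); // make sure the destination parent exists
        Path target = new Path(finalParent, name);
        if (fs.exists(target)) {
            // Refuse rather than letting rename nest jobOutput inside target.
            throw new IOException("Target already exists: " + target);
        }
        if (!fs.rename(jobOutput, target)) {
            throw new IOException("Could not rename " + jobOutput + " to " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): MoveOutput <jobOutput> <finalParent> <name>
        FileSystem fs = FileSystem.get(new Configuration());
        Path target = moveOutput(fs, new Path(args[0]), new Path(args[1]), args[2]);
        System.out.println("moved to " + target);
    }
}
```

Called after each job.waitForCompletion(true) succeeds, this frees the job's output path for the next run while collecting every result under one parent directory.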

Harsh J
