hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tiago Veloso <ti.vel...@gmail.com>
Subject Re: Chaining M/R Jobs
Date Mon, 26 Apr 2010 19:10:44 GMT
On Apr 26, 2010, at 7:39 PM, Xavier Stevens wrote:

> I don't usually bother renaming the files.  If you know you want all of
> the files, you just iterate over the files in the output directory from
> the previous job.  And then add those to distributed cache.  If the data
> is fairly small you can set the number of reducers to 1 on the previous
> step as well.

And how do I Iterate on a directory? Could you give me a sample code?

If relevant I am using hadoop 0.20.2.

Tiago Veloso

View raw message