hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Query regarding Hadoop version 0.20.203
Date Mon, 05 Mar 2012 10:58:45 GMT
Piyush,

On Mon, Mar 5, 2012 at 3:16 PM, Piyush Kansal <piyush.kansal@gmail.com> wrote:
> Ques 1:
> ======
> I have an HDFS directory that contains the reducer's output files. I want to
> read all the part-r-* files in this directory.
>
> I have already tried the following, without success:
> - FileSystem.listStatus
>
> Can you please suggest how I can do it?

Iterate over the FileStatus objects returned by listStatus (they'll be
in the right order), and read them one by one. Does that not work for
you?
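As a rough illustration of that loop, here is a stdlib-only Java sketch that works on a local copy of an output directory (for example one fetched with `hadoop fs -get`). On HDFS the listing and opening would go through FileSystem.listStatus and FileSystem.open instead. It keeps only part-r-* names, which skips entries such as _SUCCESS or _logs if present, and sorts them explicitly rather than relying on the listing order. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical local-filesystem sketch: list a reducer output directory,
 * keep the part-r-* files, and read them one by one in partition order.
 * On HDFS, FileSystem.listStatus / FileSystem.open would replace java.nio.
 */
final class PartFileReader {

    /** Keeps only reducer output names (part-r-*) and sorts them. */
    static List<String> selectPartFiles(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith("part-r-")) {
                parts.add(name);
            }
        }
        // Partition numbers are zero-padded, so lexicographic order is partition order.
        Collections.sort(parts);
        return parts;
    }

    /** Reads every part-r-* file in dir, in sorted order, passing each line to sink. */
    static void readAll(Path dir, Consumer<String> sink) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        for (String name : selectPartFiles(names)) {
            try (BufferedReader in =
                     Files.newBufferedReader(dir.resolve(name), StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    sink.accept(line);
                }
            }
        }
    }
}
```

Calling `PartFileReader.readAll(dir, System.out::println)` would print every line of part-r-00000, then part-r-00001, and so on.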

> Ques 2:
> ======
> Since MultipleOutputs/MultipleOutputFormat is not available in 0.20.203, how
> can we achieve the same functionality that these classes provide?

Either upgrade to 1.0.1 to get MultipleOutputs for the new API (it was only
recently released, with that backport from 0.21), or move to an alternative
distribution that offers it back-ported, or perhaps switch back to the
stable (old) API, which is still the recommended API for MR.

Alternatively, read the FAQ entry on writing HDFS files directly from
map/reduce tasks:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
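If upgrading is not an option, the core of what MultipleOutputs provides (one lazily opened file per named output, all closed when the task finishes) can be sketched roughly as below. This is a hypothetical, stdlib-only Java helper writing to a local directory, not a Hadoop API. In a real task the files would belong in the task's work output directory, as the FAQ entry above describes, so that only successful task attempts get their files promoted. The "<name>-r-00000" naming is an assumption modeled on the part-r-* convention.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper illustrating the idea behind MultipleOutputs:
 * one lazily opened file per named output, all closed at the end.
 * Uses java.nio on a local directory; this is not a Hadoop class.
 */
final class NamedOutputs implements AutoCloseable {
    private final Path workDir;
    private final String partSuffix; // e.g. "r-00000" (assumed naming)
    private final Map<String, Writer> writers = new LinkedHashMap<>();

    NamedOutputs(Path workDir, String partSuffix) {
        this.workDir = workDir;
        this.partSuffix = partSuffix;
    }

    /** Appends one line to the file for the given output name, opening it on first use. */
    void write(String name, String line) throws IOException {
        Writer out = writers.get(name);
        if (out == null) {
            // "<name>-<suffix>" keeps outputs from different tasks from colliding.
            out = Files.newBufferedWriter(
                workDir.resolve(name + "-" + partSuffix), StandardCharsets.UTF_8);
            writers.put(name, out);
        }
        out.write(line);
        out.write('\n');
    }

    /** Closes every writer that was opened, rethrowing the first failure. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (Writer out : writers.values()) {
            try {
                out.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        writers.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

A reducer would call something like `outs.write("errors", value)` for each bad record and close `outs` when the task ends, leaving errors-r-00000 alongside the regular part-r-00000.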

-- 
Harsh J
