hadoop-mapreduce-user mailing list archives

From rohit sarewar <rohitsare...@gmail.com>
Subject Re: Hadoop Streaming: How to partition output into subfolders?
Date Thu, 21 Jan 2016 06:04:29 GMT
Hi Rex

Please explore MultipleOutputs:
<https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html>
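
MultipleOutputs is a Java API, so a streaming script cannot call it directly. As a first step, though, each record needs to carry the date of the file it came from. The following is only a minimal sketch of a streaming mapper that does this, not a tested solution. It assumes that Hadoop Streaming exposes the job property mapreduce.map.input.file to the script as the environment variable mapreduce_map_input_file (dots replaced by underscores), and that input paths contain a /yyyy/mm/dd/ segment as in the question. The names date_subdir and tag_lines and the "undated" fallback key are hypothetical.

```python
import os
import re
import sys

# Matches a /yyyy/mm/dd/ segment such as /input_dir/2016/01/21/part-00000.
DATE_DIR = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_subdir(input_path):
    """Return 'yyyy/mm/dd' taken from a dated input path, or None."""
    match = DATE_DIR.search(input_path)
    return "/".join(match.groups()) if match else None


def tag_lines(lines, input_path):
    """Yield 'yyyy/mm/dd<TAB>line' for each input line."""
    # "undated" is a hypothetical fallback key for paths without a date.
    subdir = date_subdir(input_path) or "undated"
    for line in lines:
        yield subdir + "\t" + line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: Streaming sets this variable to the current input file.
    path = os.environ.get("mapreduce_map_input_file", "")
    for out in tag_lines(sys.stdin, path):
        print(out)
```

With the date as the key, a Java reducer could pass a string built from it as the baseOutputPath argument of MultipleOutputs.write(key, value, baseOutputPath), which accepts "/" in the path. Whether that reducer can be combined with the rest of a streaming job is not covered by this sketch.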

Regards
Rohit Sarewar


On Thu, Jan 21, 2016 at 5:13 AM, Rex X <dnsring@gmail.com> wrote:

> Dear all,
>
> To be specific, for example, given
>
>     hadoop jar hadoop-streaming.jar \
>       -input myInputDirs \
>       -output myOutputDir \
>       -mapper /bin/cat \
>       -reducer /usr/bin/wc
>
> Where myInputDirs has a *dated* subfolder structure of
>
>        /input_dir/yyyy/mm/dd/part-*
>
> I want myOutputDir to have the same *dated* subfolder structure:
>
>        /output_dir/yyyy/mm/dd/part-*
>
> I guess there should be an option to do this. Can "-partitioner" or any
> "-D" option achieve this?
>
>
> Thanks & regards,
> Rex
>
