hadoop-common-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject Re: Hadoop Streaming: How to partition output into subfolders?
Date Thu, 21 Jan 2016 06:21:12 GMT
Thank you, Rohit!

Is there any sample code for multiple outputs in Python?

Rex

On Wed, Jan 20, 2016 at 10:04 PM, rohit sarewar <rohitsarewar@gmail.com>
wrote:

> Hi Rex
>
> Please explore MultipleOutputs:
> <https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html>
>
> Regards
> Rohit Sarewar
>
>
> On Thu, Jan 21, 2016 at 5:13 AM, Rex X <dnsring@gmail.com> wrote:
>
>> Dear all,
>>
>> To be specific, for example, given
>>
>>     hadoop jar hadoop-streaming.jar \
>>       -input myInputDirs \
>>       -output myOutputDir \
>>       -mapper /bin/cat \
>>       -reducer /usr/bin/wc
>>
>> Where myInputDirs has a *dated* subfolder structure of
>>
>>        /input_dir/yyyy/mm/dd/part-*
>>
>> I want myOutputDir to have the same *dated* subfolder structure:
>>
>>        /output_dir/yyyy/mm/dd/part-*
>>
>> I guess there should be an option to do this. Can "-partitioner" or any
>> "-D" option achieve it?
>>
>>
>> Thanks & regards,
>> Rex
>>
>
>
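
For the Python side of the question, here is a minimal sketch of a Streaming mapper that tags each record with the yyyy/mm/dd part of its input path. It rests on assumptions not confirmed in this thread: that Streaming exposes the job property mapreduce.map.input.file to the script as the environment variable mapreduce_map_input_file (map_input_file on older releases), and that the input paths contain a /yyyy/mm/dd/ segment as in the original question. The names date_prefix and tag_lines are made up for illustration.

```python
import os
import re
import sys

# Matches a /yyyy/mm/dd/ segment, e.g. in /input_dir/2016/01/20/part-00000.
DATE_DIR = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_prefix(path):
    """Return 'yyyy/mm/dd' taken from the path, or 'unknown' if no match."""
    match = DATE_DIR.search(path)
    return "/".join(match.groups()) if match else "unknown"


def tag_lines(lines, path):
    """Yield each line as '<yyyy/mm/dd>\\t<line>' so the date becomes the key."""
    prefix = date_prefix(path)
    for line in lines:
        yield prefix + "\t" + line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: Streaming exports job properties as environment variables
    # with dots replaced by underscores.
    input_path = (os.environ.get("mapreduce_map_input_file")
                  or os.environ.get("map_input_file", ""))
    for record in tag_lines(sys.stdin, input_path):
        print(record)
```

This would replace "-mapper /bin/cat" in the command above (shipped with "-file"), and the reducer would then need to group by the date key rather than run /usr/bin/wc over everything. Note that the script only tags records; it does not create subfolders by itself. Routing each key to /output_dir/yyyy/mm/dd/ still needs support on the Java side, for example a custom output format passed with "-outputformat", which is not shown here.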
