hadoop-user mailing list archives

From Namikaze Minato <lloydsen...@gmail.com>
Subject Re: Hadoop Streaming: How to partition output into subfolders?
Date Thu, 21 Jan 2016 09:41:19 GMT
Hi Rex X,

We use the -outputformat <classname> option of hadoop-streaming for this.
The details are here: http://www.infoq.com/articles/HadoopOutputFormat
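
On the Python side, a minimal sketch of a streaming mapper that could feed
such an output format is below. It assumes the custom output format class
(which has to be written in Java and is not shown here) picks the output
subfolder from the record key. It also assumes that Hadoop Streaming exposes
the current input file to the mapper as the environment variable
mapreduce_map_input_file (map_input_file on older releases). The names
date_prefix and tag_lines are made up for this illustration.

```python
#!/usr/bin/env python
import os
import re
import sys

# Matches a /yyyy/mm/dd/ segment anywhere in a path.
DATE_RE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_prefix(path):
    """Return 'yyyy/mm/dd' taken from the path, or None if there is no date."""
    match = DATE_RE.search(path)
    return "/".join(match.groups()) if match else None


def tag_lines(lines, input_path):
    """Yield 'yyyy/mm/dd<TAB>line' so the key carries the target subfolder."""
    # "unknown" is an arbitrary fallback bucket chosen for this sketch.
    prefix = date_prefix(input_path) or "unknown"
    for line in lines:
        yield "%s\t%s" % (prefix, line.rstrip("\n"))


def main():
    # Assumption: streaming passes job configuration to the task as
    # environment variables, with dots replaced by underscores.
    input_path = (os.environ.get("mapreduce_map_input_file")
                  or os.environ.get("map_input_file", ""))
    for record in tag_lines(sys.stdin, input_path):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

The script would be passed with -mapper in place of /bin/cat. The reducer
then has to keep the key in its output, because the output format option
mentioned above only sees what the reducer emits.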

Regards,
Camusensei

On 21 January 2016 at 07:21, Rex X <dnsring@gmail.com> wrote:
> Thank you, Rohit!
>
> Is there any multiple-outputs sample code in Python?
>
> Rex
>
>
> On Wed, Jan 20, 2016 at 10:04 PM, rohit sarewar <rohitsarewar@gmail.com>
> wrote:
>>
>> Hi Rex
>>
>> Please explore multiple outputs.
>>
>> Regards
>> Rohit Sarewar
>>
>>
>> On Thu, Jan 21, 2016 at 5:13 AM, Rex X <dnsring@gmail.com> wrote:
>>>
>>> Dear all,
>>>
>>> To be specific, for example, given
>>>
>>>     hadoop jar hadoop-streaming.jar \
>>>       -input myInputDirs \
>>>       -output myOutputDir \
>>>       -mapper /bin/cat \
>>>       -reducer /usr/bin/wc
>>>
>>> Where myInputDirs has a dated subfolder structure of
>>>
>>>        /input_dir/yyyy/mm/dd/part-*
>>>
>>> I want myOutputDir to have the same dated subfolder structure:
>>>
>>>        /output_dir/yyyy/mm/dd/part-*
>>>
>>> I guess there should be an option for this. Can "-partitioner" or any
>>> "-D" option do it?
>>>
>>>
>>> Thanks & regards,
>>> Rex
>>
>>
>
