hadoop-common-user mailing list archives

From anvesh ragi <annunarc...@gmail.com>
Subject Re: hadoop 2.4.0 streaming generic parser options using TAB as separator
Date Wed, 10 Jun 2015 20:14:35 GMT
That did not work either.

Thanks & Regards,
Anvesh R

On Tue, Jun 9, 2015 at 11:12 PM, Kiran Dangeti <kirandkumar2013@gmail.com>
wrote:

> \bbb
> On Jun 10, 2015 10:58 AM, "anvesh ragi" <annunarcist@gmail.com> wrote:
>
>> Hello all,
>>
>> I know that tab is the default field separator for:
>>
>> stream.map.output.field.separator
>> stream.reduce.input.field.separator
>> stream.reduce.output.field.separator
>> mapreduce.textoutputformat.separator
>>
>> But when I set the generic parser option as:
>>
>> stream.map.output.field.separator=\t (or)
>> stream.map.output.field.separator="\t"
>>
>> to test how Hadoop parses whitespace characters such as "\t" and "\n" when
>> they are used as separators, I observed that Hadoop reads the value as the
>> literal characters \t rather than as an actual tab character. I checked
>> this by printing each line in the reducer (Python) as it is read, using:
>>
>> sys.stdout.write(str(line))
>>
>> My mapper emits key/value pairs as: key value1 value2
>>
>> using the call print(key, value1, value2, sep='\t', end='\n').
>>
>> So I expected my reducer to read each line as : key value1 value2 too,
>> but instead sys.stdout.write(str(line)) printed :
>>
>> key value1 value2 \\with trailing space
>>
>> From Hadoop streaming - remove trailing tab from reducer output
>> <http://stackoverflow.com/questions/18133290/hadoop-streaming-remove-trailing-tab-from-reducer-output>,
>> I understood that the trailing space is due to
>> mapreduce.textoutputformat.separator not being set and left as default.
>>
>> So this confirmed my assumption that Hadoop treated my entire map output
>> line:
>>
>> key value1 value2
>>
>> as the key, with an empty Text object as the value, because it read the
>> separator from stream.map.output.field.separator=\t as the literal
>> characters "\t" instead of an actual tab character.
>>
>> Please help me understand this behavior, and how I can use \t as a
>> separator if I want to.
>>
>> Thanks & Regards,
>> Anvesh R
>>
>>
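
For anyone following the thread, here is a minimal Python sketch of the behavior described in the quoted message. It is not Hadoop's code: split_key_value is a hypothetical helper that only mimics "split the line at the first separator", and the re-joining with a tab is an assumption standing in for how the key and value are written back out. It just shows why a two-character separator (backslash followed by t) never matches a line that contains real tabs.

```python
def split_key_value(line, separator):
    """Hypothetical helper: split a line at the first occurrence of separator.

    If the separator is not found, the whole line becomes the key and the
    value is empty, which is the situation described in the quoted message.
    """
    key, _found, value = line.partition(separator)
    return key, value


# What the mapper writes: fields joined by real tab characters.
mapper_line = "key\tvalue1\tvalue2"

real_tab = "\t"              # one character: an actual tab
literal_backslash_t = "\\t"  # two characters: a backslash followed by 't'

key_a, value_a = split_key_value(mapper_line, real_tab)
key_b, value_b = split_key_value(mapper_line, literal_backslash_t)

# Assumption: key and value are re-joined with a tab before the reducer
# sees them, so an empty value leaves a trailing tab on the line.
reducer_line_a = key_a + "\t" + value_a
reducer_line_b = key_b + "\t" + value_b

print(repr(reducer_line_a))
print(repr(reducer_line_b))
```

With the real tab, the key is "key" and the value is the rest of the line. With the literal backslash-t, nothing matches, so the whole line ends up as the key and the empty value produces the trailing separator.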
