hadoop-mapreduce-user mailing list archives

From "Chinni, Ravi" <rchi...@syncsort.com>
Subject RE: Changing default separator for streaming application
Date Thu, 17 Jun 2010 14:02:31 GMT
Thanks. It helped. Not sure if this is documented anywhere on the hadoop


One additional issue I am encountering:

I want the records from the reduce output to be '\r\n' terminated. Even
though I am putting a '\r\n' at the end of the value in my reduce script
function, the final output in the file has '\n'. Again, it seems that the
framework is replacing '\r\n' with '\n'. Any ideas?





From: Amareshwari Sri Ramadasu [mailto:amarsri@yahoo-inc.com] 
Sent: Thursday, June 17, 2010 12:26 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Changing default separator for streaming application


The final output is written by the OutputFormat. By default, TextOutputFormat
writes \t as the key-value separator. You can specify a different
key-value separator for TextOutputFormat by setting the configuration
property "mapred.textoutputformat.separator". Try setting that property
to ' ' (a single space).
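
To illustrate why the tab shows up, here is a minimal Python sketch of the
two stages described above. It is a simplified model, not Hadoop code: the
function names split_key_value and write_record are hypothetical, and the
sketch only assumes that the script's output line is first split into a key
and a value, and that the output format then joins them with its own
separator.

```python
# Simplified model of the two stages (not actual Hadoop code).
# Function names are hypothetical and used only for illustration.

def split_key_value(line, field_sep="\t"):
    """Split one line of script output at the first field separator
    (the role played by stream.reduce.output.field.separator)."""
    key, _, value = line.partition(field_sep)
    return key, value

def write_record(key, value, output_sep="\t"):
    """Join key and value the way a text output format would, using its
    own separator (the role of mapred.textoutputformat.separator)."""
    return f"{key}{output_sep}{value}\n"

line = "apple 3"  # what the reducer script printed
key, value = split_key_value(line, field_sep=" ")

default_out = write_record(key, value)                # output separator left at the default
fixed_out = write_record(key, value, output_sep=" ")  # output separator set to a space

print(repr(default_out))
print(repr(fixed_out))
```

In this model the space in the script's output is consumed when the line is
split, so the separator in the final file depends only on the output
format's setting. On the command line, that would correspond to adding
-D mapred.textoutputformat.separator=" " alongside the stream.* options.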


On 6/16/10 9:17 PM, "Chinni, Ravi" <rchinni@syncsort.com> wrote:

I am trying to develop a streaming MR application by implementing
korn-shell based mapper and reducer. I want to use 'space - x20' as the
separator between key and value throughout the application.
When invoking the application, I specified the options -D
stream.map.output.field.separator=" " -D
stream.reduce.output.field.separator=" ".
While the output of my shell script has a space between the key and
value fields, the final output written by the framework to file has a
tab as the separator. It seems that the framework is replacing the space
separator with a tab separator in the output of the mapper and reducer.
If anyone has ideas on how I can fix this, please share them.
Ravi Chinni

