hadoop-user mailing list archives

From Bin Wang <binwang...@gmail.com>
Subject Hadoop Streaming Use Non Printing Character as Key/Value separator
Date Thu, 20 Nov 2014 16:28:40 GMT
Hi there,

I am writing a Hadoop streaming job in which certain columns contain
natural-language text. Because that text can itself contain tabs, the default
'\t' delimiter is not an option for me.

Does anyone know how to pass a non-printing character, such as SOH ('start of
heading', ASCII 0x01), as the key/value separator?

I tried passing several spellings of it to the hadoop command, which I run
from a shell script:

-D stream.map.output.field.separator=SOH
-D stream.map.output.field.separator=001
-D stream.map.output.field.separator=^A

None of them works. Can anyone help me with that?
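
For reference, here is a minimal sketch of what each of the three spellings
above actually hands to a program, assuming the script runs under bash. It
only inspects the bytes of each argument; it does not run hadoop. The
show_bytes helper and the SEP_OPT variable are hypothetical names introduced
for illustration. bash's ANSI-C quoting ($'\001') is one way to produce the
real SOH byte; whether Hadoop streaming then honors that byte as the
separator is an assumption that still has to be checked against the job.

```shell
#!/usr/bin/env bash
# ANSI-C quoting ($'...') is a bash feature: \001 becomes the real SOH byte.
SOH=$'\001'

# Hypothetical helper: print the bytes of its argument as hex, on one line.
show_bytes() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; echo; }

show_bytes "SOH"    # 534f48  (the three letters S, O, H)
show_bytes "001"    # 303031  (the three digits 0, 0, 1)
show_bytes "^A"     # 5e41    (a caret followed by the letter A)
show_bytes "$SOH"   # 01      (the single SOH byte)

# Hypothetical: the property value built with the real byte. It would be
# passed quoted, e.g.  -D "$SEP_OPT"  (untested against Hadoop itself).
SEP_OPT="stream.map.output.field.separator=${SOH}"
show_bytes "${SEP_OPT#*=}"   # 01
```

The quotes around "$SEP_OPT" matter: they keep the shell from treating the
control character specially before the argument reaches the command.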


