hadoop-mapreduce-user mailing list archives

From Felix Chern <idry...@gmail.com>
Subject Re: Writing output from streaming task without dealing with key/value
Date Wed, 10 Sep 2014 18:33:59 GMT
Use ‘tr -s’ to strip out tabs?

 $ echo -e "a\t\t\tb"
a			b

 $ echo -e "a\t\t\tb" | tr -s "\t"
a	b
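
Note that tr -s squeezes every run of tabs, including runs in the middle of a line. If only a single tab at the very end of each line should go, something along these lines might work instead. This is just a sketch: strip_trailing_tab is a made-up helper name, and it assumes the unwanted tab is always the last character on the line.

```shell
# Hypothetical helper (name is an assumption): remove one trailing tab
# from each line read on stdin, leaving tabs inside the line untouched.
strip_trailing_tab() {
  tab=$(printf '\t')      # literal tab, so this does not rely on sed understanding \t
  sed "s/${tab}\$//"      # delete a tab only when it is the last character
}

# Example: the two tabs in the middle survive, the one at the end is dropped.
printf 'a\t\tb\t\n' | strip_trailing_tab | od -c
```

This still cannot tell a tab that was already at the end of the original line from one added on output.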


On Sep 10, 2014, at 11:28 AM, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:

> 
> On 10 Sep 2014, at 22:19, Rich Haase <rdhaase@gmail.com> wrote:
> 
>> You can write a custom output format
> 
> 
> Any clues how this can be done?
> 
> 
> 
>> , or you can write your MapReduce job in Java and use a NullWritable as Susheel recommended.
>> 
>> grep (and every other *nix text processing command I can think of) would not be limited
>> by a trailing tab character.  It's even quite easy to strip away that tab character if you
>> don't want it during the post-processing steps you want to perform with *nix commands.
> 
> 
> The problem is that when the line itself contains a TAB in the middle, there will be no
> extra trailing TAB at the end.
> So it is not that simple.
> You never know whether it is a TAB from the original line or an extra TAB added by TextOutputFormat.
> 
> Thanks!

