hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Writing output from streaming task without dealing with key/value
Date Wed, 10 Sep 2014 18:47:41 GMT
Examples of custom output formats (the top ones are related to streaming jobs):

http://www.infoq.com/articles/HadoopOutputFormat
http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
http://stackoverflow.com/questions/12759651/how-to-override-inputformat-and-outputformat-in-hadoop-application
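
As a rough illustration of what such a custom output format would do for the TAB problem quoted below, here is a minimal sketch of the record-writing rule only. It is plain Java with no Hadoop dependency, so the class name TabSafeLineFormatter and the method formatRecord are hypothetical; in a real job this logic would sit in the write method of the RecordWriter returned by the custom output format. It assumes that streaming hands over an empty (not null) value when the line contained no TAB, which would explain why TextOutputFormat appends the separator.

```java
// Hypothetical helper: shows the per-record rule a custom RecordWriter
// could apply. It does not use or extend any Hadoop class.
final class TabSafeLineFormatter {
    private static final char SEPARATOR = '\t';

    // Returns the output line for one key/value pair.
    // The separator is written only when there is a non-empty value,
    // so a line that had no TAB comes back without a trailing TAB.
    static String formatRecord(String key, String value) {
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (value != null && !value.isEmpty()) {
            line.append(SEPARATOR).append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // Line without a TAB: streaming is assumed to pass it as key + empty value.
        System.out.print(formatRecord("no tab in this line", ""));
        // Line with a TAB in the middle: split into key and value at the first TAB.
        System.out.print(formatRecord("left part", "right part"));
    }
}
```

One limitation of this rule: a line that really did end with a TAB also arrives as a key with an empty value, so its trailing TAB would be dropped as well.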

Regards,
Shahab

On Wed, Sep 10, 2014 at 2:28 PM, Dmitry Sivachenko <trtrmitya@gmail.com>
wrote:

>
> On 10 Sep 2014, at 22:19, Rich Haase <rdhaase@gmail.com> wrote:
>
> > You can write a custom output format
>
>
> Any clues how this can be done?
>
>
>
> > , or you can write your MapReduce job in Java and use a NullWritable as
> > Susheel recommended.
> >
> > grep (and every other *nix text processing command I can think of) would
> > not be limited by a trailing tab character.  It's even quite easy to strip
> > away that tab character if you don't want it during the post-processing
> > steps you want to perform with *nix commands.
>
>
> The problem is that when the line itself contains a TAB in the middle,
> there will be no extra trailing TAB at the end.
> So it is not that simple.
> You never know whether a TAB comes from the original line or is an extra
> TAB added by TextOutputFormat.
>
> Thanks!
