hadoop-hdfs-user mailing list archives

From Dmitry Sivachenko <trtrmi...@gmail.com>
Subject Re: Writing output from streaming task without dealing with key/value
Date Wed, 10 Sep 2014 19:34:59 GMT


> On 10 Sep 2014, at 22:47, Shahab Yunus <shahab.yunus@gmail.com> wrote:
> 
> Examples (the top ones are related to streaming jobs):
> 
> http://www.infoq.com/articles/HadoopOutputFormat
> http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
> http://stackoverflow.com/questions/12759651/how-to-override-inputformat-and-outputformat-in-hadoop-application
> 


Thanks for the links.  The problem is that RecordWriter.write() receives two parameters: key and value.
If one of them is empty, I have no way to tell whether I should output the delimiter (because it
was present in the original line) or not.
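
To make the ambiguity concrete, here is a minimal Python sketch. It is not Hadoop code: split_first_tab is a hypothetical helper that mimics splitting an output line at the first TAB, and it assumes that a missing value comes back as an empty string.

```python
def split_first_tab(line):
    """Split a line into (key, value) at the first TAB.

    Assumption: when the line has no TAB, the value is an empty string.
    """
    key, _sep, value = line.partition("\t")
    return key, value


# Two different original lines...
with_trailing_tab = "foo\t"
without_tab = "foo"

# ...collapse to the same (key, value) pair, so a writer that sees only
# the pair cannot know whether the delimiter was in the original line.
print(split_first_tab(with_trailing_tab) == split_first_tab(without_tab))

# A TAB in the middle of the line is not ambiguous: the value is non-empty.
print(split_first_tab("foo\tbar\tbaz"))
```

Both "foo" and "foo" followed by a TAB end up as the key "foo" with an empty value, which is exactly the case I cannot distinguish inside the writer.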

What is the proper way to work around that issue?


> Regards,
> Shahab
> 
>> On Wed, Sep 10, 2014 at 2:28 PM, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:
>> 
>> On 10 Sep 2014, at 22:19, Rich Haase <rdhaase@gmail.com> wrote:
>> 
>> > You can write a custom output format
>> 
>> 
>> Any clues how this can be done?
>> 
>> 
>> 
>> > , or you can write your mapreduce job in Java and use a NullWritable as Susheel recommended.
>> >
>> > grep (and every other *nix text processing command I can think of) would not
>> > be limited by a trailing tab character.  It's even quite easy to strip away that tab
>> > character if you don't want it during the post-processing steps you want to perform
>> > with *nix commands.
>> 
>> 
>> The problem is that the line itself contains a TAB in the middle; there will not be
>> an extra trailing TAB at the end.
>> So it is not that simple.
>> You never know whether it is a TAB from the original line or an extra TAB added by
>> TextOutputFormat.
>> 
>> Thanks!
> 
