hadoop-user mailing list archives

From Dmitry Sivachenko <trtrmi...@gmail.com>
Subject Re: Writing output from streaming task without dealing with key/value
Date Wed, 10 Sep 2014 20:56:04 GMT
On 11 Sep 2014, at 0:47, Felix Chern <idryman@gmail.com> wrote:

> If you don’t want anything to be inserted, just set your output to key only or value only.
> TextOutputFormat$LineRecordWriter won’t insert a separator unless both the key and the value are set:


If I output the value only, for instance, and my line contains a TAB, will everything before
the TAB be lost?
If I output the key only, and my line contains a TAB, will everything after the TAB be lost?


> 
>     public synchronized void write(K key, V value)
>       throws IOException {
> 
>       boolean nullKey = key == null || key instanceof NullWritable;
>       boolean nullValue = value == null || value instanceof NullWritable;
>       if (nullKey && nullValue) {
>         return;
>       }
>       if (!nullKey) {
>         writeObject(key);
>       }
>       if (!(nullKey || nullValue)) {
>         out.write(keyValueSeparator);
>       }
>       if (!nullValue) {
>         writeObject(value);
>       }
>       out.write(newline);
>     }
> 
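To check the branching in the quoted write() without a cluster, here is a minimal sketch that replays the same logic on plain Strings. The class name LineRecordSketch and the format() helper are hypothetical, null stands in for a missing key or value (NullWritable is not modelled), and the TAB separator is an assumption. It only covers the writer; how streaming splits a line into key and value before it reaches the writer is outside the sketch.

```java
// Hypothetical stand-in for TextOutputFormat's LineRecordWriter: same branching
// as the quoted write(), but on plain Strings, returning the record as text.
class LineRecordSketch {
    static final String SEPARATOR = "\t"; // assumed key/value separator
    static final String NEWLINE = "\n";

    // Returns what would be written for one record. null stands for a missing
    // key or value; the NullWritable case is not modelled here.
    static String format(String key, String value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing is written at all
        }
        StringBuilder out = new StringBuilder();
        if (!nullKey) {
            out.append(key);
        }
        if (!(nullKey || nullValue)) {
            out.append(SEPARATOR); // only when both sides are present
        }
        if (!nullValue) {
            out.append(value);
        }
        out.append(NEWLINE);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "left\tright";
        System.out.print(format(line, null));      // key only
        System.out.print(format(null, line));      // value only
        System.out.print(format("left", "right")); // both set
        System.out.print(format("left", ""));      // empty, non-null value
    }
}
```

In this sketch a key-only or value-only record is emitted as is, including any TAB inside it, so the writer itself neither splits nor drops anything. An empty but non-null value still gets the separator appended after the key.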
> On Sep 10, 2014, at 1:37 PM, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:
> 
>> 
>> On 10 Sep 2014, at 22:33, Felix Chern <idryman@gmail.com> wrote:
>> 
>>> Use ‘tr -s’ to strip out tabs?
>>> 
>>> $ echo -e "a\t\t\tb"
>>> a			b
>>> 
>>> $ echo -e "a\t\t\tb" | tr -s "\t"
>>> a	b
>>> 
>> 
>> There can be tabs in the input, I want to keep input lines without any modification.
>> 
>> Actually, it is a rather standard task: process lines one by one without inserting extra
>> characters.  There should be a standard solution for it, IMO.
>> 
> 

