hadoop-mapreduce-user mailing list archives

From Felix Chern <idry...@gmail.com>
Subject Re: Writing output from streaming task without dealing with key/value
Date Wed, 10 Sep 2014 20:47:02 GMT
If you don’t want anything to get inserted, just set your output to key-only or value-only.
TextOutputFormat$LineRecordWriter won’t insert the separator unless both the key and the value
are set (that is, neither is null or a NullWritable):

    public synchronized void write(K key, V value)
      throws IOException {

      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);
    }
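
To see the three cases side by side, here is a minimal standalone sketch of the same branching with no Hadoop dependency. It is only an illustration, not the Hadoop code: the class name SeparatorDemo and the format() helper are hypothetical, plain Strings stand in for the key and value types, null stands in for a null or NullWritable field, and the separator is assumed to be a tab (the TextOutputFormat default).

```java
public class SeparatorDemo {
    // Assumption: tab, the default TextOutputFormat key/value separator.
    static final String KEY_VALUE_SEPARATOR = "\t";

    // Hypothetical helper mirroring the branching in LineRecordWriter.write().
    // Here null plays the role of a null or NullWritable key/value.
    static String format(String key, String value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return ""; // nothing is written, not even a newline
        }
        StringBuilder out = new StringBuilder();
        if (!nullKey) {
            out.append(key);
        }
        if (!(nullKey || nullValue)) {
            out.append(KEY_VALUE_SEPARATOR); // only when both are present
        }
        if (!nullValue) {
            out.append(value);
        }
        out.append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "a\t\t\tb"; // input line with embedded tabs
        System.out.print(format(line, null)); // key only: line is unchanged
        System.out.print(format(null, line)); // value only: line is unchanged
        System.out.print(format("k", line));  // both set: separator is inserted
    }
}
```

In the first two calls the line comes out exactly as it went in, embedded tabs included; only the third call gains an extra tab between the key and the value.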

On Sep 10, 2014, at 1:37 PM, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:

> 
> On 10 Sep 2014, at 22:33, Felix Chern <idryman@gmail.com> wrote:
> 
>> Use ‘tr -s’ to strip out tabs?
>> 
>> $ echo -e "a\t\t\tb"
>> a			b
>> 
>> $ echo -e "a\t\t\tb" | tr -s "\t"
>> a	b
>> 
> 
> There can be tabs in the input; I want to keep the input lines without any modification.
> 
> Actually, it is a rather standard task: process lines one by one without inserting extra
> characters. There should be a standard solution for it, IMO.
> 

