From: Felix Chern
To: user@hadoop.apache.org
Subject: Re: Writing output from streaming task without dealing with key/value
Date: Wed, 10 Sep 2014 13:47:02 -0700

If you don’t want anything inserted, just set your output to key only or value only.
TextOutputFormat$LineRecordWriter won’t insert a separator unless both the key and the value are set:

    public synchronized void write(K key, V value)
      throws IOException {

      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);
    }
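
To see the effect without a cluster, here is a minimal standalone sketch of the same branching in plain Java. It is not the Hadoop class: LineWriterSketch and render are hypothetical names, null stands in for both null and NullWritable, and the tab separator and "\n" newline are assumed defaults.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for TextOutputFormat$LineRecordWriter (no Hadoop dependency).
// Assumption: null plays the role of both null and NullWritable.
final class LineWriterSketch {
    // Assumed defaults: tab as key/value separator, "\n" as record terminator.
    private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private final OutputStream out;

    LineWriterSketch(OutputStream out) {
        this.out = out;
    }

    // Same branching as the quoted write(): the separator is emitted
    // only when both the key and the value are present.
    void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return;
        }
        if (!nullKey) {
            writeObject(key);
        }
        if (!(nullKey || nullValue)) {
            out.write(SEPARATOR);
        }
        if (!nullValue) {
            writeObject(value);
        }
        out.write(NEWLINE);
    }

    private void writeObject(Object o) throws IOException {
        out.write(o.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Helper for experiments: writes one record and returns the bytes as a String.
    static String render(Object key, Object value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new LineWriterSketch(buf).write(key, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Key and value both set: a separator is inserted between them.
        System.out.print(render("k", "v"));
        // Value only: the line is written as-is, embedded tabs included.
        System.out.print(render(null, "a\t\tb"));
    }
}
```

With a null key, the value comes out byte for byte followed by the newline, so any tabs already in the line are left alone and no extra separator is added.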

On Sep 10, 2014, at 1:37 PM, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:


> On 10 Sep 2014, at 22:33, Felix Chern <idryman@gmail.com> wrote:

>> Use ‘tr -s’ to strip out the repeated tabs?
>>
>> $ echo -e "a\t\t\tb"
>> a                       b
>>
>> $ echo -e "a\t\t\tb" | tr -s "\t"
>> a       b


> There can be tabs in the input, and I want to keep the input lines without any modification.
>
> Actually, this is a rather standard task: process lines one by one without inserting extra characters. There should be a standard solution for it, IMO.

