Date: Wed, 10 Jun 2015 11:42:49 +0530
Subject: Re: hadoop 2.4.0 streaming generic parser options using TAB as separator
From: Kiran Dangeti
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1140a562b20632051823c3b8

--001a1140a562b20632051823c3b8
Content-Type: text/plain; charset=UTF-8

\bbb

On Jun 10, 2015 10:58 AM, "anvesh ragi" wrote:
> Hello all,
>
> I know that tab is the default separator for these fields:
>
> stream.map.output.field.separator
> stream.reduce.input.field.separator
> stream.reduce.output.field.separator
> mapreduce.textoutputformat.separator
>
> But when I pass the generic parser option
>
> stream.map.output.field.separator=\t (or)
> stream.map.output.field.separator="\t"
>
> to test how Hadoop parses whitespace characters such as "\t" and "\n" when
> they are used as separators, I observe that Hadoop reads it as the
> characters \t, not as the tab "    " itself. I checked this by printing
> each line in the reducer (Python) as it is read, using:
>
> sys.stdout.write(str(line))
>
> My mapper emits key/value pairs as: key value1 value2
>
> using the print (key,value1,value2,sep='\t',end='\n') command.
>
> So I expected my reducer to read each line as: key value1 value2 too,
> but instead sys.stdout.write(str(line)) printed:
>
> key value1 value2 \\with trailing space
>
> From "Hadoop streaming - remove trailing tab from reducer output", I
> understood that the trailing space is due to
> mapreduce.textoutputformat.separator not being set and left as default.
>
> So this confirmed my assumption that Hadoop treated my whole map output:
>
> key value1 value2
>
> as the key, with an empty Text object as the value, because it read the
> separator from stream.map.output.field.separator=\t as the "\t" characters
> instead of the tab "    " itself.
>
> Please help me understand this behavior. How can I use \t as a
> separator if I want to?
>
> Thanks & Regards,
> Anvesh R
>
--001a1140a562b20632051823c3b8
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--001a1140a562b20632051823c3b8--
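
The hypothesis in the quoted question (that the separator was taken as the two characters backslash and "t" rather than as one tab) can be reproduced outside Hadoop. The following is only a minimal sketch in plain Python and is not Hadoop's implementation: `split_key_value` and `rejoin` are hypothetical helpers, and the sketch assumes a line is split at the first occurrence of the separator and that key and value are later written back out with a tab between them.

```python
def split_key_value(line, separator):
    """Split a line at the first occurrence of separator.

    If the separator does not occur in the line, the whole line becomes
    the key and the value is empty (the situation the question describes).
    """
    key, _found, value = line.partition(separator)
    return key, value


def rejoin(key, value, separator="\t"):
    """Write key and value back out with a separator between them."""
    return key + separator + value


# What the mapper's print(key, value1, value2, sep='\t') emits, minus the newline.
mapper_line = "key\tvalue1\tvalue2"

real_tab = "\t"    # one character: an actual tab
literal = "\\t"    # two characters: a backslash followed by 't'

# repr() makes tabs visible as \t instead of rendering them as whitespace.
print(repr(rejoin(*split_key_value(mapper_line, real_tab))))
print(repr(rejoin(*split_key_value(mapper_line, literal))))
```

Under these assumptions the second call never finds its separator, so the rejoined line ends in a tab, which would match the trailing whitespace reported above. Printing `repr(line)` rather than `str(line)` in the reducer is one way to tell a real tab from spaces or from a literal backslash-t when debugging.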