Date: Wed, 4 Feb 2009 16:53:11 -0800
Message-ID: <239f2f640902041653t5f41dde1uaecebe4578d92b9e@mail.gmail.com>
Subject: Re: Value-Only Reduce Output
From: Jack Stahl
To: core-user@hadoop.apache.org

My (0.18.2) reduce src looks like this:

    write(key);
    clientOut_.write('\t');
    write(val);
    clientOut_.write('\n');

which explains why the trailing tab is unavoidable.

Thanks for your help, though, Jason!

2009/2/4 jason hadoop

> For your reduce, the parameter is stream.reduce.input.field.separator, if
> you are supplying a reduce class and I believe the output format is
> TextOutputFormat...
>
> It looks like you have tried the map parameter for the separator, not the
> reduce parameter.
>
> From 0.19.0 PipeReducer:
> configure:
>     reduceOutFieldSeparator =
>         job_.get("stream.reduce.output.field.separator", "\t").getBytes("UTF-8");
>     reduceInputFieldSeparator =
>         job_.get("stream.reduce.input.field.separator", "\t").getBytes("UTF-8");
>     this.numOfReduceOutputKeyFields =
>         job_.getInt("stream.num.reduce.output.key.fields", 1);
>
> getInputSeparator:
>     byte[] getInputSeparator() {
>         return reduceInputFieldSeparator;
>     }
>
> reduce:
>         write(key);
>         *clientOut_.write(getInputSeparator());*
>         write(val);
>         clientOut_.write('\n');
>     } else {
>         // "identity reduce"
>         *output.collect(key, val);*
>     }
>
>
> On Wed, Feb 4, 2009 at 6:15 AM, Rasit OZDAS wrote:
>
> > I tried it myself, it doesn't work.
> > I've also tried the stream.map.output.field.separator and
> > map.output.key.field.separator parameters for this purpose; they
> > don't work either. When Hadoop sees an empty string, it uses the
> > default tab character instead.
> >
> > Rasit
> >
> > 2009/2/4 jason hadoop
> > >
> > > Oops, you are using streaming, and I am not familiar with it.
> > > As a terrible hack, you could set mapred.textoutputformat.separator
> > > to the empty string, in your configuration.
> > >
> > > On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop wrote:
> > >
> > > > If you are using the standard TextOutputFormat, and the output
> > > > collector is passed a null for the value, there will not be a
> > > > trailing tab character added to the output line.
> > > >
> > > > output.collect( key, null );
> > > > Will give you the behavior you are looking for if your configuration
> > > > is as I expect.
> > > >
> > > >
> > > > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I'm interested in a map-reduce flow where I output only values (no
> > > >> keys) in my reduce step. For example, imagine the canonical
> > > >> word-counting program where I'd like my output to be an unlabeled
> > > >> histogram of counts instead of (word, count) pairs.
> > > >>
> > > >> I'm using HadoopStreaming (specifically, I'm using the dumbo module
> > > >> to run my python scripts). When I simulate the map reduce using
> > > >> pipes and sort in bash, it works fine. However, in Hadoop, if I
> > > >> output a value with no tabs, Hadoop appends a trailing "\t",
> > > >> apparently interpreting my output as a (value, "") KV pair. I'd
> > > >> like to avoid outputting this trailing tab if possible.
> > > >>
> > > >> Is there a command line option that could be used to effect this?
> > > >> More generally, is there something wrong with outputting arbitrary
> > > >> strings, instead of key-value pairs, in your reduce step?
> > > >>
> > > >
> > > >
> >
> >
> > --
> > M. Raşit ÖZDAŞ
>
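
For readers of the archive, the behavior discussed in this thread can be illustrated with a small Python sketch. It is a toy model only, not Hadoop's code: it assumes, as the original question describes, that a reducer output line without a tab is treated as a (value, "") pair and written back as key, tab, value. The function names are hypothetical, and the final clean-up step (stripping the trailing tab after the job) is a workaround that nobody in the thread proposed.

```python
def split_key_value(line, sep="\t"):
    """Split a line at the first separator; with no separator the
    whole line becomes the key and the value is the empty string."""
    key, _, value = line.partition(sep)
    return key, value


def format_record(key, value, sep="\t"):
    """Toy model of writing a pair as key + separator + value."""
    return key + sep + value


def strip_trailing_tab(line):
    """Post-processing workaround (an assumption, not from the thread):
    drop a single trailing tab left behind by an empty value."""
    return line[:-1] if line.endswith("\t") else line


if __name__ == "__main__":
    # A value-only reducer line picks up a trailing tab in this model.
    written = format_record(*split_key_value("42"))
    print(repr(written))                      # '42\t'
    print(repr(strip_trailing_tab(written)))  # '42'
```

Under these assumptions, ordinary key/value lines such as "word\t3" pass through unchanged, and only value-only lines gain the extra tab, so the clean-up step is safe to run over mixed output.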