hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Value-Only Reduce Output
Date Wed, 04 Feb 2009 05:28:34 GMT
Oops, you are using streaming, and I am not familiar with it.
As a terrible hack, you could set mapred.textoutputformat.separator to the
empty string, in your configuration.
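
To see why the empty separator would help, here is a rough Python model of the write path. It is only a sketch under assumptions: it supposes that streaming splits each line of reducer output at the first tab and treats a tab-less line as a key with an empty (not null) value, and that TextOutputFormat writes the separator whenever both key and value are non-null. The function names are made up for illustration and are not Hadoop APIs.

```python
def split_streaming_line(line, field_sep="\t"):
    """Hypothetical model: turn one line of reducer output into (key, value)."""
    key, _, value = line.partition(field_sep)
    # With no tab in the line, value is the empty string, not None.
    return key, value


def format_record(key, value, separator="\t"):
    """Hypothetical model of TextOutputFormat: the separator is written
    only when both key and value are non-null."""
    if key is None and value is None:
        return None  # nothing is written for an all-null record
    if key is None:
        return str(value)
    if value is None:
        return str(key)
    return f"{key}{separator}{value}"


key, value = split_streaming_line("42")
default_line = format_record(key, value)        # separator left at tab
hacked_line = format_record(key, value, "")     # separator set to ""
null_value_line = format_record("42", None)     # the output.collect(key, null) case
```

In this model the default gives "42" followed by a tab, while the empty separator and the null value both give a bare "42". The hack is terrible because any record that really has a key and a value would then be written with the two run together.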

On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <jason.hadoop@gmail.com> wrote:

> If you are using the standard TextOutputFormat, and the output collector is
> passed a null for the value, there will not be a trailing tab character
> added to the output line.
> output.collect( key, null );
> Will give you the behavior you are looking for if your configuration is as
> I expect.
> On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <jack@yelp.com> wrote:
>> Hello,
>> I'm interested in a map-reduce flow where I output only values (no keys)
>> in my reduce step.  For example, imagine the canonical word-counting
>> program where I'd like my output to be an unlabeled histogram of counts
>> instead of (word, count) pairs.
>> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
>> my python scripts).  When I simulate the map reduce using pipes and sort
>> in bash, it works fine.  However, in Hadoop, if I output a value with no
>> tabs, Hadoop appends a trailing "\t", apparently interpreting my output as
>> a (value, "") KV pair.  I'd like to avoid outputting this trailing tab if
>> possible.
>> Is there a command line option that could be used to effect this?  More
>> generally, is there something wrong with outputting arbitrary strings,
>> instead of key-value pairs, in your reduce step?
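
For concreteness, the kind of value-only reducer described in the question might look like the sketch below. It is written in plain streaming style rather than with dumbo's own API, and it assumes the mapper emitted well-formed "word<TAB>1" lines that arrive sorted by word; the function name count_values is hypothetical.

```python
from itertools import groupby


def count_values(lines):
    """Yield one count per distinct word, dropping the word itself.

    Assumes each input line is 'word<TAB>number' and that lines with the
    same word are adjacent (as they are after the shuffle/sort).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for _word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield str(sum(int(kv[1]) for kv in group))


sample = ["apple\t1\n", "apple\t1\n", "pear\t1\n"]
counts = list(count_values(sample))
```

Each yielded string contains no tab, so a reducer script that prints these values to stdout produces exactly the tab-less lines the question is about.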
