hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Value-Only Reduce Output
Date Wed, 04 Feb 2009 05:26:31 GMT
If you are using the standard TextOutputFormat and pass null as the value to
the output collector, no trailing tab character will be added to the output
line.

output.collect(key, null);

will give you the behavior you are looking for, provided your job is configured
as described above (i.e. it uses the standard TextOutputFormat).
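To make the rule concrete, here is a small self-contained sketch that models
how a TextOutputFormat-style writer turns (key, value) pairs into lines: the
tab separator is only written when both the key and the value are non-null.
This is a simplified model, not Hadoop's own class; the names
ValueOnlyOutputSketch, formatRecord and LineCollector are hypothetical, and it
assumes the default tab separator. In the word-count case you would emit the
count as the key and null as the value.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a TextOutputFormat-style record writer.
 * It is NOT Hadoop's implementation; it only illustrates the null handling.
 */
public class ValueOnlyOutputSketch {

    // Assumed default key/value separator (a tab).
    static final String SEPARATOR = "\t";

    /** Formats one record; returns null when both key and value are null (nothing is written). */
    static String formatRecord(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            // The separator is only written when both halves are present.
            line.append(SEPARATOR);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }

    /** Hypothetical stand-in for an output collector that stores formatted lines. */
    static class LineCollector {
        final List<String> lines = new ArrayList<>();

        void collect(Object key, Object value) {
            String line = formatRecord(key, value);
            if (line != null) {
                lines.add(line);
            }
        }
    }

    public static void main(String[] args) {
        LineCollector output = new LineCollector();
        // Histogram-style reduce output: the count goes in the key, null in the value,
        // so each line contains only the count and no trailing tab.
        int[] counts = {3, 1, 7};
        for (int count : counts) {
            output.collect(count, null);
        }
        // For contrast, an ordinary (word, count) pair keeps the tab separator.
        output.collect("hadoop", 2);

        for (String line : output.lines) {
            System.out.println(line);
        }
    }
}
```

Running main prints the three counts on their own lines, followed by the
tab-separated "hadoop" pair for comparison.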

On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <jack@yelp.com> wrote:

> Hello,
> I'm interested in a map-reduce flow where I output only values (no keys) in
> my reduce step.  For example, imagine the canonical word-counting program
> where I'd like my output to be an unlabeled histogram of counts instead of
> (word, count) pairs.
> I'm using HadoopStreaming (specifically, the dumbo module to run my Python
> scripts). When I simulate the map-reduce in bash using pipes and sort, it
> works fine. However, in Hadoop, if I output a value with no tabs, Hadoop
> appends a trailing "\t", apparently interpreting my output as a (value, "")
> key-value pair. I'd like to avoid outputting this trailing tab if possible.
> Is there a command-line option that could be used to effect this? More
> generally, is there something wrong with outputting arbitrary strings,
> instead of key-value pairs, in the reduce step?
