hadoop-common-user mailing list archives

From Jack Stahl <j...@yelp.com>
Subject Value-Only Reduce Output
Date Wed, 04 Feb 2009 03:49:38 GMT
Hello,

I'm interested in a map-reduce flow where I output only values (no keys) in
my reduce step.  For example, imagine the canonical word-counting program
where I'd like my output to be an unlabeled histogram of counts instead of
(word, count) pairs.
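
To make the goal concrete, here is a minimal sketch of such a reducer in plain Python. It assumes the mapper emits tab-separated "word<TAB>count" lines and that the input arrives sorted by word; the function name reduce_counts is made up for illustration and is not part of dumbo or Hadoop.

```python
from itertools import groupby


def reduce_counts(lines):
    """Yield one total per word as a string, dropping the word itself.

    Assumes each input line is "word<TAB>count" and that lines are
    already sorted by word, as they would be after the shuffle/sort phase.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for _word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield str(sum(int(count) for _, count in group))


# Small in-memory example standing in for the sorted mapper output.
sample = ["apple\t1\n", "apple\t2\n", "pear\t5\n"]
totals = list(reduce_counts(sample))
```

Run as a streaming reducer, the same function would be fed sys.stdin and each total printed on its own line, which is the tab-free output in question.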

I'm using HadoopStreaming (specifically, the dumbo module to run my Python
scripts). When I simulate the map-reduce using pipes and sort in bash, it
works fine. However, in Hadoop, if I output a value with no tabs, Hadoop
appends a trailing "\t", apparently interpreting my output as a
(value, "") KV pair. I'd like to avoid outputting this trailing tab if
possible.
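
The behaviour described above can be modelled with a short Python sketch. This is only a guess at what happens, not Hadoop's actual code: it assumes the line is split on the first tab into a key and a value, and that the pair is then written back as key, tab, value. The helper names split_key_value and write_record are hypothetical.

```python
def split_key_value(line, sep="\t"):
    """Split a line on the first separator; a line with no separator
    becomes (line, "")."""
    key, _, value = line.partition(sep)
    return key, value


def write_record(key, value, sep="\t"):
    """Write the pair back as text, always emitting the separator."""
    return key + sep + value


# A reducer output line that contains no tab at all.
key, value = split_key_value("42")
record = write_record(key, value)
```

Under these assumptions, record ends with a tab even though the reducer never emitted one, which matches what I observe.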

Is there a command-line option that could be used to achieve this? More
generally, is there anything wrong with outputting arbitrary strings,
instead of key-value pairs, in the reduce step?
