hadoop-common-user mailing list archives

From Joel Welling <well...@psc.edu>
Subject Ordering of records in output files?
Date Wed, 10 Sep 2008 17:32:12 GMT
Hi folks;
  I have a simple Streaming job where the mapper produces output records
beginning with a 16-character ASCII string and passes them to
IdentityReducer.  When I run it, I get as many output files as the
value of mapred.reduce.tasks.  Each one contains some of the strings,
and within each file the strings are in sorted order.
  But there is no obvious ordering *across* the files.  For example, I
can see where the first few strings in the output went to files 0, 1, 3, 4,
and then back to 0, but none of them ended up in file 2.
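
The pattern described above (sorted within each file, no order across
files) is what one would expect if each key is assigned to a reduce task
by hashing it and taking the result modulo the number of reduce tasks.
The sketch below is only an illustration under that assumption: the
class name, method names, split points, and sample keys are all
hypothetical, and it uses String.hashCode() rather than whatever hash
Hadoop applies to Streaming keys. It also shows, for contrast, a
range-style assignment in which neighbouring keys land in the same
partition.

```java
public class PartitionSketch {

    // Hash-style assignment: mask off the sign bit, then take the
    // remainder modulo the number of reduce tasks (assumed behaviour).
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Range-style assignment (hypothetical alternative): splitPoints must
    // be sorted ascending. Keys below splitPoints[0] go to partition 0,
    // keys in [splitPoints[0], splitPoints[1]) go to partition 1, etc.
    static int rangePartition(String key, String[] splitPoints) {
        int partition = 0;
        while (partition < splitPoints.length
                && key.compareTo(splitPoints[partition]) >= 0) {
            partition++;
        }
        return partition;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5;
        // Made-up split points dividing the sample key space into 5 ranges.
        String[] splitPoints = {
            "0000000000000002", "0000000000000004",
            "0000000000000006", "0000000000000008"
        };
        for (int i = 0; i < 10; i++) {
            // Made-up 16-character keys in sorted order.
            String key = String.format("%016d", i);
            System.out.println(key
                + "  hash -> file " + hashPartition(key, numReduceTasks)
                + "  range -> file " + rangePartition(key, splitPoints));
        }
    }
}
```

With hash-style assignment, consecutive keys generally scatter across
files and some files may receive none of the first few keys; with
range-style assignment, the file number never decreases as the keys
increase.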
  What's the algorithm that determines which strings end up in which
files?  Is there a way I can change it so that sequentially ordered
strings end up in the same file rather than spraying off across all the

