hadoop-mapreduce-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Reduce input records >> Map output records
Date Fri, 14 Aug 2009 06:52:44 GMT
Hey all,

Has anyone seen behavior where the number of reduce input records is
significantly larger than the number of map output records? There's no
combiner involved in the job at hand, and the job isn't particularly large
(250 GB in, about the same out). The numbers on one example job are:
2,202,290,092 map input records, 2,198,215,987 map output records,
2,200,081,377 reduce input records (1,865,390 more than the map output).
The job in question had no failures and no speculative task attempts
killed. Running 0.18.3 on JVM 1.6.0u14.

Anyone have any thoughts? Could a broken comparator trip up the merge in
such a way that it would invent records? I searched JIRA and svn logs but
nothing caught my eye. If no one has seen this before I'll keep digging and
certainly open a JIRA if I can find some more useful data.
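
One way to test the broken-comparator hypothesis offline, before digging into the merge code, is to brute-force the comparator contract over a sample of keys. The sketch below is only an illustration under assumptions: it uses a plain java.util.Comparator over deserialized keys rather than Hadoop's byte-level comparator, and the class and method names (ComparatorContractCheck, findViolations) are hypothetical. It reports violations of reflexivity, antisymmetry, and transitivity; it says nothing about whether such a violation could actually make the merge emit extra records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: checks the Comparator contract
// by brute force on a small sample of keys (O(n^3) comparisons).
class ComparatorContractCheck {

    /** Returns a description of every contract violation found on the sample keys. */
    static <T> List<String> findViolations(Comparator<T> cmp, List<T> keys) {
        List<String> problems = new ArrayList<>();
        for (T a : keys) {
            // compare(a, a) must be 0
            if (cmp.compare(a, a) != 0) {
                problems.add("not reflexive: " + a);
            }
            for (T b : keys) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // sgn(compare(a, b)) must equal -sgn(compare(b, a))
                if (ab != -ba) {
                    problems.add("not antisymmetric: " + a + ", " + b);
                }
                for (T c : keys) {
                    // a > b and b > c must imply a > c
                    if (ab > 0 && cmp.compare(b, c) > 0 && cmp.compare(a, c) <= 0) {
                        problems.add("not transitive: " + a + ", " + b + ", " + c);
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Sample keys chosen to include extreme values (an assumption for illustration).
        List<Integer> keys = Arrays.asList(Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE);

        Comparator<Integer> sound = Integer::compare;
        // Classic bug: subtraction overflows for values far apart.
        Comparator<Integer> broken = (x, y) -> x - y;

        System.out.println(findViolations(sound, keys).isEmpty());   // prints "true"
        System.out.println(findViolations(broken, keys).isEmpty());  // prints "false"
    }
}
```

Running the same kind of check against the job's real key comparator, on a few thousand keys sampled from the map output, would at least confirm or rule out a contract violation as a starting point.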

