hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Lewis <lordjoe2...@gmail.com>
Subject When to use a combiner?
Date Tue, 24 Jan 2012 17:33:55 GMT
In working a sample issue I used a combiner - I noticed that the Combiner
output records were 90% of the Combiner Input records and
when looking at the data found relatively few duplicated keys. This raises
the question of what fraction of duplicate keys makes it reasonable to
use a combiner - If every key is unique I presume that using a combiner
will waste time and resources - especially if the data is large but
what fraction of duplicated keys is needed to justify a combiner??

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message