hadoop-hdfs-user mailing list archives

From Vitaliy Semochkin <vitaliy...@gmail.com>
Subject what output types should Combiner use?
Date Sun, 25 Jul 2010 16:17:15 GMT

Am I right that combiners are supposed to return the same key/value types
that reducers expect as input?

Let's say I have a MapReduce job that counts the number of distinct IPs
that visited a resource.

I have a log:
name1 ip1
name2 ip2
name1 ip2

The map step produces pairs (Text resourceName, Text ip).

The reducer produces pairs (Text resourceName, IntWritable numberOfVisits).

Am I right that the combiner should return (Text resourceName, Text ip) pairs?

If so, what can be optimized in the combiner besides removing repeated IPs
(if removing them gives any benefit at all)?
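
To make the question concrete, here is a minimal sketch of the deduplicating combiner described above. It is plain Java with no Hadoop dependency: the class DistinctIpSketch and its methods combine and reduce are hypothetical names, and String lists stand in for the Text values grouped under one resource name. The point it illustrates is that the combiner's output has the same type as the map output, so it can safely run zero, one, or several times before the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the job: plain collections replace
// Text/IntWritable so the sketch runs without Hadoop on the classpath.
final class DistinctIpSketch {

    // Combiner step for one resource: drop repeated IPs seen by one mapper.
    // Input and output are both lists of IPs, the same type the map step
    // emits, so running this step again on its own output changes nothing.
    static List<String> combine(List<String> ips) {
        return new ArrayList<>(new LinkedHashSet<>(ips));
    }

    // Reducer step for one resource: count distinct IPs across all partial
    // lists. It deduplicates again because different mappers may have seen
    // the same IP.
    static int reduce(List<List<String>> partials) {
        Set<String> distinct = new LinkedHashSet<>();
        for (List<String> partial : partials) {
            distinct.addAll(partial);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<String> fromMapperA = combine(Arrays.asList("ip1", "ip2", "ip1"));
        List<String> fromMapperB = combine(Arrays.asList("ip2", "ip2"));
        System.out.println(fromMapperA);
        System.out.println(reduce(Arrays.asList(fromMapperA, fromMapperB)));
    }
}
```

In this sketch the only saving from the combiner is that fewer (resourceName, ip) pairs leave each mapper; the reducer still has to deduplicate, so the benefit depends on how often a single mapper sees the same IP for the same resource.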

Thanks in advance,
Vitaliy S
