hadoop-common-user mailing list archives

From "Alex Holmes" <grep.a...@gmail.com>
Subject Question about distributed sort
Date Fri, 22 Aug 2008 23:00:09 GMT

For a given input key, K, in a reduce task, does Hadoop guarantee that
all mapper-emitted values for key K are available in the iterator?  Is
it possible that multiple reduce tasks can receive the same key?

Or, to put the question another way: in a single MapReduce job with
multiple map tasks and multiple reduce tasks, is it possible for the
same key to appear in more than one reduce output file (assuming the
reducer emits only a single output K,V pair for each input key, where
the output K is identical to the input K)?
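
To make the scenario concrete, here is a minimal sketch of how I picture
keys being routed to reduce tasks. It assumes the job uses the default
hash partitioning, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
and that no custom Partitioner is configured. The class PartitionSketch
and its helpers partitionFor and shuffle are hypothetical names for
illustration only, not Hadoop APIs, and plain String keys stand in for
Writable keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; it does not use or call Hadoop.
class PartitionSketch {

    // Same arithmetic as default hash partitioning: clear the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups mapper-emitted (key, value) pairs by reduce task index, and
    // within each reduce task by key, mimicking what each reducer would see.
    static Map<Integer, Map<String, List<String>>> shuffle(
            List<String[]> pairs, int numReduceTasks) {
        Map<Integer, Map<String, List<String>>> byReducer = new TreeMap<>();
        for (String[] kv : pairs) {
            int reducer = partitionFor(kv[0], numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new TreeMap<>())
                     .computeIfAbsent(kv[0], k -> new ArrayList<>())
                     .add(kv[1]);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Pairs as if emitted by several different map tasks.
        List<String[]> pairs = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"a", "3"});
        // Prints the reduce task index, then each key with its values.
        System.out.println(shuffle(pairs, 4));
    }
}
```

Under these assumptions the reduce task index depends only on the key
and the number of reduce tasks, so in this sketch every value for a given
key ends up under a single reduce task. Whether that still holds with a
custom Partitioner is part of what I am asking.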

Any assistance would be greatly appreciated.

