hadoop-common-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Newbie reducer question
Date Sun, 09 Mar 2008 11:54:32 GMT
Hi all,
I am a day-one newbie investigating distributed work for the first time...

I have run through the tutorials with ease (thanks for the nice
documentation) and have now written my first MapReduce job.

Is it accurate to say that the reduce is called repeatedly by the Hadoop
framework until the number of inputs equals the number of outputs?

I am only running in single-server mode at the moment, and I have these map
outputs:

Football UK
Football UK
Rugby UK
American Football USA
Rugby FR
Football FR

And reduce outputs:

Football UK, FR
Rugby UK, FR
American Football USA

This worked fine.

But when I tried to include the counts in the output, I got some strange
results:

Football UK(2), FR(1)(1)
Rugby UK(1), FR(1)(1)
American Football USA(1)(1)

I think it was because I was just doing String manipulation in the reducer
to produce the counts.
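
To show what I mean, here is a minimal plain-Java sketch (no Hadoop classes;
the `reduceWithCounts` helper is a hypothetical stand-in for my reducer, not
the actual code). It counts the values and appends each count as a string.
Assuming the reduce really is applied a second time to its own output, feeding
the first result back in as a single value reproduces the strange Football line
above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StringCountSketch {
    // Hypothetical stand-in for the reducer: counts each distinct value
    // and renders "value(count)" entries joined by ", ".
    static String reduceWithCounts(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            parts.add(e.getKey() + "(" + e.getValue() + ")");
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // First pass over the map outputs for the key "Football".
        String once = reduceWithCounts(Arrays.asList("UK", "UK", "FR"));
        System.out.println(once);   // UK(2), FR(1)

        // Second pass: the previous output arrives as one opaque string,
        // so it is counted once and gets another "(1)" appended.
        String twice = reduceWithCounts(Arrays.asList(once));
        System.out.println(twice);  // UK(2), FR(1)(1)
    }
}
```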

I presume, then, that I should stop using the Text type and define my own
type for the Country+Count pair?
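
Something like the following is what I have in mind. It is only a sketch: the
class name `CountryCount` and its `merge` helper are hypothetical, and the
`write`/`readFields` pair follows the shape of Hadoop's `Writable` interface,
which the real class would have to implement (left off here so the sketch
compiles without Hadoop on the classpath). Because the counts are summed as
numbers rather than appended as text, merging values that were already merged
should still give the right totals:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value type; in a Hadoop job it would be declared as
// "implements org.apache.hadoop.io.Writable".
class CountryCount {
    String country = "";
    int count;

    // No-arg constructor so an empty instance can be filled by readFields.
    CountryCount() {}

    CountryCount(String country, int count) {
        this.country = country;
        this.count = count;
    }

    // Serializes the country name followed by its count.
    void write(DataOutput out) throws IOException {
        out.writeUTF(country);
        out.writeInt(count);
    }

    // Reads the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        country = in.readUTF();
        count = in.readInt();
    }

    // Sums two partial counts for the same country.
    CountryCount merge(CountryCount other) {
        if (!country.equals(other.country)) {
            throw new IllegalArgumentException("different countries");
        }
        return new CountryCount(country, count + other.count);
    }

    @Override
    public String toString() {
        return country + "(" + count + ")";
    }

    public static void main(String[] args) throws IOException {
        // Two map outputs for "Football UK" merged into one partial result.
        CountryCount uk = new CountryCount("UK", 1).merge(new CountryCount("UK", 1));

        // Round trip through the binary form, as a framework would do.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        uk.write(new DataOutputStream(bytes));
        CountryCount copy = new CountryCount();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy);  // UK(2)
    }
}
```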

Thanks,

Tim
