hadoop-common-user mailing list archives

From jamal sasha <jamalsha...@gmail.com>
Subject Basic hadoop MR question
Date Tue, 02 Apr 2013 21:38:07 GMT
I have a quick question. I am trying to write MapReduce code in Python.
In the word count example:

About the reducer: why can't I declare a dictionary (hashmap) in the reducer
whose key is the word and whose value is the list of counts (1's here)?

So something like:

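import sys
from collections import defaultdict

# Collect every count for each word from the mapper's "word\tcount" lines.
data_dict = defaultdict(list)
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    data_dict[word].append(int(count))

# Emit each word with the sum of its counts.
for word, counts in data_dict.items():
    sys.stdout.write("%s\t%d\n" % (word, sum(counts)))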
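A minimal sketch of the dictionary approach as a testable function, assuming each reducer input line is `word<TAB>count` as in the word count example. The name `dict_reducer` is hypothetical. It gives correct totals regardless of input order, but the dictionary keeps every distinct word the reducer sees in memory at the same time, so memory use grows with the number of distinct words instead of staying constant.

```python
from collections import defaultdict


def dict_reducer(lines):
    """Hypothetical helper: sum counts per word using an in-memory dict.

    Assumes each line looks like "word\\tcount". Input order does not matter,
    but memory grows with the number of distinct words.
    """
    counts = defaultdict(int)
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        counts[word] += int(count)
    # Sort only to make the output order deterministic.
    return ["%s\t%d" % (word, total) for word, total in sorted(counts.items())]


if __name__ == "__main__":
    sample = ["foo\t1\n", "bar\t1\n", "foo\t1\n"]
    for out in dict_reducer(sample):
        print(out)
```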

Also, in the reducer code mentioned in the link, why are the following
lines needed?
# do not forget to output the last word if needed!
if current_word == word:
    print '%s\t%s' % (current_word, current_count)

Though the code is well commented, I am still confused :( My apologies for
asking naive questions.
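The quoted lines belong to a running-total reducer, which relies on Hadoop Streaming handing the reducer its input sorted by key, so all lines for one word arrive together. It keeps only the current word and its total, and emits that total when the word changes. The last word never sees a change, so without a final check after the loop its total would never be printed. A minimal sketch, assuming `word<TAB>count` lines; `sorted_reducer` and its `flush_last` switch are hypothetical names added only to show the effect of dropping those lines.

```python
def sorted_reducer(lines, flush_last=True):
    """Hypothetical sketch of a running-total reducer over key-sorted input.

    Assumes lines look like "word\\tcount" and that equal words are adjacent,
    as they are after Hadoop Streaming's sort phase.
    """
    output = []
    current_word = None
    current_count = 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        count = int(count)
        if current_word == word:
            current_count += count
        else:
            # A new word started, so the previous word's total is complete.
            if current_word is not None:
                output.append("%s\t%d" % (current_word, current_count))
            current_word = word
            current_count = count
    # Totals are only emitted when the word changes, so the final word is
    # still pending here; this plays the role of the quoted "last word" lines.
    if flush_last and current_word is not None:
        output.append("%s\t%d" % (current_word, current_count))
    return output


if __name__ == "__main__":
    sample = ["bar\t1\n", "foo\t1\n", "foo\t1\n"]
    print(sorted_reducer(sample))                    # with the final flush
    print(sorted_reducer(sample, flush_last=False))  # "foo" is lost
```

This version needs only constant memory per reducer, which is the trade-off against the dictionary approach. It also depends on the sort: on unsorted input such as `foo, bar, foo` it would emit `foo` twice.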
