hadoop-common-user mailing list archives

From Pavel Hančar <pavel.han...@gmail.com>
Subject more reduce tasks
Date Thu, 03 Jan 2013 21:11:24 GMT
  Hello,
I'd like to use more than one reduce task with Hadoop Streaming, but still
end up with a single result. Is that possible, or should I run one more job
to merge the results? And is it the same for non-streaming jobs? As you can
see below, I get 5 results with mapred.reduce.tasks=5.

$ hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar \
    -D mapred.reduce.tasks=5 -mapper /bin/cat -reducer /tmp/wcc -file /tmp/wcc \
    -file /bin/cat -input /user/hadoopnlp/1gb -output 1gb.wc
.
.
.
13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
13/01/03 22:00:07 INFO streaming.StreamJob: Job complete: job_201301021717_0038
13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
$ hadoop dfs -cat 1gb.wc/part-*
472173052
165736187
201719914
184376668
163872819
$

where /tmp/wcc contains
#!/bin/bash
wc -c
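
For illustration, the merge step asked about above may not need a second job here, since each part file holds a single number. A minimal sketch, assuming the five values printed above are the complete reducer output and that summing them on the client is acceptable; sum_counts is a hypothetical helper name, not part of Hadoop:

```shell
# Partial byte counts copied from the five part-* files shown above.
partial_counts="472173052
165736187
201719914
184376668
163872819"

# Hypothetical helper: read one number per line on stdin and print the sum.
# %.0f keeps large totals from being printed in exponent notation.
sum_counts() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

total=$(printf '%s\n' "$partial_counts" | sum_counts)
echo "$total"
```

Against the real job output, the same helper could presumably be fed directly: hadoop dfs -cat 1gb.wc/part-* | sum_counts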

Thanks for any answer,
 Pavel Hančar
