hadoop-mapreduce-user mailing list archives

From Stanley Xu <wenhao...@gmail.com>
Subject Is there any way I could keep both the Mapper and Reducer output in hdfs?
Date Tue, 03 May 2011 06:09:16 GMT
Dear all,

We have a task that runs a map-reduce job multiple times to do some machine
learning calculation. We first use a mapper to update the data iteratively,
and then use a reducer to process the mapper output and update a global
matrix. After that, we need to reuse the output of the previous mapper (as a
data source) and of the previous reducer (as a set of parameters) to run the
map-reduce job again for another round of learning.
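
To make the data flow concrete, here is a rough sketch of the loop described above, written in plain Java with no Hadoop API at all. Everything in it is a hypothetical stand-in: `mapStep`, `reduceStep`, the `State` holder, and the toy arithmetic (subtract the parameter, then average) only illustrate that each round consumes both the previous mapper output and the previous reducer output. It is not the actual learning algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates the round structure in memory, without Hadoop.
final class IterativeRounds {

    // Holds what must survive between rounds: mapper output and reducer output.
    static final class State {
        final List<Double> data;   // output of the previous "mapper"
        final double param;        // output of the previous "reducer"

        State(List<Double> data, double param) {
            this.data = data;
            this.param = param;
        }
    }

    // Hypothetical map step: updates every record using the current parameter.
    static List<Double> mapStep(List<Double> data, double param) {
        List<Double> updated = new ArrayList<>();
        for (double x : data) {
            updated.add(x - param);
        }
        return updated;
    }

    // Hypothetical reduce step: aggregates the mapper output into one parameter.
    static double reduceStep(List<Double> mapped) {
        if (mapped.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double x : mapped) {
            sum += x;
        }
        return sum / mapped.size();
    }

    // Runs the given number of rounds; each round feeds on both previous outputs.
    static State run(List<Double> data, double param, int rounds) {
        State state = new State(data, param);
        for (int i = 0; i < rounds; i++) {
            List<Double> mapped = mapStep(state.data, state.param);
            double reduced = reduceStep(mapped);
            state = new State(mapped, reduced);
        }
        return state;
    }
}
```

In this sketch the `State` object plays the role that HDFS would play between jobs: if the mapper output were discarded after the reduce step, the next round would have nothing to read as its data source.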

I am wondering whether there is any setting or API I could use to make
Hadoop keep the output of both the mapper and the reducer. Right now it
looks like, if a job contains a reducer, Hadoop deletes the intermediate
results generated by the mapper.

Thanks.
Stanley Xu
