hadoop-mapreduce-user mailing list archives

From Bhaskar Ghosh <bjgin...@yahoo.co.in>
Subject How to read whole files and output processed texts to another file through MapReduce
Date Wed, 17 Nov 2010 14:22:17 GMT

Dear All,

I have a requirement to move my existing program to the MapReduce 
framework. The program currently works as follows:

---It reads files within a directory and its subdirectories.
---It processes one file at a time.
---It writes all the processed output to a single output file (one output file 
per folder).

Now, if I have to do this using MapReduce, how should I proceed?
I think I need to give one whole file to each mapper, and once all the mappers 
have finished, a single reducer should write the combined output to a single file 
(as I understand it, we cannot write in parallel to a single output file).
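
To make the intended data flow concrete, here is a minimal local sketch in plain Java. It uses no Hadoop classes at all; it only mimics the shape I have in mind: each file is read whole and "mapped" to a (parent folder, processed text) pair, and one "reduce" call per folder joins the values into a single output. The class name PerFolderMerge, the methods mapFile, mapAll and reduce, and the upper-casing in process() are all hypothetical placeholders, not part of my real program or of any Hadoop API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerFolderMerge {

    // Hypothetical stand-in for the existing per-file processing logic.
    static String process(String wholeFileText) {
        return wholeFileText.trim().toUpperCase();
    }

    // "Map" step: called once per file with the whole file as input.
    // Emits key = parent folder, value = processed text of that file.
    static Map.Entry<Path, String> mapFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return new AbstractMap.SimpleEntry<>(file.getParent(), process(text));
    }

    // Walks the directory tree (including subdirectories), maps every regular
    // file, and groups the emitted values by folder, like a shuffle would.
    static Map<Path, List<String>> mapAll(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<Path, List<String>> grouped = new TreeMap<>();
        for (Path file : files) {
            Map.Entry<Path, String> kv = mapFile(file);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // "Reduce" step: called once per folder; joins all values into one output.
    static String reduce(List<String> values) {
        return String.join("\n", values);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Path, List<String>> e : mapAll(root).entrySet()) {
            String merged = reduce(e.getValue());
            System.out.println(e.getKey() + ": " + merged.length() + " characters of output");
        }
    }
}
```

Using the parent folder as the key is what would give one output per folder; whether a real job can deliver whole files to the map function in this way is exactly what I am asking about.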

Please suggest (or point me to resources on) how I can achieve the following:
a) Have my map function receive one whole file at a time (instead of one line at a time).
b) Read files in subdirectories as well (one file at a time). Would implementing a 
custom RecordReader and/or FileInputFormat allow this?

Appreciate any help.
Bhaskar Ghosh
Hyderabad, India


"Ignorance is Bliss... Knowledge never brings Peace!!!"
