hadoop-common-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject The name of the current input file during a map
Date Thu, 26 Nov 2009 07:05:12 GMT
Hello,
I have a set of input files, part-r-*, which I will pass through another
map (no reduce). The part-r-* files consist of key/value pairs; the keys
are small and the values fairly large (MBs).

I would like to index these, i.e. run a map whose output is the key and
the /filename/, i.e. the part-r-* file to which that key belongs, so
that if I need a key again I can go straight to that file.
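
To make the goal concrete, the index I am after looks roughly like the
sketch below. It is plain Java with no Hadoop classes; KeyFileIndex and
buildIndex are names invented for illustration, and the file name is
simply passed in, since obtaining it inside the real mapper is exactly
what I am asking about.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyFileIndex {
    // Hypothetical stand-in for the map step: for every key found in a
    // file, record (key -> file name) and drop the large value entirely.
    static Map<String, String> buildIndex(Map<String, List<String>> keysByFile) {
        Map<String, String> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> file : keysByFile.entrySet()) {
            for (String key : file.getValue()) {
                index.put(key, file.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample data: two part files and the keys they contain.
        Map<String, List<String>> keysByFile = new LinkedHashMap<>();
        keysByFile.put("part-r-00000", Arrays.asList("a", "b"));
        keysByFile.put("part-r-00001", Arrays.asList("c"));

        Map<String, String> index = buildIndex(keysByFile);
        // Looking up a key tells me which file to reopen.
        System.out.println(index.get("c")); // prints "part-r-00001"
    }
}
```

With such an index, fetching a value later means opening one part-r-*
file instead of scanning all of them.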

Q: In the map stage, how do I retrieve the name of the file being
processed? I'd rather not use the MapFileOutputFormat.

Hadoop 0.21

Regards
Saptarshi
