hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder
Date Fri, 11 Apr 2008 17:11:22 GMT
One way to do this is to write your own (file) input format that extends
FileInputFormat. See src/java/org/apache/hadoop/mapred/FileInputFormat.java.
You need to override listPaths() so that it returns only the files you
want to process from the input folder.
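
The selection step inside an overridden listPaths() could look roughly like
the sketch below. It is plain Java with no Hadoop dependency, and
PrefixFilter and filterByPrefix are made-up names used only for
illustration. In a real input format you would apply the same test to the
file name of each path the parent class lists and return only the matches,
reading the prefix from the JobConf under a property name of your choosing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: keeps only the file names
// that start with a given prefix (e.g. "Elementary" or "Source").
final class PrefixFilter {
    private PrefixFilter() {}

    // Returns a new list with the names that start with the prefix,
    // in their original order. The input list is not modified.
    static List<String> filterByPrefix(List<String> fileNames, String prefix) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith(prefix)) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

With a folder of 1000 files of which 50 start with Elementary, calling
filterByPrefix(names, "Elementary") would leave only those 50 for the job.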
Alfonso Olias Sanz wrote:
> Hi
> I have a general-purpose input folder that is used as input to a
> Map/Reduce task. That folder contains files grouped by name.
> I want to configure the JobConf so that I can filter the files that
> have to be processed in a given pass (i.e. files whose names start with
> Elementary, or Source, etc.), so that the task function will only process
> those files. For example, if the folder contains 1000 files and only 50
> start with Elementary, only those 50 will be processed by my task.
> I could set up different input folders, each containing the
> different files, but I cannot do that.
> Any idea?
> thanks
