hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder
Date Fri, 11 Apr 2008 17:37:22 GMT

On Apr 11, 2008, at 10:21 AM, Amar Kamat wrote:

> A simpler way is to use FileInputFormat.setInputPathFilter(JobConf,  
> PathFilter). Look at org.apache.hadoop.fs.PathFilter for details on  
> PathFilter interface.

+1, although FileInputFormat.setInputPathFilter is available only in
hadoop-0.17 and above... as Amar mentioned previously, you'd need a
custom InputFormat prior to hadoop-0.17.
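For what it's worth, a minimal sketch of such a filter. This assumes the hadoop-0.17 mapred API; `PrefixFilter` and the `"Elementary"` prefix are hypothetical names taken from the question below. To keep the sketch compilable on its own, a local stand-in interface is used in place of org.apache.hadoop.fs.PathFilter (whose accept() actually takes a Path, not a String):

```java
// Stand-in for org.apache.hadoop.fs.PathFilter, so the sketch is
// self-contained. The real interface declares: boolean accept(Path path)
interface PathFilter {
    boolean accept(String pathName);
}

// Hypothetical filter that accepts only files whose names start with a
// given prefix. In a real job you'd implement the Hadoop PathFilter,
// call path.getName(), and register the class with:
//   FileInputFormat.setInputPathFilter(conf, PrefixFilter.class);
public class PrefixFilter implements PathFilter {
    private static final String PREFIX = "Elementary"; // prefix from the question

    public boolean accept(String pathName) {
        return pathName.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        PrefixFilter f = new PrefixFilter();
        System.out.println(f.accept("Elementary_0001.dat")); // true
        System.out.println(f.accept("Source_0001.dat"));     // false
    }
}
```

With that registered, the job only sees the matching 50 of the 1000 files; everything else in the folder is skipped at input-split time.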


> Amar
> Alfonso Olias Sanz wrote:
>> Hi
>> I have a general-purpose input folder that is used as input to a
>> Map/Reduce task. That folder contains files grouped by name.
>> I want to configure the JobConf so that I can filter which files are
>> processed in each pass (i.e. files whose names start with Elementary,
>> or Source, etc.), so the task will only process those files. For
>> example, if the folder contains 1000 files and only 50 start with
>> Elementary, only those 50 will be processed by my task.
>> I could set up separate input folders, each containing one group of
>> files, but I cannot do that.
>> Any idea?
>> thanks
