hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method
Date Tue, 25 Mar 2008 10:35:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581861#action_12581861 ]

Alejandro Abdelnur commented on HADOOP-2055:

I've figured out (IMO) a cleaner way of implementing this feature:

Adding the following 2 instance methods to the JobConf:

 * void setInputPathFilter(Class<? extends PathFilter> pathFilter);
 * PathFilter getInputPathFilter();

Modifying FileInputFormat's listPaths() method to apply the hiddenFileFilter and, if one is
set, the filter configured in the JobConf.
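
A rough sketch of the listPaths() change described above, not the actual patch. To keep it free of any Hadoop dependency, SimplePathFilter is a hypothetical stand-in for PathFilter that takes a plain String, and listPaths() here filters an in-memory list rather than a file system. The hidden-file rule (skip names starting with '_' or '.') is an assumption about what hiddenFileFilter does, and NoTmpFilter is an invented example filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.PathFilter; it takes a
// String instead of a Path so the sketch has no Hadoop dependency.
interface SimplePathFilter {
    boolean accept(String path);
}

final class InputPathFilterSketch {

    // Assumed behaviour of hiddenFileFilter: skip names starting with '_' or '.'.
    static final SimplePathFilter HIDDEN_FILE_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    // The proposal stores a Class rather than an instance, so the filter is
    // created reflectively. A null class means no filter was set.
    static SimplePathFilter instantiate(Class<? extends SimplePathFilter> cls)
            throws ReflectiveOperationException {
        return cls == null ? null : cls.getDeclaredConstructor().newInstance();
    }

    // Keeps a path only if it passes the hidden-file filter and, when one
    // is set, the user-supplied filter as well.
    static List<String> listPaths(List<String> candidates, SimplePathFilter userFilter) {
        List<String> accepted = new ArrayList<>();
        for (String p : candidates) {
            if (HIDDEN_FILE_FILTER.accept(p) && (userFilter == null || userFilter.accept(p))) {
                accepted.add(p);
            }
        }
        return accepted;
    }

    // Invented example of a user filter: skip temporary files.
    static final class NoTmpFilter implements SimplePathFilter {
        @Override
        public boolean accept(String path) {
            return !path.endsWith(".tmp");
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        List<String> in = Arrays.asList("/in/part-0", "/in/_logs", "/in/.crc", "/in/part-1.tmp");
        System.out.println(listPaths(in, null));
        System.out.println(listPaths(in, instantiate(NoTmpFilter.class)));
    }
}
```

Passing a Class instead of an instance keeps the setting serializable as a plain class name in the job configuration, at the cost of requiring a no-argument constructor on the filter.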

Globbing still works for regex inclusion, even if a path filter is set.

Being able to specify a custom PathFilter makes it possible to create more complex filters,
such as exclusion filters and selections that cannot be expressed via regex.
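
As an illustration of the kinds of filters meant here, the sketch below shows one exclusion filter and one selection that is awkward to write as a glob. It models a path filter as a java.util.function.Predicate<String> purely so it runs without Hadoop; the class name, the method names, and the "part-N" file naming are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ExclusionFilterSketch {

    // Exclusion filter: accepts every path that does NOT match the regex.
    static Predicate<String> excluding(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return path -> !pattern.matcher(path).matches();
    }

    // Selection by numeric range: accepts ".../part-N" files with min <= N <= max.
    // The "part-N" naming is an assumption made for this example.
    static Predicate<String> partRange(int min, int max) {
        Pattern part = Pattern.compile(".*/part-(\\d{1,9})");
        return path -> {
            Matcher m = part.matcher(path);
            if (!m.matches()) {
                return false;
            }
            int n = Integer.parseInt(m.group(1));
            return n >= min && n <= max;
        };
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("/in/part-00002", "/in/part-00007", "/in/part-00013", "/in/a.bak");
        // Combine both rules: drop backups, then keep parts 3 through 12.
        Predicate<String> filter = excluding(".*\\.bak").and(partRange(3, 12));
        System.out.println(in.stream().filter(filter).collect(Collectors.toList()));
    }
}
```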

> JobConf should have a setInputPathFilter(PathFilter filter) method
> ------------------------------------------------------------------
>                 Key: HADOOP-2055
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2055
>             Project: Hadoop Core
>          Issue Type: New Feature
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
> It should be possible to set a PathFilter for the input to avoid taking certain files
> as input data within the input directories.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
