hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method
Date Tue, 25 Mar 2008 10:35:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581861#action_12581861 ]

Alejandro Abdelnur commented on HADOOP-2055:
--------------------------------------------

I've figured out (IMO) a cleaner way of implementing this feature:

Add the following two instance methods to JobConf:

 * void setInputPathFilter(Class<? extends PathFilter> pathFilter);
 * PathFilter getInputPathFilter();
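
A rough sketch of how such a setter/getter pair could work, assuming the filter is stored in the configuration by class name and instantiated reflectively on read. Everything here is a stand-in for illustration: SketchJobConf is not the real JobConf, the nested PathFilter takes a String rather than Hadoop's Path, and the property key is made up.

```java
import java.util.HashMap;
import java.util.Map;

class PathFilterConfSketch {

    /** Stand-in for org.apache.hadoop.fs.PathFilter (the real one accepts a Path). */
    public interface PathFilter {
        boolean accept(String path);
    }

    /** Hypothetical, minimal stand-in for JobConf: a plain string property map. */
    public static class SketchJobConf {
        // Assumed property key, chosen for this sketch only.
        static final String FILTER_KEY = "sketch.input.pathFilter.class";
        private final Map<String, String> props = new HashMap<>();

        /** Records the filter by class name so it can travel with the job configuration. */
        public void setInputPathFilter(Class<? extends PathFilter> filterClass) {
            props.put(FILTER_KEY, filterClass.getName());
        }

        /** Returns a new filter instance, or null if no filter class was set. */
        public PathFilter getInputPathFilter() {
            String name = props.get(FILTER_KEY);
            if (name == null) {
                return null;
            }
            try {
                return Class.forName(name)
                        .asSubclass(PathFilter.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
    }

    /** Example filter; it needs a no-argument constructor to be created reflectively. */
    public static class SkipTmpFilter implements PathFilter {
        public SkipTmpFilter() {}

        @Override
        public boolean accept(String path) {
            return !path.endsWith(".tmp");
        }
    }

    public static void main(String[] args) {
        SketchJobConf conf = new SketchJobConf();
        conf.setInputPathFilter(SkipTmpFilter.class);
        PathFilter filter = conf.getInputPathFilter();
        System.out.println(filter.accept("/data/part-00000"));
        System.out.println(filter.accept("/data/part-00000.tmp"));
    }
}
```

Passing a Class rather than an instance keeps the configuration a set of strings, at the cost of requiring a no-argument constructor on the filter.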

Modify FileInputFormat's listPaths() method to apply the hiddenFileFilter and, if one is set,
the filter from the JobConf.
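
The combined check could look roughly like the following. This is only a sketch: paths are plain Strings, the listPaths signature is simplified, and the hidden-file rule (names starting with "_" or ".") is my assumption of what hiddenFileFilter rejects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListPathsSketch {

    /** Stand-in for org.apache.hadoop.fs.PathFilter, using String instead of Path. */
    interface PathFilter {
        boolean accept(String path);
    }

    /** Assumed hidden-file rule: reject file names starting with "_" or ".". */
    static final PathFilter HIDDEN_FILE_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    /**
     * Keeps a path only if it passes the hidden-file filter and, when a job
     * filter is present, that filter as well. A null jobFilter means "not set".
     */
    static List<String> listPaths(List<String> candidates, PathFilter jobFilter) {
        List<String> accepted = new ArrayList<>();
        for (String path : candidates) {
            boolean visible = HIDDEN_FILE_FILTER.accept(path);
            boolean wanted = jobFilter == null || jobFilter.accept(path);
            if (visible && wanted) {
                accepted.add(path);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "/in/part-00000", "/in/_logs", "/in/.crc", "/in/part-00001.tmp");
        // No job filter: only the hidden-file rule applies.
        System.out.println(listPaths(candidates, null));
        // With a job filter that also drops *.tmp files.
        System.out.println(listPaths(candidates, p -> !p.endsWith(".tmp")));
    }
}
```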

Globbing still works for regex-style inclusion, even when a path filter is set.

Being able to specify a custom PathFilter makes it possible to create more complex filters,
such as exclusion filters and selections that cannot be expressed with a regex.
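
To illustrate the kind of filters meant here, a sketch of one exclusion filter and one numeric range selection. The class names, the "log-YYYYMMDD" naming scheme, and the String-based PathFilter stand-in are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExclusionFilterSketch {

    /** Stand-in for org.apache.hadoop.fs.PathFilter, using String instead of Path. */
    interface PathFilter {
        boolean accept(String path);
    }

    /** Returns the last component of a slash-separated path. */
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Exclusion filter: rejects paths whose file name is in a fixed set. */
    static class ExcludeNamesFilter implements PathFilter {
        private final Set<String> excluded;

        ExcludeNamesFilter(String... names) {
            this.excluded = new HashSet<>(Arrays.asList(names));
        }

        @Override
        public boolean accept(String path) {
            return !excluded.contains(fileName(path));
        }
    }

    /** Range selection: accepts "log-YYYYMMDD" names whose date lies in [from, to]. */
    static class DateRangeFilter implements PathFilter {
        private final int from;
        private final int to;

        DateRangeFilter(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean accept(String path) {
            String name = fileName(path);
            if (!name.matches("log-\\d{8}")) {
                return false;
            }
            int date = Integer.parseInt(name.substring(4));
            return date >= from && date <= to;
        }
    }

    public static void main(String[] args) {
        PathFilter exclude = new ExcludeNamesFilter("README", "checkpoint");
        PathFilter range = new DateRangeFilter(20080301, 20080325);
        System.out.println(exclude.accept("/in/README"));
        System.out.println(range.accept("/logs/log-20080325"));
    }
}
```

These two take constructor arguments for brevity; with a class-based setter the parameters would presumably have to be read from the job configuration instead.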

> JobConf should have a setInputPathFilter(PathFilter filter) method
> ------------------------------------------------------------------
>
>                 Key: HADOOP-2055
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2055
>             Project: Hadoop Core
>          Issue Type: New Feature
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>
> It should be possible to set a PathFilter for the input to avoid taking certain files
> as input data within the input directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

