hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-412) provide an input format that fetches a subset of sequence file records
Date Wed, 02 Aug 2006 00:34:15 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]

Hairong Kuang updated HADOOP-412:

    Attachment: filter.patch

This patch provides a class, SequenceFileInputFilter, that can feed a subset of sequence file
records to map tasks. It provides a class method, setFilter, that defines the filtering criterion.

The patch provides three filters: RegexFilter, PercentFilter, and MD5Filter. A programmer
may also define their own filter. Any user-defined filter should either implement the Filter
interface or extend FilterBase.
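
As a rough illustration of the filtering idea described above, here is a minimal, self-contained
sketch in plain Java. It does not use Hadoop and is not the code from filter.patch: the names
FilterSketch, KeyFilter, RegexKeyFilter, EveryNthFilter, and select are all hypothetical, and
the real Filter interface and the semantics of PercentFilter and MD5Filter are not spelled out
in this message. The sketch only shows how a pluggable accept/reject test on record keys could
select a subset of records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FilterSketch {
    /** Hypothetical stand-in for a filter interface; not the patch's actual API. */
    interface KeyFilter {
        boolean accept(String key);
    }

    /** Accepts keys that contain a match for the given regular expression. */
    static final class RegexKeyFilter implements KeyFilter {
        private final Pattern pattern;

        RegexKeyFilter(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public boolean accept(String key) {
            return pattern.matcher(key).find();
        }
    }

    /** Accepts every n-th record offered to it, starting with the first (assumed sampling rule). */
    static final class EveryNthFilter implements KeyFilter {
        private final int n;
        private long seen = 0;

        EveryNthFilter(int n) {
            if (n <= 0) {
                throw new IllegalArgumentException("n must be positive");
            }
            this.n = n;
        }

        @Override
        public boolean accept(String key) {
            return (seen++ % n) == 0;
        }
    }

    /** Returns the keys accepted by the filter, in their original order. */
    static List<String> select(List<String> keys, KeyFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (filter.accept(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "host-a", "user-3", "host-b");
        System.out.println(select(keys, new RegexKeyFilter("^user-")));
        System.out.println(select(keys, new EveryNthFilter(2)));
    }
}
```

In this sketch a user-defined filter is just another KeyFilter implementation passed to select,
which mirrors the extension point the patch describes without claiming to match its signatures.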

A junit test is also included.

> provide an input format that fetches a subset of sequence file records
> ----------------------------------------------------------------------
>                 Key: HADOOP-412
>                 URL: http://issues.apache.org/jira/browse/HADOOP-412
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.4.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.4.0
>         Attachments: filter.patch
> Sometimes a map/reduce job only wants to work on a subset of its input data, either for the
> needs of its application or during the debugging phase. It would be convenient if an input
> format handled this transparently. It should provide an API that allows a programmer to
> specify a filtering criterion.


