Cool. I'd like to be walked through the "man page" for this. Could you forward it to me, or be prepared to demo it in the review?

On Aug 1, 2006, at 5:34 PM, Hairong Kuang (JIRA) wrote:

>      [ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]
>
> Hairong Kuang updated HADOOP-412:
> ---------------------------------
>
>     Attachment: filter.patch
>
> This patch provides a class, SequenceFileInputFilter, that can feed a
> subset of sequence file records to map tasks. It provides a class
> method, setFilter, that defines a filtering criterion.
>
> The patch provides three filters: RegexFilter, PercentFilter, and
> MD5Filter. A programmer may also define their own filter. Any
> user-defined filter should either implement the interface Filter or
> extend FilterBase.
>
> A JUnit test is also included.
>
>> provide an input format that fetches a subset of sequence file records
>> ----------------------------------------------------------------------
>>
>>                 Key: HADOOP-412
>>                 URL: http://issues.apache.org/jira/browse/HADOOP-412
>>             Project: Hadoop
>>          Issue Type: New Feature
>>          Components: mapred
>>    Affects Versions: 0.4.0
>>            Reporter: Hairong Kuang
>>         Assigned To: Hairong Kuang
>>             Fix For: 0.4.0
>>
>>         Attachments: filter.patch
>>
>>
>> Sometimes a map/reduce job only wants to work on a subset of its input
>> data, either because the application needs it or during debugging.
>> It would be convenient if an input format handled this transparently.
>> It should provide an API that allows a programmer to specify a
>> filtering criterion.
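
For the walkthrough, the filtering idea described in the quoted message could be illustrated with a small standalone sketch like the one below. It does not use Hadoop and is not taken from filter.patch: the `Filter` interface shape, its `accept` method, the `select` helper, and both filter implementations are assumptions made for illustration, and the patch's actual signatures may differ. The `PercentFilter` here is just one plausible reading of the name (keep an evenly spaced share of records).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FilterSketch {

    /** Hypothetical record filter; not the interface from filter.patch. */
    interface Filter {
        boolean accept(String key);
    }

    /** Accepts keys that fully match a regular expression. */
    static final class RegexFilter implements Filter {
        private final Pattern pattern;

        RegexFilter(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public boolean accept(String key) {
            return pattern.matcher(key).matches();
        }
    }

    /** Accepts an evenly spaced share of records, given as a whole percentage (0 to 100). */
    static final class PercentFilter implements Filter {
        private final int percent;
        private long seen = 0;

        PercentFilter(int percent) {
            if (percent < 0 || percent > 100) {
                throw new IllegalArgumentException("percent must be between 0 and 100");
            }
            this.percent = percent;
        }

        @Override
        public boolean accept(String key) {
            // Accept a record whenever the running quota floor(seen * percent / 100) increases.
            long before = seen * percent / 100;
            seen++;
            long after = seen * percent / 100;
            return after > before;
        }
    }

    /** Returns the keys the filter accepts, in their original order. */
    static List<String> select(List<String> keys, Filter filter) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (filter.accept(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "admin-1", "user-3");
        System.out.println(select(keys, new RegexFilter("user-.*"))); // [user-1, user-2, user-3]
        System.out.println(select(keys, new PercentFilter(50)));      // [user-2, user-3]
    }
}
```

In this sketch a user-defined filter only needs to implement the one-method `Filter` interface, which mirrors the extension point the message describes (implement Filter or extend FilterBase) without claiming to match it.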