Date: Tue, 1 Aug 2006 17:34:15 -0700 (PDT)
From: "Hairong Kuang (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-412) provide an input format that fetches a subset of sequence file records

 [ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]

Hairong Kuang updated HADOOP-412:
---------------------------------

    Attachment: filter.patch

This patch provides
class SequenceFileInputFilter, which feeds only a subset of sequence file records to map tasks. It provides a class method, setFilter, for defining the filtering criteria. The patch includes three filters: RegexFilter, PercentFilter, and MD5Filter. Programmers may also define their own filters; any user-defined filter should either implement the interface Filter or extend FilterBase. A JUnit test is also included.

> provide an input format that fetches a subset of sequence file records
> ----------------------------------------------------------------------
>
>          Key: HADOOP-412
>          URL: http://issues.apache.org/jira/browse/HADOOP-412
>      Project: Hadoop
>   Issue Type: New Feature
>   Components: mapred
> Affects Versions: 0.4.0
>     Reporter: Hairong Kuang
>  Assigned To: Hairong Kuang
>      Fix For: 0.4.0
>  Attachments: filter.patch
>
>
> Sometimes a map/reduce job only needs to work on a subset of its input data, either because the application requires it or during debugging. It would be convenient if an input format handled this transparently. It should provide an API that allows a programmer to specify the filtering criteria.
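The per-key filtering idea described in this patch can be sketched as a standalone Java program. This is not the code in filter.patch: the Filter interface, FilterBase, and the three filter classes below are hypothetical re-implementations, and the semantics chosen for PercentFilter (keep one record in every `frequency`) and MD5Filter (keep a key when its MD5 hash is divisible by `frequency`) are assumptions. The sketch does not depend on Hadoop and leaves out SequenceFileInputFilter and setFilter entirely; it only shows how a per-key accept() decision selects a subset of records.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Standalone sketch of key-based record filtering. Filter, FilterBase and the
 * three filters are hypothetical stand-ins, not the classes in filter.patch.
 */
public class FilterSketch {

    /** Hypothetical: decides, per record key, whether the record reaches the map task. */
    interface Filter {
        boolean accept(Object key);
    }

    /** Hypothetical convenience base class: gives subclasses the key's UTF-8 bytes. */
    static abstract class FilterBase implements Filter {
        protected static byte[] keyBytes(Object key) {
            return key.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Accepts keys whose string form fully matches a regular expression. */
    static class RegexFilter extends FilterBase {
        private final Pattern pattern;
        RegexFilter(String regex) { this.pattern = Pattern.compile(regex); }
        public boolean accept(Object key) {
            return pattern.matcher(key.toString()).matches();
        }
    }

    /** Assumed semantics: accepts one record out of every `frequency`, by arrival order. */
    static class PercentFilter extends FilterBase {
        private final int frequency;
        private long count = 0;
        PercentFilter(int frequency) {
            if (frequency <= 0) throw new IllegalArgumentException("frequency must be > 0");
            this.frequency = frequency;
        }
        public boolean accept(Object key) {
            return count++ % frequency == 0;
        }
    }

    /** Assumed semantics: accepts a key when its MD5 hash is divisible by `frequency`. */
    static class MD5Filter extends FilterBase {
        private final int frequency;
        MD5Filter(int frequency) {
            if (frequency <= 0) throw new IllegalArgumentException("frequency must be > 0");
            this.frequency = frequency;
        }
        public boolean accept(Object key) {
            try {
                byte[] digest = MessageDigest.getInstance("MD5").digest(keyBytes(key));
                long hash = 0;
                for (int i = 0; i < 8; i++) {       // fold the first 8 digest bytes into a long
                    hash = (hash << 8) | (digest[i] & 0xff);
                }
                return Math.floorMod(hash, (long) frequency) == 0;
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    /** Returns only the keys the filter accepts, preserving input order. */
    static List<String> apply(List<String> keys, Filter filter) {
        List<String> kept = new ArrayList<>();
        for (String k : keys) {
            if (filter.accept(k)) kept.add(k);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "avocado", "cherry", "apricot");
        System.out.println(apply(keys, new RegexFilter("a.*")));  // [apple, avocado, apricot]
        System.out.println(apply(keys, new PercentFilter(3)));    // [apple, cherry]
        System.out.println(apply(keys, new MD5Filter(3)));        // depends on the key hashes
    }
}
```

One design difference shows up even in this sketch: PercentFilter depends on the order in which records arrive, so the surviving records can change if the input is split or read differently, whereas MD5Filter decides from the key alone and selects the same keys on every run.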