Date: Sun, 30 Sep 2007 18:33:50 -0400 (EDT)
Subject: InputFormat for Two Types
From: "Stu Hood"
Reply-To: stuhood@webmail.us
To: hadoop-user@lucene.apache.org
Message-ID: <39479.192.168.1.70.1191191630.webmail@192.168.1.70>

Hello,

I need to write a mapreduce program that begins with 2 jobs:
 1. Convert raw log data to SequenceFiles
 2. Read from SequenceFiles, and cherry pick completed events
    (otherwise, keep them as SequenceFiles to be checked again later)
But I should be able to compact those 2 jobs into 1 job.

I just need to figure out how to write an InputFormat that uses 2 types of RecordReaders, depending on the input file type. Specifically, the inputs would be either raw log data (TextInputFormat), or partially processed log data (SequenceFileInputFormat).

I think I need to extend SequenceFileInputFormat to look for an identifying extension on the files. Then I would be able to return either a LineRecordReader or a SequenceFileRecordReader, and some logic in Map could process the line into a record.

Am I headed in the right direction? Or should I stick with running 2 jobs instead of trying to squash these steps into 1?

Thanks,

Stu Hood
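
The extension check described above can be sketched roughly as follows. This is only an illustration of the dispatch decision, written in plain Java with no Hadoop classes: the class name, the enum, and the ".seq" extension are all hypothetical, and a real InputFormat would make this choice where it creates the RecordReader for a split, returning a LineRecordReader or a SequenceFileRecordReader instead of an enum value.

```java
// Sketch only: picks a reader type from a file name, as the message proposes.
// Nothing here is Hadoop API; every name below is a hypothetical stand-in.
class ReaderDispatchSketch {

    /** The two reader types named in the message. */
    enum ReaderKind { LINE, SEQUENCE_FILE }

    // Assumed marker extension for partially processed files.
    static final String SEQ_EXTENSION = ".seq";

    /** Files with the marker extension get the SequenceFile reader; all others are raw logs. */
    static ReaderKind readerKindFor(String fileName) {
        String lower = fileName.toLowerCase(java.util.Locale.ROOT);
        return lower.endsWith(SEQ_EXTENSION) ? ReaderKind.SEQUENCE_FILE : ReaderKind.LINE;
    }

    public static void main(String[] args) {
        // Hypothetical input names, one of each kind.
        for (String name : new String[] {"access.log", "pending-00000.seq"}) {
            System.out.println(name + " -> " + readerKindFor(name));
        }
    }
}
```

Treating any file without the marker extension as raw text is a design choice in this sketch, not something the message specifies; an explicit extension for raw logs would work just as well.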