From: Jeremy Cunningham
To: mapreduce-user@hadoop.apache.org
Subject: Emit an entire file
Date: Tue, 28 Jun 2011 13:19:04 +0000

I have lots of binary files stored in HDFS. I read them using Apache POI and can search with no problems. I want to be able to search for keywords (which I can do) and then copy the file that has the text out to a different location. The location can be in HDFS, but I just need a location that contains all the files that meet my criteria.
 
Thanks,
Jeremy
 