From: Chris Lamprecht
Reply-To: Chris Lamprecht
To: Lucene Users List
Subject: Re: Zip Files
Date: Tue, 1 Mar 2005 13:44:13 -0600
Message-ID: <88c6a672050301114471550869@mail.gmail.com>
In-Reply-To: <00bb01c51e85$979400f0$7703d00a@hypermedia.com>

Luke,

Look at the javadocs for java.io.ByteArrayInputStream - it wraps a byte
array and makes it accessible as an InputStream. Also see
java.util.zip.ZipFile. You should be able to read and parse all contents
of the zip file in memory.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayInputStream.html

On Tue, 1 Mar 2005 12:39:17 -0500, Luke Shannon wrote:
> Thanks Ernesto.
>
> I'm struggling with how I can work with an array of bytes instead of a
> Java File.
>
> It would be easier to unzip the zip to a temp directory, parse the files and
> then delete the directory. But this would greatly slow indexing and use up
> disk space.
>
> Luke
>
> ----- Original Message -----
> From: "Ernesto De Santis"
> To: "Lucene Users List"
> Sent: Tuesday, March 01, 2005 10:48 AM
> Subject: Re: Zip Files
>
> > Hello
> >
> > first, you need a parser for each file type: pdf, txt, word, etc.
> > and use a java api to iterate zip content, see:
> >
> > http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
> >
> > use getNextEntry() method
> >
> > little example:
> >
> > ZipInputStream zis = new ZipInputStream(fileInputStream);
> > ZipEntry zipEntry;
> > while((zipEntry = zis.getNextEntry()) != null){
> > //use zipEntry to get name, etc.
> > //get the proper parser for the current entry
> > //use the parser with zis (ZipInputStream)
> > }
> >
> > good luck
> > Ernesto
> >
> > Luke Shannon wrote:
> >
> > >Hello;
> > >
> > >Anyone have any ideas on how to index the contents within zip files?
> > >
> > >Thanks,
> > >
> > >Luke
> > >
> > >
> > >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> >
> > --
> > Ernesto De Santis - Colaborativa.net
> > Córdoba 1147 Piso 6 Oficinas 3 y 4
> > (S2000AWO) Rosario, SF, Argentina.
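
For reference, the in-memory approach suggested at the top of this message might look roughly like the sketch below. It assumes the zip is already available as a byte[] and combines ByteArrayInputStream with the ZipInputStream loop from Ernesto's example (java.util.zip.ZipFile opens a file by name or File, so it is not used here). The class name InMemoryZipReader and the method readEntries are hypothetical names made up for illustration; they are not part of Lucene or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not from the original thread.
class InMemoryZipReader {

    // Reads every file entry of a zip held in a byte array, without touching disk.
    static Map<String, byte[]> readEntries(byte[] zipBytes) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<String, byte[]>();
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        try {
            ZipEntry entry;
            // The parentheses matter: assign first, then compare with null.
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data to parse
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry, not of the whole zip.
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        } finally {
            zis.close();
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Build a small zip in memory so the example needs no files.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(zipped);
        zos.putNextEntry(new ZipEntry("notes.txt"));
        zos.write("hello zip".getBytes("UTF-8"));
        zos.closeEntry();
        zos.close();

        for (Map.Entry<String, byte[]> e : readEntries(zipped.toByteArray()).entrySet()) {
            // A real indexer would pick a parser by file type here and feed it
            // new ByteArrayInputStream(e.getValue()).
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

Buffering each entry into its own byte[] keeps the parsers independent of the zip stream, at the cost of holding every entry in memory at once. For large archives it may be better to pass zis straight to the parser inside the loop, as in Ernesto's example.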