lucene-java-user mailing list archives

From "Luke Shannon" <lshan...@futurebrand.com>
Subject Re: Zip Files
Date Tue, 01 Mar 2005 17:39:17 GMT
Thanks Ernesto.

The issue I'm working with now (this is more lack of experience than
anything) is getting an input I can index. All my indexing classes (doc,
pdf, xml, ppt) take a File object as a parameter and return a Lucene
Document containing all the fields I need.

I'm struggling with how I can work with an array of bytes instead of a
Java File.

It would be easier to unzip the zip to a temp directory, parse the files and
then delete the directory. But this would greatly slow indexing and use up
disk space.
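
For what it's worth, here is a rough sketch of the in-memory route, not a
tested solution. It assumes the zip entries are small enough to hold in
memory, and that the indexing classes could be given an overload that accepts
an InputStream instead of a File (no such overload exists yet).
ZipEntryReader, readEntry and readAll are made-up names; only the java.io and
java.util.zip calls are standard.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads every file inside a zip into a byte array,
// without unpacking anything to disk.
final class ZipEntryReader {

    private ZipEntryReader() {
    }

    // Copies the rest of the given stream into a byte array.
    // For a ZipInputStream, read() returns -1 at the end of the current entry,
    // so this reads exactly one entry. The stream is left open.
    public static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns entry name -> entry contents, in archive order.
    // Directory entries are skipped. Closes the given stream when done.
    public static Map<String, byte[]> readAll(InputStream zipStream) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<String, byte[]>();
        ZipInputStream zis = new ZipInputStream(zipStream);
        try {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    entries.put(entry.getName(), readEntry(zis));
                }
                zis.closeEntry();
            }
        } finally {
            zis.close();
        }
        return entries;
    }
}
```

Each byte array could then be wrapped with new ByteArrayInputStream(bytes) and
handed to whichever parser matches the entry name. Buffering each entry first
also means that a parser which closes its input only closes the wrapper, not
the underlying ZipInputStream.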

Luke

----- Original Message ----- 
From: "Ernesto De Santis" <ernesto.desantis@colaborativa.net>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, March 01, 2005 10:48 AM
Subject: Re: Zip Files


> Hello
>
> First, you need a parser for each file type: pdf, txt, word, etc.
> Then use the Java API to iterate over the zip contents, see:
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
>
> Use the getNextEntry() method.
>
> A small example:
>
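> import java.util.zip.ZipEntry;
> import java.util.zip.ZipInputStream;
>
> // fileInputStream is an already-open stream on the zip file
> ZipInputStream zis = new ZipInputStream(fileInputStream);
> ZipEntry zipEntry;
> // the assignment needs its own parentheses before the null check
> while ((zipEntry = zis.getNextEntry()) != null) {
>     // use zipEntry to get the name, etc.
>     // pick the appropriate parser for the current entry
>     // pass zis (the ZipInputStream) to that parser
> }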
>
> good luck
> Ernesto
>
> Luke Shannon wrote:
>
> >Hello;
> >
> >Does anyone have any ideas on how to index the contents within zip files?
> >
> >Thanks,
> >
> >Luke
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
> >
> >
>
> -- 
> Ernesto De Santis - Colaborativa.net
> Córdoba 1147 Piso 6 Oficinas 3 y 4
> (S2000AWO) Rosario, SF, Argentina.
>
>
>


