lucene-java-user mailing list archives

From "Crump, Michael" <mcr...@leadscope.com>
Subject RE: Zip Files
Date Tue, 01 Mar 2005 18:09:51 GMT
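I'm not sure what your indexing classes look like, but if you changed them to accept an InputStream
instead of a File, I think they would be much more flexible and that would solve your problem.
ZipInputStream is itself an InputStream, so each entry can be parsed without unzipping to disk.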
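
A minimal sketch of the InputStream approach, under some assumptions: StreamParser,
NonClosingStream, parseZip and readText are hypothetical names made up for illustration, the
parser returns plain text rather than a Lucene Document so the example needs only the JDK,
and it uses current Java syntax rather than the 1.4-era API linked further down the thread.
The wrapper is there because closing a ZipInputStream closes the whole archive, which would
break the loop if a parser closes its input when it finishes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipStreamSketch {

    /** Hypothetical stand-in for an indexing class that takes a stream instead of a File. */
    interface StreamParser {
        String parse(String name, InputStream in) throws IOException;
    }

    /** Shields the zip stream from parsers that call close() when they are done. */
    static final class NonClosingStream extends FilterInputStream {
        NonClosingStream(InputStream in) { super(in); }
        @Override public void close() { /* deliberately leave the zip stream open */ }
    }

    /** Walks a zip archive and hands every file entry to the parser as a stream. */
    static Map<String, String> parseZip(InputStream zipData, StreamParser parser) throws IOException {
        Map<String, String> results = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                // Reads on zis report end-of-stream at the end of the current entry.
                String text = parser.parse(entry.getName(), new NonClosingStream(zis));
                results.put(entry.getName(), text);
            }
        }
        return results;
    }

    /** Toy parser: reads the whole entry as UTF-8 text, then closes its input. */
    static String readText(String name, InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        in.close(); // harmless here because of NonClosingStream
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Builds a two-entry zip in memory and parses it without touching the disk. */
    static Map<String, String> demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
                zos.putNextEntry(new ZipEntry("a.txt"));
                zos.write("hello".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
                zos.putNextEntry(new ZipEntry("docs/b.txt"));
                zos.write("world".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
            // An array of bytes becomes a stream by wrapping it in ByteArrayInputStream.
            return parseZip(new ByteArrayInputStream(bytes.toByteArray()), ZipStreamSketch::readText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An existing File-based method could stay as a thin overload that opens a FileInputStream and
delegates to the stream version, so callers that index ordinary files would not need to change.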

> -----Original Message-----
> From: Luke Shannon [mailto:lshannon@futurebrand.com]
> Sent: Tuesday, March 01, 2005 12:39 PM
> To: Lucene Users List
> Subject: Re: Zip Files
> 
> Thanks Ernesto.
> 
> The issue I'm working with now (this is more lack of experience than
> anything) is getting an input I can index. All my indexing classes (doc,
> pdf, xml, ppt) take a File object as a parameter and return a Lucene
> Document containing all the fields I need.
> 
> I'm struggling with how I can work with an array of bytes instead of a
> Java File.
> 
> It would be easier to unzip the zip to a temp directory, parse the files,
> and then delete the directory. But this would greatly slow indexing and use
> up disk space.
> 
> Luke
> 
> ----- Original Message -----
> From: "Ernesto De Santis" <ernesto.desantis@colaborativa.net>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, March 01, 2005 10:48 AM
> Subject: Re: Zip Files
> 
> 
> > Hello
> >
> > First, you need a parser for each file type: pdf, txt, word, etc.
> > Then use the Java API to iterate over the zip content; see:
> >
> > http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
> >
> > Use the getNextEntry() method.
> >
> > A little example:
> >
> > ZipInputStream zis = new ZipInputStream(fileInputStream);
> > ZipEntry zipEntry;
> > // the assignment needs its own parentheses: != binds tighter than =
> > while ((zipEntry = zis.getNextEntry()) != null) {
> >     // use zipEntry to get the name, etc.
> >     // pick the proper parser for the current entry
> >     // run the parser on zis (the ZipInputStream)
> > }
> >
> > good luck
> > Ernesto
> >
> > Luke Shannon wrote:
> >
> > >Hello;
> > >
> > >Does anyone have any ideas on how to index the contents of zip files?
> > >
> > >Thanks,
> > >
> > >Luke
> > >
> > >
> > >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > >For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> > >
> > >
> > >
> >
> > --
> > Ernesto De Santis - Colaborativa.net
> > Córdoba 1147 Piso 6 Oficinas 3 y 4
> > (S2000AWO) Rosario, SF, Argentina.
> >
> >
> >
> >
> 
> 
> 



