lucene-java-user mailing list archives

From "Luke Shannon" <>
Subject Re: Zip Files
Date Mon, 07 Mar 2005 15:47:10 GMT

I posted a question about ZIPS a while back. I thought I would post the
solution I arrived at.

Below is how I ended up handling ZIP files. If there are ZIPs inside of a
ZIP, I ignore the embedded ZIP. If requests come in to handle this
situation, I think I will have to unzip to a temp folder, index, and then
delete the temp folder.
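
For anyone who does need nested ZIPs, here is a rough sketch of that
temp-folder idea. It is not code from our system: the class name
NestedZipSketch, the EntryHandler callback and both method names are made up
for illustration, the handler stands in for whatever indexing call you use,
and it relies on the java.nio.file API from newer JDKs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedZipSketch {

    /** Hypothetical callback standing in for the real indexing step. */
    public interface EntryHandler {
        void handle(String entryName, Path extractedFile) throws IOException;
    }

    /** Unzips to a temp folder, hands each file to the handler, then deletes the folder. */
    public static void indexViaTempFolder(InputStream zipStream, EntryHandler handler)
            throws IOException {
        Path tempDir = Files.createTempDirectory("zip-index-").toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                Path target = tempDir.resolve(entry.getName()).normalize();
                // Refuse entry names such as "../x" that would escape the temp folder.
                if (!target.startsWith(tempDir)) {
                    throw new IOException("Unsafe entry name: " + entry.getName());
                }
                Files.createDirectories(target.getParent());
                Files.copy(in, target); // reads only up to the end of the current entry
                if (entry.getName().toLowerCase().endsWith(".zip")) {
                    // Embedded archive: recurse, using a temp folder of its own.
                    try (InputStream nested = Files.newInputStream(target)) {
                        indexViaTempFolder(nested, handler);
                    }
                } else {
                    handler.handle(entry.getName(), target);
                }
            }
        } finally {
            deleteRecursively(tempDir);
        }
    }

    /** Deletes a directory tree, children before parents. */
    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The handler has to finish reading each extracted file before it returns,
because the temp folder is gone once indexViaTempFolder completes.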

In our system all documents contained in a ZIP need to be associated with a
single document in our CMS. This is why all the fields obtained from the
archive files are added to the same collection (and eventually written to
the same document).
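
The insertFields helper that does this merging is not shown in this message.
A minimal sketch of what such a helper could look like follows. The class
name FieldCollector and the null check are assumptions on my part, and the
real helper may differ.

```java
import java.util.Enumeration;
import java.util.Vector;

public class FieldCollector {

    /**
     * Appends every element of data to total, so that the fields from all
     * archive entries end up in one collection.
     */
    public static void insertFields(Vector<Object> total, Enumeration<?> data) {
        if (data == null) {
            // Assumption: a null enumeration means the entry could not be indexed.
            return;
        }
        while (data.hasMoreElements()) {
            total.add(data.nextElement());
        }
    }
}
```

Calling it once per archive entry leaves a single Vector holding the fields
of every entry, in the order the entries were read.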


//zip files
        else if (attached.getPath().endsWith(".zip")) {
            Document attachedDoc = new Document();
            Trace.DEBUG("Got a zip file to index: " + attached.getPath());
            try {
                ZipFile zip = new ZipFile(attached);
                ZipEntry zipEntry;
                Enumeration files = zip.entries();
                Vector totalEnummaration = new Vector();
                // walk every entry in the archive
                while (files.hasMoreElements()) {
                    zipEntry = (ZipEntry) files.nextElement();
                    // this line was cut off in the archived message;
                    // logging the entry name here is an assumption
                    Trace.DEBUG("The zip contains file: "
                            + zipEntry.getName());
                    // index the entry straight from the zip's input stream
                    Enumeration data = indexAttached(
                            zip.getInputStream(zipEntry),
                            attached.getPath(), zipEntry.getName());
                    // put the returned fields in a vector
                    insertFields(totalEnummaration, data);
                    data = null;
                }
                return totalEnummaration.elements();
            } catch (Exception e) {
                Trace.ERROR("INDEXING ERROR: Was unable to index Zip file: "
                        + attached.getPath() + " " + e);
                return null;
            }
        }

----- Original Message ----- 
From: "Luke Shannon" <>
To: "Lucene Users List" <>
Sent: Tuesday, March 01, 2005 10:34 AM
Subject: Zip Files

> Hello;
> Anyone have any ideas on how to index the contents within zip files?
> Thanks,
> Luke