lucene-dev mailing list archives

Subject [Jakarta Lucene Wiki] Updated: LuceneFAQ
Date Thu, 30 Dec 2004 21:19:03 GMT
   Date: 2004-12-30T13:19:03
   Editor: DanielNaber
   Wiki: Jakarta Lucene Wiki
   Page: LuceneFAQ

   no comment

Change Log:

@@ -445,6 +445,17 @@
 See article [ Parsing, indexing, and
searching XML with Digester and Lucene].
+==== How can I index OpenOffice.org files? ====
+OpenOffice.org files (.sxw, .sxc, etc.) are ZIP archives that contain XML files. Read
+the archive using Java's ZIP support (java.util.zip), then parse meta.xml to get the title
+and other metadata, and content.xml to get the document's content. Add these to the Lucene
+index, typically using one Lucene field per property.
+Note that this applies to OpenOffice.org 1.x. Details may change a bit for
+2.x, but the basic approach will remain the same.
 ==== How can I index MS-Word documents? ====
 In order to index Word documents you need to first parse them to extract text that you want
to index from them.  Here are some Word parsers that can help you with that:

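The ZIP-plus-XML approach described in the .sxw/.sxc answer can be sketched with nothing but the JDK. This is a minimal sketch, not code from the FAQ or from Lucene. It assumes Java 9+ (for `InputStream.readAllBytes`), and the class name `OpenOfficeExtractor` and its `extract` method are hypothetical. It reads `meta.xml` and `content.xml` from the archive and returns one map entry per property: each Dublin Core element in `meta.xml` (such as `dc:title` or `dc:creator`) plus a `content` entry with the document text. Each entry would then become one field on a Lucene `Document`. That last step is left out here because the `Field` API differs between Lucene versions.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts indexable properties from an OpenOffice.org archive.
public class OpenOfficeExtractor {
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the Dublin Core metadata from meta.xml plus a "content" entry. */
    public static Map<String, String> extract(InputStream in) throws Exception {
        byte[] meta = null;
        byte[] content = null;
        // Read both XML members into memory first: the XML parser may close the
        // stream it is given, which would also close the ZipInputStream.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("meta.xml")) {
                    meta = zip.readAllBytes();
                } else if (entry.getName().equals("content.xml")) {
                    content = zip.readAllBytes();
                }
            }
        }
        Map<String, String> fields = new LinkedHashMap<>();
        if (meta != null) {
            // One entry per Dublin Core element, keyed by local name ("title", ...).
            NodeList dc = parse(meta).getElementsByTagNameNS(DC_NS, "*");
            for (int i = 0; i < dc.getLength(); i++) {
                Element e = (Element) dc.item(i);
                fields.put(e.getLocalName(), e.getTextContent().trim());
            }
        }
        if (content != null) {
            Document doc = parse(content);
            // Prefer the office:body element so style declarations are skipped.
            NodeList bodies = doc.getElementsByTagNameNS("*", "body");
            Node root = bodies.getLength() > 0 ? bodies.item(0) : doc.getDocumentElement();
            StringBuilder sb = new StringBuilder();
            collectText(root, sb);
            fields.put("content", sb.toString());
        }
        return fields;
    }

    private static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The XML may declare a DOCTYPE pointing at a DTD that is not in the
        // archive; resolve every external entity to an empty document instead.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml));
    }

    // Rough text extraction: joins all non-blank text nodes with single spaces.
    private static void collectText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(text);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectText(c, sb);
        }
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, String> f : extract(in).entrySet()) {
                System.out.println(f.getKey() + ": " + f.getValue());
            }
        }
    }
}
```

Running `java OpenOfficeExtractor report.sxw` (any .sxw file) prints one line per property. The text extraction is deliberately rough. It inserts a space between adjacent text nodes, so a word split across inline spans may come out split. A real indexer would walk `text:p` and `text:h` elements instead.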