lucene-java-user mailing list archives

From "Prasad KVSH" <>
Subject Help on DOCX and XLSX
Date Wed, 07 Mar 2012 09:40:26 GMT
Dear All,


We have started using Lucene version 3.0.3. We have different types of
documents, such as PDF, XLS, XLSX, DOC, DOCX, TXT, etc., at a specified


We have created an index on these files (using, Indexing
took 17.2 MB for 69.4 MB of documents. This index was created using
StandardAnalyzer with a limited set of index fields. We are able to search
for a given text in PDF (text content only), *.doc and *.xls (MS Office 1997-2003) versions


Now I need help with indexing .docx and .xlsx files. How can I run
indexing on these files? These files are ignored when we do a string
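
One possible way to get searchable text out of these files, sketched here under stated assumptions rather than as a confirmed solution: .docx and .xlsx are ZIP packages of XML parts, so the text can be pulled out before it is handed to Lucene. The class name `OoxmlTextExtractor` and its methods are hypothetical and use only the JDK; libraries such as Apache POI or Apache Tika handle these formats far more completely. The sketch reads `word/document.xml` for .docx and `xl/sharedStrings.xml` for .xlsx and joins the contents of every `t` element.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Lucene: extracts plain text from OOXML files.
public class OoxmlTextExtractor {

    /** Text of a .docx stream, taken from the main document part. */
    public static String extractDocx(InputStream in) throws Exception {
        return extractPart(in, "word/document.xml");
    }

    /** String cell values of an .xlsx stream (numeric cells live in the sheet parts). */
    public static String extractXlsx(InputStream in) throws Exception {
        return extractPart(in, "xl/sharedStrings.xml");
    }

    // Scans the ZIP package for one named part; returns "" if it is absent.
    static String extractPart(InputStream in, String partName) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    return textOf(readAll(zip));
                }
            }
        }
        return "";
    }

    // Reads the current ZIP entry fully into memory.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Joins the text of every element whose local name is "t", separated by spaces.
    static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        NodeList nodes = doc.getElementsByTagNameNS("*", "t");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }
}
```

The returned string could then be added to a Lucene document as an analyzed field, in the same way as the text extracted from the other file types.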


The writer is defined as follows:

IndexWriter writer = new IndexWriter(, new
StandardAnalyzer(Version.LUCENE_CURRENT), true,


Another question concerns the size of the index folder: whether we can optimize
the size



