lucene-java-user mailing list archives

From "Prasad KVSH" <Prasad.Kokep...@ness.com>
Subject Help on DOCX and XLSX
Date Wed, 07 Mar 2012 09:40:26 GMT
Dear All,

 

We have started using Lucene 3.0.3. We have documents of various types
(PDF, XLS, XLSX, DOC, DOCX, TXT, etc.) in a specified folder.

We created an index on these files using IndexFiles.java. The index
takes 17.2 MB for 69.4 MB of documents. It was built with
StandardAnalyzer and a limited set of index fields. We are able to
search for a given text only in PDF files (text content only) and in
*.doc and *.xls files (MS Office 1997-2003 versions).

Now I need help with indexing .docx and .xlsx files. How can I run
indexing on these files? At present they are ignored when we do a
string search.

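
For context: .docx and .xlsx files are ZIP archives of XML parts rather
than the binary format used by .doc and .xls, so an extractor written
for the older formats will not read them. Below is a minimal sketch of
pulling the text out of a .docx using only the JDK. The class name
DocxTextSketch and the method extractDocxText are hypothetical names
chosen for illustration, and the sketch assumes the usual (transitional)
WordprocessingML namespace. A library such as Apache POI or Apache Tika
would normally be used instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxTextSketch {

    // Namespace of WordprocessingML elements such as <w:p> and <w:t>
    // (assumption: transitional OOXML; strict documents use a different URI).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each paragraph in word/document.xml, one per line. */
    static String extractDocxText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry into memory so the XML parser cannot
                // close or over-read the underlying ZIP stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return paragraphsOf(buf.toByteArray());
            }
        }
        throw new IOException("not a .docx: word/document.xml is missing");
    }

    private static String paragraphsOf(byte[] xmlBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document xml = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(xmlBytes));
        StringBuilder text = new StringBuilder();
        NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Runs inside a paragraph may split a word, so join them
            // without a separator and break only between paragraphs.
            NodeList runs = ((Element) paragraphs.item(i))
                                .getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextSketch path/to/file.docx
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractDocxText(in));
        }
    }
}
```

The returned string could then be added to the Lucene Document in
whatever field the indexer uses for body text. An .xlsx file would need
a similar pass over its own XML parts, which this sketch does not cover.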
The writer is defined as follows:

IndexWriter writer = new IndexWriter(
    FSDirectory.open(INDEX_DIR),
    new StandardAnalyzer(Version.LUCENE_CURRENT),
    true,                                  // create a new index
    IndexWriter.MaxFieldLength.LIMITED);   // index at most 10,000 terms per field

My other question concerns the size of the index folder: can the size
be optimized?

Thanks

Prasad

