lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: MS-Word docs.
Date Mon, 28 Nov 2005 01:28:09 GMT

: I dump the doc files into a text file with the same variable I use in
: the Lucene doc.add(Field.UnStored("content", textStr));| and they look
: fine in the file. However searches return nothing.

if i'm reading that sentence correctly, then you are saying that you've
tried isolating your MS-Word text extraction from your Lucene indexing and
confirmed that the MS-Word text extraction is working.

This is a very good first step.

Now if you want to be certain that your indexing is working correclty, i
suggest you try using a tool like Luke to check exactly what terms re
being stored in your index.  I'm guessing that the problems you are having
are a result of "analysis paralysis" as i've seen it called many times...

This wiki covers a lot of things you should check...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message